A Python script that automatically downloads organization logos from Wikipedia infoboxes and Wikimedia Commons. The script looks for a logo in the organization's Wikipedia infobox first, falls back to a Wikimedia Commons search, filters the candidates by file type and filename, and saves them to a local directory.
- Dual Source Strategy:
  - Primary: Extracts logos directly from Wikipedia article infoboxes
  - Fallback: Searches Wikimedia Commons for matching logo files
- Smart Filtering: Only downloads `.svg` or `.png` files whose filenames contain the organization name and the word "logo"
- Automatic Organization: Saves logos with descriptive filenames in an output directory
- Date Information: Displays and uses file timestamps to sort results (newest first)
- Batch Processing: Processes multiple organizations from a simple text file
- Python 3.7 or higher
- Internet connection
- Required Python packages (see Installation)
- Clone or download this repository.
- Install the required dependencies with `pip install -r requirements.txt`. This installs:
  - `requests`: for HTTP requests to the Wikipedia/Wikimedia APIs
  - `beautifulsoup4`: for parsing HTML from Wikipedia pages
- Prepare your organization list: edit `orglist.txt` and add one organization name per line, for example `Google`, `Samsung`, `NASA` and `Adidas` on four separate lines.
- Run the script with `python3 app.py`.
- Find your logos: downloaded logos are saved in the `output/` directory.
The `orglist.txt` file should contain organization names, one per line:
- Empty lines are ignored
- Lines starting with `#` are treated as comments
- Organization names are case-insensitive for matching
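A minimal sketch of reading `orglist.txt` under the rules above. The helper names `parse_org_list` and `load_org_list` are hypothetical, not taken from `app.py`. Names are kept exactly as written here; the case-insensitive rule only matters later, when names are compared against candidate filenames.

```python
from pathlib import Path
from typing import List


def parse_org_list(text: str) -> List[str]:
    """Return organization names, skipping blank lines and '#' comments."""
    orgs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        orgs.append(line)
    return orgs


def load_org_list(path: Path) -> List[str]:
    """Read and parse an organization list file (UTF-8 assumed)."""
    return parse_org_list(path.read_text(encoding="utf-8"))
```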
The primary method uses the Wikipedia infobox. The script:
- Searches for the organization's Wikipedia article
- Parses the HTML to find the infobox table
- Extracts the first image from the infobox `<tbody>`
- Checks that the image filename contains "logo"
- If it does, downloads the logo
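The infobox lookup can be sketched with Python's standard-library `html.parser` in place of `beautifulsoup4` (swapped in only so the example has no dependencies). It assumes the article HTML has already been fetched, and that, as in Wikipedia's rendered pages, the infobox is a `<table>` whose class list includes `infobox` and the image is wrapped in a `/wiki/File:` link. `InfoboxImageFinder` and `infobox_logo_title` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote


class InfoboxImageFinder(HTMLParser):
    """Record the first /wiki/File: link inside the first infobox table."""

    def __init__(self) -> None:
        super().__init__()
        self.in_infobox = False
        self.table_depth = 0  # nested <table> depth while inside the infobox
        self.done = False
        self.file_title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attrs = dict(attrs)
        if tag == "table":
            if self.in_infobox:
                self.table_depth += 1
            elif "infobox" in (attrs.get("class") or "").split():
                self.in_infobox = True
                self.table_depth = 1
        elif tag == "a" and self.in_infobox:
            href = attrs.get("href") or ""
            if href.startswith("/wiki/File:"):
                # "/wiki/File:Google_2015_logo.svg" -> "File:Google 2015 logo.svg"
                self.file_title = unquote(href[len("/wiki/"):]).replace("_", " ")
                self.done = True

    def handle_endtag(self, tag):
        if tag == "table" and self.in_infobox and not self.done:
            self.table_depth -= 1
            if self.table_depth == 0:
                self.in_infobox = False
                self.done = True  # only the first infobox is considered


def infobox_logo_title(html: str) -> Optional[str]:
    """Return the infobox image title if its filename mentions "logo"."""
    finder = InfoboxImageFinder()
    finder.feed(html)
    title = finder.file_title
    if title and "logo" in title.lower():
        return title
    return None
```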
If the infobox method doesn't return a logo (or the result doesn't contain "logo" in the filename), the script:
- Searches Wikimedia Commons using multiple search terms:
  - "[Org] logo"
  - "Logo of [Org]"
  - "[Org] emblem"
  - "[Org] wordmark"
- Filters results to only include files:
  - with a `.svg` or `.png` extension
  - whose filename contains the organization name
  - whose filename contains the word "logo"
- Limits results to the 5 most recent files (sorted by upload date)
- Downloads all matching files
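A sketch of the Commons fallback, assuming the public MediaWiki search API (`list=search` restricted to the File namespace with `srnamespace=6`) and using the standard-library `urllib` in place of `requests`. `search_commons`, `filter_logo_hits` and `SEARCH_TEMPLATES` are hypothetical names. The search API's `timestamp` is the page's last-edit time; it stands in here for the upload date the script sorts by, and `app.py` may obtain dates differently.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
HEADERS = {"User-Agent": "WikimediaLogos/1.0 (https://example.com/contact)"}
# The four search phrasings listed above; "{org}" is replaced by the organization name.
SEARCH_TEMPLATES = ["{org} logo", "Logo of {org}", "{org} emblem", "{org} wordmark"]


def search_commons(term: str, limit: int = 20) -> List[Dict]:
    """Full-text search of Commons, restricted to the File: namespace (6)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 6,
        "srlimit": limit,
        "format": "json",
    })
    request = urllib.request.Request(f"{COMMONS_API}?{params}", headers=HEADERS)
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)["query"]["search"]


def filter_logo_hits(org: str, hits: List[Dict], keep: int = 5) -> List[Dict]:
    """Keep unique .svg/.png titles naming both the org and "logo", newest first."""
    org_lower = org.lower()
    seen = set()
    matches = []
    for hit in hits:
        title = hit["title"]
        lower = title.lower()
        if title in seen or not lower.endswith((".svg", ".png")):
            continue
        if org_lower in lower and "logo" in lower:
            seen.add(title)
            matches.append(hit)
    # ISO 8601 timestamps such as "2022-12-05T12:00:00Z" sort correctly as strings.
    matches.sort(key=lambda hit: hit.get("timestamp", ""), reverse=True)
    return matches[:keep]
```

For one organization, you would concatenate `search_commons(t.format(org=org))` for every template in `SEARCH_TEMPLATES` and pass the combined list to `filter_logo_hits(org, hits)`. Lowercasing both sides gives the case-insensitive matching described for `orglist.txt`.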
- Results from both methods are combined
- Duplicate files are automatically removed
- All results are sorted by date (newest first)
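The merge step might look like this. `LogoCandidate` and `merge_candidates` are hypothetical; the sketch assumes each candidate carries its `File:` title and an ISO-formatted date string, and treats two candidates with the same title as duplicates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogoCandidate:
    title: str   # Commons file title, e.g. "File:Google 2015 logo.svg"
    date: str    # ISO date string, e.g. "2016-02-13"
    source: str  # "infobox" or "commons"


def merge_candidates(infobox: List[LogoCandidate],
                     commons: List[LogoCandidate]) -> List[LogoCandidate]:
    """Combine both sources, drop repeated titles, sort newest first."""
    seen = set()
    merged = []
    # Infobox results come first, so an infobox hit wins over a Commons duplicate.
    for candidate in infobox + commons:
        if candidate.title not in seen:
            seen.add(candidate.title)
            merged.append(candidate)
    # sort() is stable, so candidates with equal dates keep infobox-first order.
    merged.sort(key=lambda c: c.date, reverse=True)
    return merged
```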
After a run, the project directory looks like this:
WikimediaLogos/
├── app.py
├── orglist.txt
├── requirements.txt
└── output/
    ├── Google_Google_2015_logo.svg
    ├── Samsung_Samsung_Knox_logo.svg
    ├── NASA_NASA_Worm_logo.svg
    └── Adidas_Adidas_2022_logo.svg
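Downloaded files are named using the pattern `{Organization}_{Original_Filename}.{ext}`, where the `File:` prefix is dropped and spaces become underscores.
If multiple files are found, the first keeps this name and each additional file is numbered, starting at 1: `{Organization}_{index}_{Original_Filename}.{ext}`.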
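A sketch of the naming rule, checked against the example filenames in this README. `output_filename` is a hypothetical helper, and replacing characters other than letters, digits, `.`, `-` and `_` is an assumption added to keep names filesystem-safe.

```python
import re


def _safe(text: str) -> str:
    """Collapse runs of characters other than letters, digits, '.', '-', '_' into '_'."""
    return re.sub(r"[^\w.\-]+", "_", text.strip())


def output_filename(org: str, file_title: str, index: int = 0) -> str:
    """Build the local filename; index 0 means the first (unnumbered) file."""
    prefix = "File:"
    name = file_title[len(prefix):] if file_title.startswith(prefix) else file_title
    if index == 0:
        return f"{_safe(org)}_{_safe(name)}"
    return f"{_safe(org)}_{index}_{_safe(name)}"
```

Calling `output_filename(org, c.title, i)` for `i, c in enumerate(candidates)` over the sorted candidates reproduces the numbering shown in the example output.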
The script provides detailed output showing:
- Source of each logo (Wikipedia infobox or Wikimedia Commons)
- Number of files found
- File titles and dates
- Download status for each file
Example output:
Google: Found 1 matching file(s) from Wikipedia infobox
[1] File:Google 2015 logo.svg (2016-02-13) → Google_Google_2015_logo.svg
Samsung: Found 5 matching file(s) from Wikipedia infobox + Wikimedia Commons
[1] File:Samsung Knox logo.svg (2022-12-05) → Samsung_Samsung_Knox_logo.svg
[2] File:Samsung old logo before year 2015.svg (2022-11-28) → Samsung_1_Samsung_old_logo_before_year_2015.svg
...
You can modify the paths in `app.py`:

    from pathlib import Path

    ORG_LIST_PATH = Path(__file__).with_name("orglist.txt")  # Organization list file
    OUTPUT_DIR = Path(__file__).with_name("output")  # Output directory

The script sends a User-Agent header, as Wikimedia requires for API clients. You can customize it in `app.py`; replace the example URL with your own contact details:

    headers = {
        "User-Agent": "WikimediaLogos/1.0 (https://example.com/contact)"
    }

The script handles the following error cases:
- Network connection issues
- Missing Wikipedia articles
- Invalid API responses
- File download failures
- Missing or invalid organization names
Errors are logged to the console but don't stop the batch processing of other organizations.
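Keeping one failure from stopping the batch amounts to wrapping each organization in its own `try`/`except`. This is a sketch with a hypothetical `process_batch` function and a caller-supplied `process_one` callback. Catching `OSError` covers both `urllib` errors and `requests` exceptions (`requests.RequestException` subclasses `IOError`, an alias of `OSError`), while `ValueError` and `KeyError` cover malformed JSON and unexpected API responses.

```python
import logging
from typing import Callable, Dict, Iterable, List

log = logging.getLogger("wikimedia_logos")


def process_batch(orgs: Iterable[str],
                  process_one: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run process_one per organization; log failures and keep going."""
    results: Dict[str, List[str]] = {}
    for org in orgs:
        try:
            results[org] = process_one(org)
        except (OSError, ValueError, KeyError) as exc:
            # Network, JSON-decoding or response-shape problems affect only this org.
            log.error("%s: skipped (%s)", org, exc)
            results[org] = []
    return results
```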
- Requires internet connection to access Wikipedia and Wikimedia Commons
- Some organizations may not have Wikipedia articles or infobox logos
- Logo availability depends on what's uploaded to Wikimedia Commons
- File filtering is based on filename patterns, which may miss some valid logos
This project is provided as-is for educational and personal use.
Feel free to submit issues or pull requests for improvements!