US Embassies & Consulates Crawler

MIT License

A robust Python crawler to extract, normalize, and export the list of U.S. embassies and consulates worldwide from the U.S. Department of State website. The project provides detailed embassy/consulate information in CSV, JSON, and YAML formats, with features for caching, progress tracking, and continent detection.

Features

  • Crawls the official U.S. State Department embassy/consulate list
  • Extracts detailed information: country, city, code, continent, full name, address, telephone, fax, email, website, cancel/reschedule info, Google Maps link
  • Robust HTML parsing with caching for efficiency
  • Auto-detects continent (supports English country/city names)
  • Exports data to CSV, JSON, and YAML
  • Progress bar and logging for user feedback
  • Deduplication and navigation link filtering
  • Modular, maintainable codebase using Python best practices
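
To illustrate the fields listed above, one extracted record could look roughly like the dictionary below. The key names and values are illustrative placeholders; the exact keys emitted by app.py may differ.

```python
# Illustrative shape of one extracted record.
# Keys and values are placeholders, not real crawler output.
record = {
    "country": "France",
    "city": "Paris",
    "code": "FR",
    "continent": "Europe",
    "full_name": "U.S. Embassy Paris",
    "address": "123 Example Street, Paris",
    "telephone": "+33 0 00 00 00 00",
    "fax": None,
    "email": None,
    "website": "https://fr.usembassy.gov/",
    "cancel_reschedule": None,
    "google_maps": None,
}
```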

Usage

  1. Clone the repository:

    git clone https://github.com/BaseMax/us-embassies-consulates.git
    cd us-embassies-consulates
  2. Install dependencies:

    pip install .

    (Or, for development: pip install -e .)

    Dependencies are declared in pyproject.toml per PEP 621, so no requirements.txt is needed.

  3. Run the crawler:

    python app.py
  4. Output files:

    • us_embassies_consulates.csv
    • us_embassies_consulates.json
    • us_embassies_consulates.yml
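
Once the crawl finishes, the exports can be consumed like any other CSV/JSON files. A minimal sketch, assuming the output filenames listed above:

```python
import csv
import json
from pathlib import Path

def load_entries(json_path="us_embassies_consulates.json"):
    """Load the crawler's JSON export as a list of dicts."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))

def load_rows(csv_path="us_embassies_consulates.csv"):
    """Load the crawler's CSV export as a list of dicts keyed by header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```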

Project Structure

  • app.py — Main crawler and exporter script
  • .cache/ — Cached HTML pages for efficiency
  • us_embassies_consulates.csv — Exported embassy/consulate data (CSV)
  • us_embassies_consulates.json — Exported data (JSON)
  • us_embassies_consulates.yml — Exported data (YAML)

Customization

  • Continent Mapping:
    • The script auto-detects continent from country/city (supports English names)
  • Caching:
    • HTML pages are cached in .cache/ to minimize repeated requests
  • Logging & Progress:
    • Uses Python logging and tqdm for progress bars
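
The caching behavior can be sketched as a fetch wrapper that keys each URL to a file under .cache/. This is a minimal illustration of the idea, not app.py's actual implementation; the `fetch` parameter is a hypothetical hook added here so the function can be exercised without network access.

```python
import hashlib
import urllib.request
from pathlib import Path

def cached_fetch(url: str, cache_dir: str = ".cache", fetch=None) -> str:
    """Return a page's HTML, reusing a cached copy when one exists.

    Sketch only: app.py's real caching may use different keys or paths.
    """
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    # Hash the URL to get a stable, filesystem-safe cache key.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache / f"{key}.html"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    if fetch is None:
        # Network hit only on a cache miss.
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8")
    else:
        html = fetch(url)
    cache_file.write_text(html, encoding="utf-8")
    return html
```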

License

MIT License

© 2025 Seyyed Ali Mohammadiyeh (MAX BASE)

See LICENSE for details.
