A robust Python crawler to extract, normalize, and export the list of U.S. embassies and consulates worldwide from the U.S. Department of State website. The project provides detailed embassy/consulate information in CSV, JSON, and YAML formats, with features for caching, progress tracking, and continent detection.
- Crawls the official U.S. State Department embassy/consulate list
- Extracts detailed information: country, city, code, continent, full name, address, telephone, fax, email, website, cancel/reschedule info, Google Maps link
- Robust HTML parsing with caching for efficiency
- Auto-detects continent (supports English country/city names)
- Exports data to CSV, JSON, and YAML
- Progress bar and logging for user feedback
- Deduplication and navigation link filtering
- Modular, maintainable codebase using Python best practices
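The multi-format export feature can be pictured with a small stdlib-only sketch (the `export_all` helper and the field names are illustrative, not the project's actual API; YAML output is omitted here to avoid the third-party `PyYAML` dependency):

```python
import csv
import json

# Illustrative records with a subset of the exported fields (names assumed).
records = [
    {"country": "France", "city": "Paris", "continent": "Europe"},
    {"country": "Japan", "city": "Tokyo", "continent": "Asia"},
]

def export_all(rows, stem="us_embassies_consulates"):
    """Write the same rows to <stem>.csv and <stem>.json."""
    with open(f"{stem}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    with open(f"{stem}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    export_all(records)
```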
- Clone the repository:

  ```shell
  git clone https://github.com/BaseMax/us-embassies-consulates.git
  cd us-embassies-consulates
  ```

- Install dependencies:

  ```shell
  pip install .
  ```

  Or, for development:

  ```shell
  pip install -e .
  ```

  This project uses PEP 621 and `pyproject.toml` for dependency management. No `requirements.txt` is needed.

- Run the crawler:

  ```shell
  python app.py
  ```

- Output files:

  - `us_embassies_consulates.csv`
  - `us_embassies_consulates.json`
  - `us_embassies_consulates.yml`
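Once `python app.py` has produced these files, downstream scripts can consume them; a minimal sketch (the `continent` field name is assumed from the feature list above, and the helper names are hypothetical):

```python
import json
import pathlib

def load_posts(path="us_embassies_consulates.json"):
    """Load exported records from the JSON output, or [] if it is missing."""
    p = pathlib.Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else []

def by_continent(records, continent):
    """Filter records by their detected continent."""
    return [r for r in records if r.get("continent") == continent]
```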
- `app.py` — Main crawler and exporter script
- `cache/` — Cached HTML pages for efficiency
- `us_embassies_consulates.csv` — Exported embassy/consulate data (CSV)
- `us_embassies_consulates.json` — Exported data (JSON)
- `us_embassies_consulates.yml` — Exported data (YAML)
- Continent Mapping:
  - The script auto-detects the continent from country/city (supports English names)
- Caching:
  - HTML pages are cached in `.cache/` to minimize repeated requests
- Logging & Progress:
  - Uses Python `logging` and `tqdm` for progress bars
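The caching behaviour described above can be sketched as a URL-keyed file cache (a simplified stand-in for whatever `app.py` actually does; the hashing scheme and file naming are assumptions):

```python
import hashlib
import pathlib
import urllib.request

CACHE_DIR = pathlib.Path(".cache")  # cache directory named in the notes above

def fetch_cached(url: str) -> str:
    """Return the HTML for `url`, reusing a cached copy when one exists."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.html"
    if cache_file.exists():  # cache hit: no network request
        return cache_file.read_text(encoding="utf-8")
    with urllib.request.urlopen(url) as resp:  # cache miss: fetch and store
        body = resp.read().decode("utf-8", errors="replace")
    cache_file.write_text(body, encoding="utf-8")
    return body
```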
MIT License
© 2025 Seyyed Ali Mohammadiyeh (MAX BASE)
See LICENSE for details.