A comprehensive research tool that gathers rich, multi-source information about Egyptian archaeological sites. Unlike simple web scrapers, it treats each site as a research subject, synthesizing data from multiple authoritative sources.
Research-Oriented Multi-Source Architecture with Standard Python Packaging
| Feature | Description |
|---|---|
| Multi-Source Research | Aggregates data from 4+ authoritative sources per site |
| Wikipedia Integration | EN + AR Wikipedia with fuzzy search for name variations |
| 27 Governorates | Accurate mapping to all Egyptian governorates via Nominatim |
| Dynamic Arabic Content | Site-specific vocabulary with translations & pronunciations |
| No API Keys Required | All free, publicly accessible data sources |
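
The fuzzy search for name variations mentioned in the table above could work along the following lines. This is a minimal sketch using only the standard library, not the project's actual matching code: the function name `best_title_match`, the `cutoff` value, and the candidate titles are all hypothetical.

```python
from difflib import get_close_matches


def best_title_match(site_name, candidate_titles, cutoff=0.6):
    """Return the candidate title most similar to site_name, or None."""
    # Compare case-insensitively, but hand back the original spelling.
    lowered = {title.lower(): title for title in candidate_titles}
    matches = get_close_matches(site_name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


# Hypothetical candidates, e.g. titles returned by a Wikipedia search request.
titles = ["Karnak", "Luxor Temple", "Karnak Temple Complex"]
print(best_title_match("Karnak Temple", titles))
```

A similarity cutoff keeps unrelated titles from being accepted when no good candidate exists; in that case the function returns `None` so the caller can skip the source.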

| Page Type | Sites |
|---|---|
| Archaeological Sites | 34 |
| Monuments | 123 |
| Museums | 24 |
| Sunken Monuments | 8 |
| Total | 189 |
┌───────────────────────────────────────────────────────────────────┐
│                  Site Researcher (Orchestrator)                   │
└─────────────────────────────────┬─────────────────────────────────┘
                                  │
      ┌─────────────┬─────────────┼─────────────┬─────────────┐
      ▼             ▼             ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│  Primary  │ │ Wikipedia │ │Governorate│ │  Arabic   │ │   Tips    │
│  Source   │ │ Research  │ │  Service  │ │ Extractor │ │ Research  │
└───────────┘ └───────────┘ └───────────┘ └───────────┘ └───────────┘
      │             │             │             │             │
      ▼             ▼             ▼             ▼             ▼
egymonuments    Wikipedia     Nominatim      Google       Official
   .gov.eg       EN + AR      Geocoding     Translate      Sources
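
The orchestrator pattern in the diagram above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `site_researcher.py`: the `research_site` function, the `Researcher` type, and the stub researchers are hypothetical, and the stubs return canned data instead of calling any real service.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a researcher maps a site name to a dict of findings.
Researcher = Callable[[str], Dict[str, object]]


def research_site(name: str, researchers: List[Tuple[str, Researcher]]) -> Dict[str, object]:
    """Run each researcher in turn and merge its findings into one record."""
    record: Dict[str, object] = {"name": name}
    errors: Dict[str, str] = {}
    for label, researcher in researchers:
        try:
            record.update(researcher(name))
        except Exception as exc:
            # One failing source should not discard what the others found.
            errors[label] = str(exc)
    record["errors"] = errors
    return record


# Stub researchers with canned data (no network access).
def stub_wikipedia(name: str) -> Dict[str, object]:
    return {"wikipediaUrl": "https://en.wikipedia.org/wiki/Karnak"}


def stub_governorate(name: str) -> Dict[str, object]:
    return {"governorate": "Luxor"}


def stub_tips(name: str) -> Dict[str, object]:
    raise TimeoutError("source unavailable")


record = research_site(
    "Karnak Temple",
    [("wikipedia", stub_wikipedia), ("governorate", stub_governorate), ("tips", stub_tips)],
)
print(record)
```

Collecting errors per source, rather than raising, lets a single site record stay usable when one of the sources is temporarily unreachable.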
# Clone repository
git clone https://github.com/Naareman/UnlockEgyptParser.git
cd UnlockEgyptParser
# Install with uv
uv sync
# Run
uv run unlockegypt

# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Run
unlockegypt

# After installation, use the CLI command:
unlockegypt # Research all sites (189 total)
unlockegypt -t monuments # Research specific page type
unlockegypt -t museums -m 5 # Limit sites per type
unlockegypt -o my_research.json # Custom output path
unlockegypt -v # Verbose logging
unlockegypt --no-headless # Show browser window
# Or run as module:
python -m unlockegypt.cli
python -m unlockegypt.cli -t monuments -m 5

| Option | Description |
|---|---|
| -t, --type | Page type(s) to research (repeatable) |
| -o, --output | Output JSON file path |
| -m, --max-sites | Maximum sites per page type |
| -v, --verbose | Enable debug logging |
| --no-headless | Show browser window |
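
For readers who want to see how the options above fit together, here is a minimal `argparse` sketch that mirrors the documented flags. It is an assumption that the real `cli.py` is built this way; the `build_parser` function and the `dest` names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(prog="unlockegypt")
    # action="append" makes the flag repeatable: -t monuments -t museums
    parser.add_argument("-t", "--type", action="append", dest="types",
                        help="Page type(s) to research (repeatable)")
    parser.add_argument("-o", "--output", help="Output JSON file path")
    parser.add_argument("-m", "--max-sites", type=int,
                        help="Maximum sites per page type")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug logging")
    parser.add_argument("--no-headless", action="store_true",
                        help="Show browser window")
    return parser


args = build_parser().parse_args(["-t", "monuments", "-t", "museums", "-m", "5"])
print(args.types, args.max_sites, args.no_headless)
```

With this sketch, omitting `-t` leaves `args.types` as `None`, which a caller could treat as "research all page types".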
{
  "sites": [{
    "id": "site_001",
    "name": "Karnak Temple",
    "arabicName": "معبد الكرنك",
    "governorate": "Luxor",
    "era": "New Kingdom",
    "uniqueFacts": ["Largest ancient religious site..."],
    "keyFigures": ["Ramesses II", "Amenhotep III"],
    "wikipediaUrl": "https://en.wikipedia.org/wiki/Karnak"
  }],
  "subLocations": [...],
  "cards": [...],
  "tips": [...],
  "arabicPhrases": [...]
}

| Source | Data Retrieved |
|---|---|
| egymonuments.gov.eg | Primary info, images, Arabic names |
| Wikipedia (EN/AR) | Historical facts, key figures, features |
| Nominatim/OSM | Coordinates, governorate mapping |
| Google Translate | Arabic vocabulary translations |
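
Public services such as Nominatim expect clients to throttle their requests; the public Nominatim instance's usage policy allows at most one request per second. A minimal sketch of such throttling follows. The `RateLimiter` class is hypothetical and not taken from the project, and it assumes the interval is expressed in seconds. The clock and sleep functions are injectable so the behavior can be checked without real delays.

```python
import time


class RateLimiter:
    """Ensure consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Usage: create one limiter and call wait() before every geocoding request.
limiter = RateLimiter(min_interval=1.0)
limiter.wait()  # the first call returns immediately
```

Using a monotonic clock avoids surprises when the system time is adjusted while a long research run is in progress.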
UnlockEgyptParser/
├── pyproject.toml # Project config & dependencies
├── config.yaml # Runtime configuration
├── .pre-commit-config.yaml # Code quality hooks
│
├── src/unlockegypt/ # Main package (src layout)
│ ├── __init__.py # Package exports
│ ├── cli.py # CLI entry point
│ ├── site_researcher.py # Main orchestrator
│ ├── models/ # Data models
│ ├── researchers/ # Research components
│ │ ├── wikipedia.py # Wikipedia API + fuzzy search
│ │ ├── governorate.py # 27 governorate mapping
│ │ ├── arabic_terms.py # Vocabulary extraction
│ │ ├── tips.py # Visitor tips
│ │ └── google_maps.py # Practical info
│ └── utils/ # Utilities
│ └── config.py # Configuration loader
│
├── tests/ # Test suite
│ ├── conftest.py
│ └── test_models.py
│
└── docs/ # Documentation
    ├── PRD.md # Product requirements
    ├── DESIGN.md # System design
    └── TECH_STACK.md # Technology stack
# Install with dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install

# Run linter
ruff check .
# Run formatter
ruff format .
# Run type checker
mypy .
# Run tests
pytest
# Run tests with coverage
pytest --cov=. --cov-report=html

The project uses pre-commit hooks for:
- Trailing whitespace removal
- YAML/JSON validation
- Ruff linting and formatting
- MyPy type checking
- Security checks (bandit)
All settings are in config.yaml:
website:
  base_url: "https://egymonuments.gov.eg"
timing:
  page_load_wait: 5
  scroll_wait: 2
  geocoding_rate_limit: 1
browser:
  headless: true
  window_size: [1920, 1080]

MIT License - see LICENSE for details.
- UnlockEgypt iOS App - The mobile app that uses this data