Version: 3.4 | Last Updated: 2026-02-01 | Status: Active Development
UnlockEgypt Site Researcher is a research tool that gathers rich, multi-source information about Egyptian archaeological sites. Unlike a simple web scraper, it treats each site as a research subject and synthesizes data from multiple authoritative sources into a comprehensive site profile.
Travelers and researchers seeking information about Egyptian archaeological sites face fragmented data across multiple sources:
- Official government sites (egymonuments.gov.eg) have authoritative but sometimes incomplete data
- Wikipedia provides historical context but may lack practical visitor information
- Google Maps has operational details but lacks historical depth
- No single source provides Arabic translations, pronunciation guides, and cultural context
The proposed solution is a research-oriented tool that:
- Aggregates data from multiple authoritative sources
- Provides comprehensive site profiles with historical and practical information
- Includes Arabic vocabulary with pronunciation guides
- Generates contextual visitor tips based on site characteristics
| User Type | Needs | Usage Pattern |
|---|---|---|
| Travel App Developers | Structured JSON data for mobile apps | One-time data export, periodic updates |
| Tourism Researchers | Comprehensive site information | Research and analysis |
| Educational Platforms | Historical facts, Arabic terms | Content creation |
As a travel app developer, I want structured JSON data about Egyptian sites so that I can populate my mobile application with rich content.
As a researcher, I want to gather comprehensive information from multiple sources so that I can analyze patterns across sites.
As a content creator, I want accurate Arabic translations and pronunciations so that I can create educational materials.
| Feature | Description | Priority | Status |
|---|---|---|---|
| Multi-Source Research | Aggregate data from 4+ sources per site | P0 | Done |
| Wikipedia Integration | EN + AR Wikipedia with fuzzy search | P0 | Done |
| Governorate Detection | Accurate mapping to 27 Egyptian governorates | P0 | Done |
| Arabic Vocabulary | Dynamic extraction with translations | P0 | Done |
| Contextual Tips | Site-type-specific visitor recommendations | P0 | Done |
| JSON Export | Structured output for app consumption | P0 | Done |
For each site, the system collects:
Basic Information:
- Name (English + Arabic)
- Governorate location
- GPS coordinates
- Historical era
- Tourism type (Pharaonic, Islamic, Coptic, etc.)
- Place type (Temple, Tomb, Museum, etc.)
Rich Content:
- Short and full descriptions
- Unique historical facts (from Wikipedia)
- Key historical figures mentioned
- Architectural features
- Multiple images
Practical Information:
- Estimated visit duration
- Best time to visit
- Opening hours (when available)
- Official website links
- Contextual visitor tips
Cultural Content:
- Arabic vocabulary terms
- Pronunciation guides
- Site-specific Arabic phrases
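The fields listed above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names `SiteProfile` and `ArabicTerm`, every field name, and the `to_json` helper are assumptions chosen for illustration, and the Karnak values are sample data only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ArabicTerm:
    """One vocabulary entry (hypothetical structure)."""
    english: str
    arabic: str
    pronunciation: str


@dataclass
class SiteProfile:
    """Hypothetical container mirroring the four data groups above."""
    # Basic information
    name_en: str
    name_ar: Optional[str] = None
    governorate: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    era: Optional[str] = None
    tourism_type: Optional[str] = None
    place_type: Optional[str] = None
    # Rich content
    short_description: str = ""
    full_description: str = ""
    unique_facts: List[str] = field(default_factory=list)
    key_figures: List[str] = field(default_factory=list)
    architectural_features: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)
    # Practical information (None means "not available from any source")
    visit_duration: Optional[str] = None
    best_time_to_visit: Optional[str] = None
    opening_hours: Optional[str] = None
    official_website: Optional[str] = None
    visitor_tips: List[str] = field(default_factory=list)
    # Cultural content
    vocabulary: List[ArabicTerm] = field(default_factory=list)
    phrases: List[str] = field(default_factory=list)


def to_json(profiles: List[SiteProfile]) -> str:
    """Serialize profiles; ensure_ascii=False keeps Arabic text readable."""
    return json.dumps([asdict(p) for p in profiles], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    karnak = SiteProfile(name_en="Karnak", name_ar="الكرنك", governorate="Luxor")
    karnak.vocabulary.append(ArabicTerm("temple", "معبد", "ma'bad"))
    print(to_json([karnak]))
```

Keeping optional fields as `None` rather than omitting them gives app developers a stable set of keys, which fits the requirement to handle missing data gracefully.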
| Category | Example Sites | Count |
|---|---|---|
| Archaeological Sites | Karnak, Abu Simbel | ~34 |
| Monuments | Pyramids, Sphinx | ~123 |
| Museums | Egyptian Museum, GEM | ~24 |
| Sunken Monuments | Alexandria underwater sites | ~8 |
| Total | | ~189 |
| Metric | Requirement |
|---|---|
| Sites per hour | 20-30 (with rate limiting) |
| Memory usage | < 500MB during operation |
| Output file size | Scalable JSON |
- Graceful handling of missing data
- Retry logic for network failures
- Comprehensive error logging
- Cache management to prevent memory leaks
- Modular researcher components
- Externalized configuration (config.yaml)
- Type hints throughout codebase
- Comprehensive documentation
- No API Keys Required: All data sources must be freely accessible
- Rate Limiting: Respect source website limits (1 req/sec for Nominatim)
- Ethical Scraping: Proper user-agent identification
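The retry and rate-limiting requirements above can be sketched as below. This is an illustration under stated assumptions rather than the tool's implementation: `RateLimiter` and `with_retry` are hypothetical names, the exponential backoff schedule is one reasonable choice, and the clock and sleep functions are injectable only so the behavior can be tested without real delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Blocks so that consecutive calls are at least min_interval seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on OSError (which covers ConnectionError and
    TimeoutError) with delays of base_delay, 2*base_delay, 4*base_delay, ..."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller log the failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Assumed usage: one limiter per source, e.g. 1 request per second for Nominatim.
nominatim_limiter = RateLimiter(min_interval=1.0)
```

A caller would invoke `nominatim_limiter.wait()` immediately before each request and wrap the request itself in `with_retry`, so that retried attempts are also spaced out.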
| Source | Data Retrieved | Method |
|---|---|---|
| egymonuments.gov.eg | Primary site info, images | Selenium |
| Wikipedia (EN) | Historical facts, key figures | Wikipedia API |
| Wikipedia (AR) | Arabic names, descriptions | Wikipedia API |
| Nominatim/OSM | Coordinates, governorate | REST API |
| Google Translate | Arabic translations | deep-translator |
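Because site names on egymonuments.gov.eg will not always match Wikipedia article titles exactly, the Wikipedia lookups rely on fuzzy search. One way to pick the best title from a list of search results is sketched here with the standard library's `difflib`. The function names, the normalization rules, and the 0.6 threshold are assumptions for illustration, and the real tool may match titles differently.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def best_title_match(
    site_name: str, candidates: Iterable[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the candidate most similar to site_name, or None if no
    candidate reaches the similarity threshold (0.0 to 1.0)."""
    target = normalize(site_name)
    best: Optional[str] = None
    best_score = 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    results = ["Karnak", "Luxor Temple", "Karnak Temple Complex"]
    print(best_title_match("Karnak Temple", results))
```

Returning `None` below the threshold lets the pipeline record a Wikipedia miss instead of attaching facts from the wrong article, which also feeds directly into the Wikipedia match rate metric.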
| Metric | Target | Current |
|---|---|---|
| Sites with complete data | > 90% | TBD |
| Sites with Arabic content | > 95% | TBD |
| Sites with unique facts | > 80% | TBD |
| Wikipedia match rate | > 70% | TBD |
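The "Current" column could be filled in by a small script over the exported JSON. The sketch below assumes each exported site is a dictionary; the key names (`name_ar`, `unique_facts`, `wikipedia_url`) and the list of fields that count as "complete" are hypothetical, since this document does not define them.

```python
from typing import Any, Callable, Dict, List

Site = Dict[str, Any]

# Assumption: a site is "complete" when all of these keys hold non-empty values.
REQUIRED_FIELDS = ["name_en", "name_ar", "governorate", "short_description"]


def percent(sites: List[Site], predicate: Callable[[Site], bool]) -> float:
    """Percentage of sites satisfying predicate; 0.0 for an empty list."""
    if not sites:
        return 0.0
    return 100.0 * sum(1 for s in sites if predicate(s)) / len(sites)


def is_complete(site: Site) -> bool:
    return all(site.get(key) for key in REQUIRED_FIELDS)


def has_arabic(site: Site) -> bool:
    return bool(site.get("name_ar"))


def has_unique_facts(site: Site) -> bool:
    return bool(site.get("unique_facts"))


def has_wikipedia_match(site: Site) -> bool:
    return bool(site.get("wikipedia_url"))


def success_metrics(sites: List[Site]) -> Dict[str, float]:
    """Compute the four metrics in the table above as percentages."""
    return {
        "complete_data": percent(sites, is_complete),
        "arabic_content": percent(sites, has_arabic),
        "unique_facts": percent(sites, has_unique_facts),
        "wikipedia_match": percent(sites, has_wikipedia_match),
    }
```

Each value can then be compared against its target from the table, for example `success_metrics(sites)["wikipedia_match"] > 70`.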
- Google Maps integration for opening hours
- Ticket pricing from official sources
- Image optimization and CDN upload
- Incremental update support
- Multi-language support beyond EN/AR
- Audio pronunciation files
- Interactive map generation
- API endpoint for real-time queries
| Term | Definition |
|---|---|
| Governorate | Administrative division of Egypt (27 total) |
| Pharaonic | Related to ancient Egyptian pharaohs |
| Fuzzy Search | Search that finds results despite spelling variations |
| Rate Limiting | Restricting request frequency to avoid overloading servers |