Extract real-world locations from a news article (URL or pasted text), geocode them with OpenStreetMap’s Nominatim, and visualize results on an interactive Folium map inside a Streamlit app.
Paste a URL or raw text → Gemini finds place names → Nominatim resolves coordinates → Folium map displays results.
- Features
- Project Layout
- Prerequisites
- Setup
- Run the App
- How It Works
- Configuration Notes
- Troubleshooting
- Article ingestion from URL or pasted text
- Location extraction with Google Gemini (
gemini-2.5-pro) - Geocoding via OpenStreetMap Nominatim (rate-limited & cached)
- Interactive map with Folium rendered in Streamlit
- CLI mode for quick non-UI runs
src/
app.py # Streamlit app wrapper
main_pipeline.py # Orchestrates text → locations → coords → map
article_text_extractor.py # URL → article text
nlp_loc_extractor.py # Gemini call → structured locations
geocode_loc_finder.py # Locations → coordinates (Nominatim)
map_viz.py # Folium map creation & saving
.streamlit/
secrets.toml # Streamlit secrets (GEMINI_API_KEY lives here)
.vscode/
settings.json # Optional editor config
.gitignore
.env # Local dev env vars (GEMINI_API_KEY lives here)
geocode_cache.sqlite # Auto-created by requests-cache
interactive_map.html # Optional map export
readme.md
requirements.txt
- Python 3.9+ (3.10/3.11 recommended)
- A Google AI Studio API key for Gemini
- Internet access (for article fetches, Gemini API, and Nominatim)
Please respect the OpenStreetMap/Nominatim usage policy. This app already rate-limits to ~1 request/second.
pip install --upgrade pip
pip install -r requirements.txtYou can set the same key in both places below. Streamlit uses secrets.toml; the CLI reads .env / environment variables.
Option A — .env at repo root (for local dev & CLI)
GEMINI_API_KEY="your_api_key_here"
Option B — Streamlit secrets (.streamlit/secrets.toml)
GEMINI_API_KEY = "your_api_key_here"app.py reads st.secrets["GEMINI_API_KEY"].
main_pipeline.py falls back to the GEMINI_API_KEY environment variable (loaded via python-dotenv) if no key is passed in.
From the project root (with the venv activated):
streamlit run src/app.pyProcess a URL:
python src/main_pipeline.py --url "https://example.com/news/article"Process pasted text:
python src/main_pipeline.py --text "Paste the full article text here"Provide an API key explicitly (optional if using .env/secrets):
python src/main_pipeline.py --api-key "your_api_key_here" --url "https://example.com/news/article"ArticleLocationExtractor(src/main_pipeline.py)- Initializes the Gemini client (default model:
gemini-2.5-pro). - Configures Nominatim with
RateLimiter(min_delay_seconds=1.0). - Enables requests-cache with a 7-day expiry (
geocode_cache.sqlite). - Pipeline:
- URL → text (
article_text_extractor.extract_article_text) - text → locations (
nlp_loc_extractor.extract_locations_with_gemini) - locations → lat/lon (
geocode_loc_finder.geocode_nominatim) - data → map (
map_viz.create_styled_map, rendered in Streamlit)
- URL → text (
- Initializes the Gemini client (default model:
If a site blocks scraping (403 Forbidden), use Paste text mode.
- Model: Defaults to
gemini-2.5-pro. You can pass a differentmodel_namewhen creatingArticleLocationExtractor. - Caching:
geocode_cache.sqliteis created automatically. Delete it to force fresh geocoding. - Rate limiting: Keep the 1 req/sec rate to respect Nominatim.
- Map export: In CLI mode,
map_viz.save_open_map_in_browsermay save/openinteractive_map.html.
-
ValueError: Gemini API key not provided
Ensure.envor.streamlit/secrets.tomlis set, or pass--api-key. -
403 Forbiddenwhen fetching a URL`
The site likely blocks bots. Use Paste text mode. -
ModuleNotFoundError
Re-install deps:pip install -r requirements.txt. Verify package names (google-genai,requests-cache,streamlit-folium, etc.). -
Map doesn’t render in Streamlit
Start withstreamlit run src/app.pyand ensurestreamlit-foliumis installed. -
Slow geocoding
Normal for first runs; cache speeds up repeats.