Every vote. Every deputy. In plain French.
MonÉlu is a civic transparency platform that makes the voting record of every deputy in the French Assemblée Nationale accessible in plain language and in near real time. It is built for journalists, researchers, and engaged citizens who shouldn't need to dig through government ZIP exports to understand how their representatives vote.
Live: https://monelu-production.up.railway.app · API docs: /docs
| Phase | Status | What it covers |
|---|---|---|
| Phase 1 — Data platform | Live | Full ingestion pipeline, REST API, deputy profiles, vote records, scorecards |
| Phase 2 — Intelligence layer | Live | Semantic search over the legislative corpus (RAG, pgvector, Groq LLM) |
| Phase 3 — Pipeline infrastructure | In progress | Production-grade data orchestration and automated refresh pipelines |

    make airflow-up    # Start Airflow (webserver + scheduler)
    make minio-up      # Start MinIO (local S3)
    make setup-minio   # Create Bronze buckets
    make airflow-ui    # Open Airflow at localhost:8080
    make minio-ui      # Open MinIO at localhost:9001

| DAG | Schedule | Description |
|---|---|---|
| `deputies_incremental` | Weekly Mon 6am | Deputies → GE → Bronze → Postgres |
| `votes_batch` | Every 2h weekdays | Votes → GE → Bronze → Postgres → positions |

GitHub Actions runs ingestion every 6 hours on weekdays. Trigger manually: GitHub → Actions → MonÉlu Production Ingestion → Run workflow
Required secrets (Settings → Secrets and variables → Actions):
DATABASE_URL · AN_API_BASE_URL · OPENAI_API_KEY · GROQ_API_KEY
Raw data lands in MinIO at s3://monelu-bronze/{entity}/year=Y/month=M/day=D/
Hash-based change detection — skips write if data unchanged since last run.
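
The partitioned layout and the hash check described above can be sketched as follows. This is a minimal illustration, not the project's `bronze_writer.py`: the function names, the `data.json` filename, the unpadded date parts, and the choice of SHA-256 over a canonical JSON dump are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import json
from datetime import date


def bronze_key(entity: str, day: date, filename: str = "data.json") -> str:
    """Build a date-partitioned object key (filename and padding are assumed)."""
    return f"{entity}/year={day.year}/month={day.month}/day={day.day}/{filename}"


def content_hash(records: list[dict]) -> str:
    """Hash a canonical JSON dump so that key order does not change the digest."""
    canonical = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_write(records: list[dict], last_hash: str | None) -> bool:
    """Return True only when the payload differs from the previous run."""
    return content_hash(records) != last_hash


records = [{"uid": "PA1592"}]
print(bronze_key("deputies", date(2025, 7, 1)))       # deputies/year=2025/month=7/day=1/data.json
print(should_write(records, content_hash(records)))   # False: unchanged, so the write is skipped
```

Where the previous hash is persisted between runs is not stated here, so the sketch simply takes it as an argument.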

    Assemblée Nationale Open Data (ZIP exports)
      → Ingestion pipeline (fetch · parse · upsert with retry)
      → PostgreSQL + pgvector on Supabase (deputies · votes · positions · embeddings)
      → FastAPI on Railway (stateless, auto-restart)
      → JSON API / HTML landing page / POST /search (RAG)

The API tier is fully stateless. All state lives in Supabase (managed Postgres with pgvector). Railway restarts the service on failure; the health endpoint returns live DB counts on every check.

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Landing page: live stats, latest votes, RAG demo |
| GET | `/deputies` | List all deputies (search, department filters) |
| GET | `/deputies/{id}` | Deputy profile |
| GET | `/deputies/{id}/scorecard` | Presence rate, vote breakdown by position |
| GET | `/votes` | List votes (result filter) |
| GET | `/votes/latest` | Last 10 votes |
| GET | `/votes/{id}` | Vote detail + all individual positions |
| GET | `/health` | API status + live record counts |
| POST | `/search` | Natural language query over the legislative corpus (Phase 2) |

Rate limiting is implemented with slowapi, keyed by remote IP.

| Scope | Limit |
|---|---|
| Global default | 30 req / min |
| `GET /deputies/{id}/scorecard` | 10 req / min |

On limit exceeded: HTTP 429 · {"error": "Too Many Requests", "detail": "..."} · Retry-After + X-RateLimit-* headers.
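
A client can respect these limits by waiting for the period given in `Retry-After` before retrying. The sketch below is one possible client-side approach, not part of MonÉlu: `get_with_rate_limit`, `retry_after_seconds`, and the `(status, headers, body)` response tuple are hypothetical names chosen for the example, and only the numeric-seconds form of `Retry-After` is parsed.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as a number of seconds, falling back to a default."""
    raw = headers.get("Retry-After", "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Missing header or HTTP-date form: use the default wait instead.
        return default


def get_with_rate_limit(
    fetch: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call fetch(), waiting as instructed whenever the API answers 429."""
    response = fetch()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_after_seconds(response[1]))
        response = fetch()
    return response
```

`fetch` wraps whatever HTTP client is in use. The header lookup here is case-sensitive, so a real client should normalise header names first.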
Core: FastAPI · PostgreSQL + pgvector (Supabase) · Python 3.11 · Railway · slowapi
Phase 2: OpenAI text-embedding-3-small · Groq llama-3.3-70b-versatile · tiktoken · MLflow
Phase 3: Apache Airflow 2.8 · MinIO (S3-compatible Bronze) · Great Expectations 0.18 · GitHub Actions
Code quality: ruff (lint + format) · pre-commit
Assemblée Nationale Open Data — data.assemblee-nationale.fr
Static ZIP exports only (no REST API available from the source).
| Dataset | File |
|---|---|
| Deputies + organes (active, 17th legislature) | AMO10_deputes_actifs_mandats_actifs_organes.json.zip |
| Votes (scrutins, since 2025-07-01) | Scrutins.json.zip |
- Docker + Docker Compose
- Python 3.11+

    git clone <repo> && cd MonElu
    cp .env.example .env          # set DATABASE_URL, OPENAI_API_KEY, GROQ_API_KEY
    python3 -m venv venv
    venv/bin/pip install -r requirements.txt
    make start                    # start local Postgres
    make migrate                  # apply schema
    make ingest                   # deputies → votes → positions
    make fix-deputies
    make api                      # → http://localhost:8000/docs

| Command | Description |
|---|---|
| `make start` | `docker compose up -d` |
| `make stop` | `docker compose down` |
| `make migrate` | apply `001_init.sql` to `DATABASE_URL` |
| `make ingest` | full local ingestion (deputies → votes → positions) |
| `make ingest-prod` | production ingestion (`--since 2025-01-01`) |
| `make fix-deputies` | resolve party names + expand department codes |
| `make api` | `uvicorn api.main:app --reload` |
| `make psql` | psql into the running Postgres container |
| `make check-db` | table sizes, row counts, pgvector status |
| `make rag-index` | truncate + re-embed all chunks (~$0.006) |
| `make rag-stats` | chunk counts by type |
| `make rag-clear` | truncate `document_chunks` |
| `make rag-test` | run 3 sample queries end-to-end |
| `make rag-eval` | MLflow k=3 vs k=5 evaluation |
| `make mlflow-ui` | MLflow dashboard at http://localhost:5001 |
| `make airflow-up` | start Airflow webserver + scheduler |
| `make airflow-down` | stop all Airflow services |
| `make airflow-logs` | tail scheduler logs |
| `make airflow-ui` | Airflow UI at http://localhost:8080 |
| `make minio-up` | start MinIO |
| `make minio-ui` | MinIO console at http://localhost:9001 |
| `make setup-minio` | create Bronze buckets (`monelu-bronze`, `monelu-checkpoints`) |
| `make dag-deputies` | manually trigger `deputies_incremental` DAG |
| `make dag-votes` | manually trigger `votes_batch` DAG |


| Column | Type | Notes |
|---|---|---|
| `deputy_id` | TEXT PK | AN uid, e.g. `PA1592` |
| `full_name` | TEXT | |
| `first_name` / `last_name` | TEXT | |
| `party` | TEXT | Full GP name, e.g. Rassemblement National |
| `party_short` | TEXT | organeRef, e.g. `PO845401` |
| `circonscription` / `department` | TEXT | Full name, e.g. Yvelines |
| `mandate_start` / `mandate_end` | DATE | `mandate_end` is null if active |
| `photo_url` | TEXT | Official portrait from assemblee-nationale.fr |


| Column | Type | Notes |
|---|---|---|
| `vote_id` | TEXT PK | e.g. `VTANR5L17V1234` |
| `voted_at` | TIMESTAMPTZ | |
| `vote_title` | TEXT | Full legislative title |
| `vote_type` | TEXT | e.g. `SPO` |
| `result` | TEXT | `adopté` or `rejeté` |
| `votes_for` / `votes_against` / `abstentions` / `total_voters` | INTEGER | |
| `dossier_id` | TEXT | Linked dossier, if any |


| Column | Type | Notes |
|---|---|---|
| `position_id` | BIGSERIAL PK | |
| `vote_id` | TEXT FK → votes | |
| `deputy_id` | TEXT FK → deputies | |
| `position` | VARCHAR(15) | `pour` / `contre` / `abstention` / `nonVotant` |


| Column | Type | Notes |
|---|---|---|
| `id` | BIGSERIAL PK | |
| `content` | TEXT | French prose chunk |
| `metadata` | JSONB | `chunk_type`, `vote_id` or `deputy_id`, etc. |
| `embedding` | vector(1536) | OpenAI text-embedding-3-small |


| Module | Purpose |
|---|---|
| `main.py` | App entry point: CORS, rate limiting, exception handlers, landing page, health check |
| `limiter.py` | Shared slowapi `Limiter` instance |
| `routers/deputies.py` | Deputy list, profile, and scorecard endpoints |
| `routers/votes.py` | Vote list, latest, and detail endpoints |
| `routers/search.py` | `POST /search`: RAG query endpoint |
| `schemas.py` | Pydantic response models (all fields `Optional` to match DB NULLs) |


| Script | Purpose |
|---|---|
| `ingest_deputies.py` | Downloads AMO10 ZIP, upserts deputy profiles |
| `ingest_votes.py` | Downloads Scrutins ZIP, upserts votes (`--since` flag) |
| `ingest_positions.py` | Extracts individual deputy positions from Scrutins ZIP |
| `run_ingestion_prod.py` | Orchestrates the full pipeline with timing summary |
| `update_party.py` | Resolves GP party names and expands department codes |
| `migrate.py` | Applies `001_init.sql`; also the Railway start hook |
| `check_db_size.py` | Prints table sizes and DB storage usage |

All scripts use exponential-backoff retry (5 attempts, 2 s base) and upsert via ON CONFLICT ... DO UPDATE.
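
The retry and upsert behaviour described above might look roughly like this. It is a sketch under stated assumptions rather than the scripts' actual code: `with_retry` is a hypothetical helper, the doubling schedule (2 s, 4 s, 8 s, 16 s) is one reading of "2 s base", and the SQL uses only a subset of the `deputies` columns with psycopg-style named placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Illustrative upsert: re-running ingestion updates existing rows in place.
UPSERT_DEPUTY_SQL = """
INSERT INTO deputies (deputy_id, full_name, party)
VALUES (%(deputy_id)s, %(full_name)s, %(party)s)
ON CONFLICT (deputy_id) DO UPDATE
SET full_name = EXCLUDED.full_name,
    party = EXCLUDED.party;
"""


def with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(), doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)
    raise RuntimeError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A production version would likely catch only network and database errors rather than every `Exception`.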
ingestion/
├── dags/
│ ├── dag_deputies_incremental.py Weekly: fetch → GE validate → Bronze → Postgres
│ └── dag_votes_batch.py Bi-hourly: check session → fetch → GE → Bronze → Postgres → positions
├── operators/ (reserved for custom Airflow operators)
└── utils/
└── bronze_writer.py MinIO S3 writer — partitioned by date, hash-based deduplication
quality/
└── expectations/
├── deputies_suite.py GE suite: row count 500–600, uid not null
└── votes_suite.py GE suite: row count, required columns, dateScrutin not null
rag/
├── pipeline/
│ ├── chunker.py Five chunk strategies: vote, deputy, party, global_stats, notable_deputy
│ ├── embedder.py Batched OpenAI embedding (100 chunks/batch) → document_chunks
│ └── index_manager.py CLI: build / stats / clear
├── chain/
│ ├── retriever.py pgvector cosine similarity (ivfflat.probes=10, notable deputy pinning)
│ ├── prompts.py French civic assistant system prompt + RAG template
│ └── rag_chain.py ask() — retrieve → format → Groq LLM
└── experiments/
└── mlflow_eval.py 10 golden Q&A pairs, keyword scoring, k=3 vs k=5 experiment
Index stats: 3,741 chunks · avg 87 tokens · $0.0065 to embed
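
The embedding cost follows directly from the chunk count and average length. The short calculation below reproduces it, assuming a list price of $0.02 per million tokens for text-embedding-3-small; that price is an assumption of this example and is not stated in this document.

```python
CHUNKS = 3_741          # chunk count from the index stats
AVG_TOKENS = 87         # average tokens per chunk from the index stats
USD_PER_MILLION = 0.02  # assumed price per 1M tokens for text-embedding-3-small


def embedding_cost(chunks: int, avg_tokens: float, usd_per_million: float) -> float:
    """Estimated cost in USD to embed every chunk once."""
    total_tokens = chunks * avg_tokens
    return total_tokens * usd_per_million / 1_000_000


cost = embedding_cost(CHUNKS, AVG_TOKENS, USD_PER_MILLION)
print(f"{CHUNKS * AVG_TOKENS:,} tokens -> ${cost:.4f}")  # 325,467 tokens -> $0.0065
```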

    pip install pre-commit ruff
    pre-commit install   # runs automatically on every git commit

| Hook | What it enforces |
|---|---|
| `trailing-whitespace` | No stray spaces at line ends |
| `end-of-file-fixer` | Files end with a newline |
| `check-yaml` / `check-json` | Syntax errors in config files |
| `check-merge-conflict` | No committed `<<<<<<` markers |
| `check-added-large-files` | Blocks files over 500 KB |
| `debug-statements` | Blocks `breakpoint()` / `pdb.set_trace()` |
| `ruff` | Lint + auto-fix (imports, bugbear patterns, isort) |
| `ruff-format` | Black-compatible formatting |

Lint config: ruff.toml — line length 100, T201 (print) allowed in scripts/ and rag/.
- CORS: `allow_credentials=False`, `allow_methods=["GET"]` (public read-only API)
- Input validation: `limit` capped at 200, `offset` at 100,000 on all list endpoints
- Error handling: Global 500 handler returns a generic message, with no tracebacks or DSNs in responses
- Rate limiting: 30 req/min global default, 10 req/min on scorecard, keyed by remote IP
- No secrets in git: All credentials via environment variables; `.env` is gitignored

- `nonVotant` ≠ `abstention`: present in chamber but did not vote; excluded from `presence_rate`
- Yaël Braun-Pivet at 100% presence: Présidente de l'AN, recorded on every scrutin by the AN data system
- `rejeté` outnumbers `adopté`: the 17th legislature has no stable majority
- Party names: resolved from `Organes.json` GP mandats; 575/577 deputies covered
- Department names: full text (`"78"` → `"Yvelines"`) for all 96 metropolitan + DOM departments
- Ingestion window: production DB holds votes from `2025-07-01` (Supabase free tier); run `--since 2024-07-07` locally for the full legislature

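
To make the `nonVotant` rule concrete, here is one way a scorecard could be computed. The exact formula used by the API is not given in this document, so this sketch assumes that presence means casting `pour`, `contre`, or `abstention`, and that the denominator is the number of votes in the window; the `scorecard` function and its return shape are hypothetical.

```python
from collections import Counter
from typing import Iterable

# nonVotant is deliberately left out: it does not count toward presence.
VOTING_POSITIONS = ("pour", "contre", "abstention")


def scorecard(positions: Iterable[str], total_votes: int) -> dict:
    """Summarise one deputy's recorded positions over total_votes scrutins."""
    breakdown = Counter(positions)
    present = sum(breakdown[p] for p in VOTING_POSITIONS)
    rate = present / total_votes if total_votes else 0.0
    return {"presence_rate": round(rate, 3), "breakdown": dict(breakdown)}


print(scorecard(["pour", "contre", "nonVotant", "abstention"], total_votes=5))
# {'presence_rate': 0.6, 'breakdown': {'pour': 1, 'contre': 1, 'nonVotant': 1, 'abstention': 1}}
```

Under this assumed definition, a deputy recorded as `nonVotant` on every scrutin would have a presence rate of 0.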
All unhandled exceptions return a consistent envelope, never a traceback:

    {"error": "Internal server error", "status": 500}

Full stack traces are written to the server log (`logging.error`) and never exposed to clients.