End-to-end applied AI system for automated claims decisioning:
• Ground-truth dataset generation (SQL + DuckDB)
• Coverage rule engine + ML approval model
• Invoice text → structured feature extraction
• FastAPI real-time scoring service
• Metrics endpoint for operational monitoring
This project mirrors a production Data Scientist workflow for modernizing the claims lifecycle: raw data → validated gold layer → trained model → explainable API decision.
- Ground truth & trusted reporting: SQL-first gold dataset with validation checks
- Structured + unstructured ML features: policy data + invoice text extraction
- Decision automation: rule system + probability-based ML recommendations
- Full model lifecycle: train → evaluate → versioned artifact
- Production mindset: low-latency API with operational metrics
- Explainability: human-readable decision reasoning
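The validation checks on the gold dataset could look roughly like the sketch below. It is an illustration, not the project's actual code: Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra dependencies, and the table name `gold_claims`, its columns, and the three checks are all assumptions.

```python
import sqlite3

# Hypothetical gold table; the real schema is not shown in this README.
SCHEMA = """
CREATE TABLE gold_claims (
    claim_id TEXT,
    policy_id TEXT,
    claimed_amount REAL,
    approved INTEGER
)
"""

# Each check counts offending rows; zero means the check passes.
CHECKS = {
    "duplicate_claim_ids": """
        SELECT COUNT(*) FROM (
            SELECT claim_id FROM gold_claims
            GROUP BY claim_id HAVING COUNT(*) > 1
        )
    """,
    "missing_policy_id": "SELECT COUNT(*) FROM gold_claims WHERE policy_id IS NULL",
    "non_positive_amount": "SELECT COUNT(*) FROM gold_claims WHERE claimed_amount <= 0",
}


def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the number of offending rows for each named check."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few deliberately bad rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO gold_claims VALUES (?, ?, ?, ?)",
        [
            ("CLM-1", "POL-1", 350.0, 1),
            ("CLM-2", "POL-2", 120.0, 0),
            ("CLM-2", "POL-2", 120.0, 0),  # duplicate claim_id
            ("CLM-3", None, 80.0, 0),      # missing policy_id
        ],
    )
    return conn


if __name__ == "__main__":
    for name, bad_rows in run_checks(build_demo_db()).items():
        status = "OK" if bad_rows == 0 else f"FAILED ({bad_rows} rows)"
        print(f"{name}: {status}")
```

Expressing each check as a query that counts violations keeps the checks in SQL, which fits the SQL-first approach, and the same queries should carry over to DuckDB with little or no change.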
- Generate synthetic claims and invoices
- Build analytics-ready gold dataset
- Train approval model
- Start FastAPI service
- Submit a claim → receive decision, confidence, payout, and explanation
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m src.data.generate_synthetic_data --out-dir data/raw --n 2000
python -m src.data.build_gold_dataset --raw-dir data/raw --db-path data/warehouse.duckdb
python -m src.models.train --db-path data/warehouse.duckdb --model-dir artifacts/model
uvicorn src.api.app:app --reload --port 8000
Open:
- Swagger UI: http://127.0.0.1:8000/docs
- Metrics endpoint: http://127.0.0.1:8000/metrics
curl -X POST "http://127.0.0.1:8000/submit-claim" \
-H "Content-Type: application/json" \
-d '{
"claim_id": "CLM-NEW-001",
"policy_id": "POL-00010",
"pet_id": "PET-00010",
"invoice_text": "Date: 2026-02-02\nProcedure: XRAY\nDiagnosis: BACK_PAIN\nTotal: $350\n",
"claimed_amount": 350
}'
The system combines deterministic coverage rules with an ML approval model to automate claim decisions. Covered procedures under higher-tier policies are recommended for approval with a calculated reimbursement, while non-covered scenarios produce high-confidence denials with clear reasoning.
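One way the rule stage and the model stage might fit together is sketched below. Everything specific here is an assumption made for illustration: the tier names, the coverage table, the reimbursement rates, the 0.7 threshold, and the `decide` function itself. The model's approval probability is passed in as a plain float rather than computed by a trained model.

```python
from dataclasses import dataclass

# Hypothetical coverage table: procedures covered by each policy tier.
COVERAGE = {
    "basic": {"EXAM"},
    "premium": {"EXAM", "XRAY", "SURGERY"},
}
# Hypothetical share of the claimed amount reimbursed per tier.
REIMBURSEMENT_RATE = {"basic": 0.5, "premium": 0.8}
# Hypothetical probability at or above which approval is recommended.
APPROVE_THRESHOLD = 0.7


@dataclass
class Decision:
    decision: str      # "approve", "deny", or "review"
    confidence: float
    payout: float
    explanation: str


def decide(tier: str, procedure: str, claimed_amount: float,
           approval_prob: float) -> Decision:
    """Apply coverage rules first, then fall back to the model probability."""
    # Rule stage: a procedure outside the tier's coverage is denied outright.
    if procedure not in COVERAGE.get(tier, set()):
        return Decision(
            "deny", 1.0, 0.0,
            f"{procedure} is not covered under the '{tier}' tier.",
        )

    # Model stage: approve only when the model is sufficiently confident.
    if approval_prob >= APPROVE_THRESHOLD:
        payout = round(claimed_amount * REIMBURSEMENT_RATE[tier], 2)
        return Decision(
            "approve", approval_prob, payout,
            f"{procedure} is covered under the '{tier}' tier; "
            f"model approval probability {approval_prob:.2f}.",
        )

    # Covered but uncertain: leave the claim to a human reviewer.
    return Decision(
        "review", approval_prob, 0.0,
        f"{procedure} is covered, but approval probability "
        f"{approval_prob:.2f} is below {APPROVE_THRESHOLD}.",
    )


if __name__ == "__main__":
    print(decide("premium", "XRAY", 350, 0.9))
    print(decide("basic", "XRAY", 350, 0.9))
```

Running the rules before the model means a denial for a non-covered procedure never depends on a model score, which keeps that explanation deterministic.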
- Consistent, explainable claim decisions
- Reduced manual review for high-confidence cases
- Trusted “ground truth” layer for reporting and model evaluation
- Real-time decision support for operations teams
src/
  api/          FastAPI service + metrics
  data/         synthetic generator + gold builder (DuckDB)
  decisioning/  rule + model decision engine
  models/       train + predict helpers
  nlp/          invoice extractor
  evaluation/   metrics helpers
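As a rough illustration of what the `nlp/` invoice extractor might do, the sketch below parses invoice text in the `Label: value` layout used by the sample request (`Date`, `Procedure`, `Diagnosis`, `Total`). The function names, the returned field names, and the regex approach are assumptions; the actual extractor may work differently.

```python
import re
from typing import Optional, TypedDict


class InvoiceFeatures(TypedDict):
    date: Optional[str]
    procedure: Optional[str]
    diagnosis: Optional[str]
    total: Optional[float]


def _field(label: str, text: str) -> Optional[str]:
    """Return the value after 'Label:' on its own line, or None if absent."""
    pattern = rf"^{re.escape(label)}:[ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def _parse_amount(raw: Optional[str]) -> Optional[float]:
    """Convert a string such as '$1,350.00' to a float, or None on failure."""
    if raw is None:
        return None
    # Drop currency symbols and thousands separators before converting.
    cleaned = re.sub(r"[^\d.]", "", raw)
    try:
        return float(cleaned)
    except ValueError:
        return None


def extract_invoice_features(text: str) -> InvoiceFeatures:
    """Turn free-text invoice lines into a structured feature dict."""
    return {
        "date": _field("Date", text),
        "procedure": _field("Procedure", text),
        "diagnosis": _field("Diagnosis", text),
        "total": _parse_amount(_field("Total", text)),
    }


if __name__ == "__main__":
    sample = "Date: 2026-02-02\nProcedure: XRAY\nDiagnosis: BACK_PAIN\nTotal: $350\n"
    print(extract_invoice_features(sample))
```

Missing fields come back as `None` instead of raising, so a downstream model or rule can decide how to treat an incomplete invoice.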
- Designed to be small, runnable, and demo-friendly
- DuckDB simulates a cloud warehouse for local development
- Synthetic data used to mirror real claims workflows


