oberoiharshith/intelligent-claims-processing-poc

Pet Insurance Claims AI Automation (Rule + ML + Document Intelligence) – Production-minded POC

End-to-end applied AI system for automated claims decisioning:

• Ground-truth dataset generation (SQL + DuckDB)
• Coverage rule engine + ML approval model
• Invoice text → structured feature extraction
• FastAPI real-time scoring service
• Metrics endpoint for operational monitoring

This project mirrors a production data science workflow for modernizing the claims lifecycle: raw data → validated gold layer → trained model → explainable API decision.


What this demonstrates

  • Ground truth & trusted reporting: SQL-first gold dataset with validation checks
  • Structured + unstructured ML features: policy data + invoice text extraction
  • Decision automation: rule system + probability-based ML recommendations
  • Full model lifecycle: train → evaluate → versioned artifact
  • Production mindset: low-latency API with operational metrics
  • Explainability: human-readable decision reasoning
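The invoice text extraction listed above can be illustrated with a minimal sketch. It assumes invoices follow the simple "Key: value" layout used in the example request in this README; the function name, field names, and regular expressions are hypothetical and are not taken from the repository's `nlp/` module.

```python
import re
from typing import Optional, TypedDict


class InvoiceFeatures(TypedDict):
    service_date: Optional[str]
    procedure: Optional[str]
    diagnosis: Optional[str]
    total: Optional[float]


# Hypothetical patterns for a "Key: value" invoice layout (one field per line).
_PATTERNS = {
    "service_date": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE),
    "procedure": re.compile(r"^Procedure:\s*(\S+)\s*$", re.MULTILINE),
    "diagnosis": re.compile(r"^Diagnosis:\s*(\S+)\s*$", re.MULTILINE),
    "total": re.compile(r"^Total:\s*\$?([\d,]+(?:\.\d+)?)\s*$", re.MULTILINE),
}


def extract_invoice_features(text: str) -> InvoiceFeatures:
    """Pull structured fields out of free-text invoice content."""

    def first(key: str) -> Optional[str]:
        match = _PATTERNS[key].search(text)
        return match.group(1) if match else None

    raw_total = first("total")
    return {
        "service_date": first("service_date"),
        "procedure": first("procedure"),
        "diagnosis": first("diagnosis"),
        # Strip thousands separators before converting to a number.
        "total": float(raw_total.replace(",", "")) if raw_total else None,
    }


features = extract_invoice_features(
    "Date: 2026-02-02\nProcedure: XRAY\nDiagnosis: BACK_PAIN\nTotal: $350\n"
)
print(features)
```

Fields that cannot be found come back as `None`, so a downstream model or rule can treat a missing value explicitly rather than guessing.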

3-minute demo flow

  1. Generate synthetic claims and invoices
  2. Build analytics-ready gold dataset
  3. Train approval model
  4. Start FastAPI service
  5. Submit a claim → receive decision, confidence, payout, and explanation

Quick start

1) Create environment

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2) Generate data + build gold dataset

python -m src.data.generate_synthetic_data --out-dir data/raw --n 2000
python -m src.data.build_gold_dataset --raw-dir data/raw --db-path data/warehouse.duckdb
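The validation checks on the gold dataset might look something like the sketch below. To keep it runnable with the standard library alone, it uses an in-memory SQLite database as a stand-in for DuckDB; the table name `gold_claims`, its columns, and the specific checks are assumptions for illustration, not the repository's actual schema.

```python
import sqlite3

# Hypothetical checks: each query counts rows that violate an expectation.
CHECKS = {
    "null_keys": (
        "SELECT COUNT(*) FROM gold_claims "
        "WHERE claim_id IS NULL OR policy_id IS NULL"
    ),
    "duplicate_claim_ids": (
        "SELECT COUNT(*) FROM ("
        "SELECT claim_id FROM gold_claims GROUP BY claim_id HAVING COUNT(*) > 1)"
    ),
    "negative_amounts": "SELECT COUNT(*) FROM gold_claims WHERE claimed_amount < 0",
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Return the number of violating rows (or groups) for each named check."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


# SQLite stands in for DuckDB here; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gold_claims (claim_id TEXT, policy_id TEXT, claimed_amount REAL)"
)
conn.executemany(
    "INSERT INTO gold_claims VALUES (?, ?, ?)",
    [
        ("CLM-1", "POL-1", 350.0),
        ("CLM-2", "POL-2", 120.0),
        ("CLM-2", "POL-2", 120.0),  # duplicate claim_id
        ("CLM-3", None, -5.0),      # missing policy_id and negative amount
    ],
)

violations = run_checks(conn)
print(violations)
```

A build step could fail fast when any count is non-zero, so that only validated rows reach training and reporting.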

3) Train model

python -m src.models.train --db-path data/warehouse.duckdb --model-dir artifacts/model

4) Run API

uvicorn src.api.app:app --reload --port 8000

Open:

  • Swagger UI: http://127.0.0.1:8000/docs
  • Metrics endpoint: http://127.0.0.1:8000/metrics


Example request

curl -X POST "http://127.0.0.1:8000/submit-claim" \
-H "Content-Type: application/json" \
-d '{
  "claim_id": "CLM-NEW-001",
  "policy_id": "POL-00010",
  "pet_id": "PET-00010",
  "invoice_text": "Date: 2026-02-02\nProcedure: XRAY\nDiagnosis: BACK_PAIN\nTotal: $350\n",
  "claimed_amount": 350
}'
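The same request can be sent from Python. The sketch below uses only the standard library and reuses the URL and payload from the curl example; the helper names are hypothetical, and the response is returned as parsed JSON without assuming particular field names.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/submit-claim"


def build_request(payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the claim submission endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_claim(payload: dict, timeout: float = 10.0) -> dict:
    """Send the claim and return the decoded JSON response (API must be running)."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "claim_id": "CLM-NEW-001",
    "policy_id": "POL-00010",
    "pet_id": "PET-00010",
    "invoice_text": "Date: 2026-02-02\nProcedure: XRAY\nDiagnosis: BACK_PAIN\nTotal: $350\n",
    "claimed_amount": 350,
}
request = build_request(payload)

# With the API from step 4 running locally, uncomment to send the claim:
# print(submit_claim(payload))
```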

Demo – Automated Claims Decisioning

Same treatment → different policy → different outcome

BASIC policy → high-confidence deny

[Screenshot: Deny]

PREMIUM policy → approve with computed payout

[Screenshot: Approve]

Model performance

[Screenshot: Metrics]

The system combines deterministic coverage rules with an ML approval model to automate claim decisions. Covered procedures under higher-tier policies are recommended for approval with a calculated reimbursement, while non-covered scenarios produce high-confidence denials with clear reasoning.
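One way to picture the rule-plus-model combination is the sketch below. Everything in it is an assumption for illustration: the coverage table, the reimbursement rates, the threshold, the `REVIEW` fallback, and the `decide` function are hypothetical and are not taken from the repository's decision engine. The model's approval probability is passed in as a plain number rather than computed.

```python
from dataclasses import dataclass

# Hypothetical coverage table: plan -> {procedure: reimbursement rate}.
COVERAGE = {
    "BASIC": {"EXAM": 0.5},
    "PREMIUM": {"EXAM": 0.8, "XRAY": 0.8},
}


@dataclass
class Decision:
    decision: str
    confidence: float
    payout: float
    explanation: str


def decide(
    plan: str,
    procedure: str,
    claimed_amount: float,
    approval_probability: float,
    threshold: float = 0.5,
) -> Decision:
    """Apply the deterministic coverage rule first, then the model probability."""
    rate = COVERAGE.get(plan, {}).get(procedure)
    if rate is None:
        # Rule outcome: not covered, so the denial does not depend on the model.
        return Decision(
            "DENY", 1.0, 0.0, f"{procedure} is not covered under the {plan} plan"
        )
    if approval_probability >= threshold:
        payout = round(claimed_amount * rate, 2)
        return Decision(
            "APPROVE",
            approval_probability,
            payout,
            f"{procedure} is covered under {plan} at {rate:.0%}; "
            f"model approval probability {approval_probability:.2f}",
        )
    # Covered but low model confidence: hand off to a human reviewer.
    return Decision(
        "REVIEW",
        approval_probability,
        0.0,
        f"{procedure} is covered under {plan}, but model approval probability "
        f"{approval_probability:.2f} is below {threshold:.2f}",
    )


basic = decide("BASIC", "XRAY", 350, approval_probability=0.9)
premium = decide("PREMIUM", "XRAY", 350, approval_probability=0.9)
print(basic)
print(premium)
```

Running the rule before the model keeps denials for non-covered procedures deterministic and easy to explain, while the probability only influences claims that the policy could actually pay.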


Business impact (simulated workflow)

  • Consistent, explainable claim decisions
  • Reduced manual review for high-confidence cases
  • Trusted “ground truth” layer for reporting and model evaluation
  • Real-time decision support for operations teams

Repo layout

src/
  api/            FastAPI service + metrics
  data/           synthetic generator + gold builder (DuckDB)
  decisioning/    rule + model decision engine
  models/         train + predict helpers
  nlp/            invoice extractor
  evaluation/     metrics helpers

Notes

  • Designed to be small, runnable, and demo-friendly
  • DuckDB simulates a cloud warehouse for local development
  • Synthetic data is used to mirror real claims workflows

About

Production-style ML system for automated insurance claims decisions using a SQL gold dataset, rule + model scoring, FastAPI inference, and explainable outputs.
