A production-ready B2B fintech solution that solves data aggregation and normalization for credit scoring and cash flow visibility.
Financial institutions struggle to consolidate fragmented data from multiple sources. This solution:
- Ingests messy CSV files and API data.
- Normalizes them into a canonical schema using self-healing logic.
- Detects anomalies automatically to flag data-quality issues.
- Exposes standardized data via REST APIs for downstream credit scoring.
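The canonical schema is the core of the design: every source, however messy, is coerced into one validated shape. As a minimal sketch (the field names here are illustrative assumptions, not the project's actual schema), a Pydantic v2 model with a validator might look like:

```python
# Minimal sketch of a canonical transaction record (Pydantic v2).
# Field names and defaults are illustrative assumptions.
from datetime import date
from decimal import Decimal
from pydantic import BaseModel, field_validator

class CanonicalTransaction(BaseModel):
    transaction_date: date
    description: str
    amount: Decimal
    currency: str = "USD"
    source: str = "csv"  # e.g. "csv" or "api"

    @field_validator("description")
    @classmethod
    def description_not_blank(cls, v: str) -> str:
        # A blank description is one of the anomaly signals the engine flags.
        if not v.strip():
            raise ValueError("description must not be blank")
        return v.strip()
```

Pydantic coerces ISO date strings and numeric strings on the way in, so upstream cleaners only need to get raw values into a parseable form before validation.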
- Backend: FastAPI (Python 3.11), Pandas, Pydantic, SQLAlchemy
- Frontend: React, Vite, TailwindCSS, Lucide-React
- Database: PostgreSQL (SQLAlchemy models ready for AWS RDS)
- Infrastructure: Terraform (AWS VPC, RDS, Lambda)
- CI/CD: GitHub Actions (.github/workflows/deploy.yml)
```
[ Data Sources ] --(CSV/API)--> [ FastAPI Ingestion ]
                                      | (Background)
                                      v
                              [ Normalization Engine ]
                                      | (Self-Healing)
                                      +--> [ Pydantic Validator ]
                                      +--> [ Date/Amount Cleaner ]
                                      v
[ Monitoring Dashboard ] <--(API)-- [ PostgreSQL Storage ]
```
- Flexible Date Parsing: Automatically detects five-plus date formats and normalizes them to a single canonical form.
- Currency Cleaning: Handles symbols ($, €), commas, and whitespace in amount fields.
- Anomaly Detection: Flags records with missing descriptions, zero amounts, or unparseable dates.
- Retries & DLQ: Simulated in this MVP; the design is ready to move onto AWS Step Functions.
Backend:

```
cd api
pip install -r requirements.txt
uvicorn main:app --reload
```

Frontend:

```
cd frontend
npm install
npm run dev
```

Then:

1. Open `http://localhost:5173` (Frontend).
2. Click "Import CSV" and select `sample_messy_data.csv` (provided in the repository root).
3. Watch the Real-time Flow and Anomaly Trace update as data is normalized.
- `POST /ingest/file`: Upload a CSV for normalization. Returns a `task_id`.
- `GET /tasks/{task_id}`: Monitor ingestion status.
- `GET /transactions`: Retrieve the latest normalized transactions.
- `GET /anomalies`: Get the list of records flagged by the engine.
- `GET /monitor/stats`: Get system operational health.
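A typical client flow is: upload a file, then poll `GET /tasks/{task_id}` until the task finishes. A stdlib sketch of the polling half (the base URL and the `{"status": ...}` response shape are assumptions; `fetch` is injectable so the logic can be tested without a running server):

```python
# Hypothetical polling client for the task-status endpoint.
# BASE and the response shape are assumptions about the local dev setup.
import json
import time
import urllib.request

BASE = "http://localhost:8000"

def get_json(url, fetch=urllib.request.urlopen):
    # `fetch` defaults to a real HTTP call but can be swapped for a stub.
    with fetch(url) as resp:
        return json.load(resp)

def wait_for_task(task_id, fetch=urllib.request.urlopen, poll=0.5, timeout=30.0):
    """Poll GET /tasks/{task_id} until the task reaches a terminal status."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_json(f"{BASE}/tasks/{task_id}", fetch)["status"]
        if status in ("completed", "failed"):
            return status
        time.sleep(poll)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
```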
Terraform files are located in `/terraform`. To deploy to AWS:

```
cd terraform
terraform init
terraform apply -auto-approve
```