A scalable, production-ready machine learning system for detecting fraudulent credit card transactions in real time.
The project focuses on handling extreme class imbalance, optimizing precision-recall tradeoffs, and providing model explainability for decision transparency.
Credit card fraud detection presents key challenges:
- Highly imbalanced datasets (fraud cases are rare)
- Need for low-latency predictions in real-time systems
- Requirement for interpretable models in financial applications
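This system addresses these challenges with an XGBoost pipeline that oversamples the minority class with SMOTE, tunes hyperparameters with Optuna, explains predictions with SHAP, and serves them through a FastAPI endpoint.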
Raw Data → Preprocessing → SMOTE → Model Training (XGBoost)
→ Hyperparameter Tuning (Optuna)
→ Evaluation (Precision-Recall Optimization)
→ Explainability (SHAP)
→ Deployment (FastAPI)
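
To make the SMOTE step in the pipeline concrete, here is a minimal standard-library sketch of the idea behind it: a synthetic fraud row is created by interpolating between a real fraud row and one of its nearest fraud neighbours. This is an illustration only, not the imbalanced-learn implementation the project uses; `smote_like_oversample` and the toy rows are hypothetical. Oversampling is normally applied to the training split only, so that validation and test metrics reflect the real class ratio.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training split: 2 features, 4 fraud rows (hypothetical values).
train_fraud = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.4]]
new_rows = smote_like_oversample(train_fraud, n_new=6)
```

Every synthetic row lies on a line segment between two real fraud rows, so it stays inside the region the minority class already occupies.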
| Metric | Score |
|---|---|
| Precision | 93% |
| Recall | 85% |
- Optimized using precision-recall tradeoff
- Focused on minimizing false positives while maintaining strong recall
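
One common way to trade precision against recall is to pick the decision threshold on the predicted fraud probability instead of using the default 0.5. The sketch below is an assumption about how that could be done, not the project's actual evaluation code: it keeps the threshold with the highest recall among those that meet a precision floor. The helper names, labels, and scores are hypothetical toy values.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are flagged as fraud."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision meets the floor, or None if none qualifies."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Toy validation labels (1 = fraud) and model scores.
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.35, 0.40, 0.65, 0.70, 0.80, 0.95]
choice = pick_threshold(y_true, scores, min_precision=0.75)
```

Raising `min_precision` cuts false positives at the cost of recall, which is the tradeoff described above.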
- Imbalance Handling
  - SMOTE-based oversampling for minority class
- Model Optimization
  - Hyperparameter tuning using Optuna
- Explainability
  - SHAP for feature-level prediction insights
- Real-Time Inference
  - FastAPI-based REST API for low-latency predictions
- Production-Oriented Design
  - Modular code structure
  - Model serialization using joblib
- Languages: Python
- ML: XGBoost, Scikit-learn
- Data: Pandas, NumPy
- Imbalance Handling: imbalanced-learn (SMOTE)
- Optimization: Optuna
- Explainability: SHAP
- Backend: FastAPI
- Serving: Uvicorn
.
├── api/
│ └── app.py # FastAPI application
├── models/
│ └── model.pkl # Trained model
├── src/ # Core ML pipeline
├── notebooks/ # Experiments and analysis
├── requirements.txt
└── README.md
git clone <your-repo-url>
cd fraud-detection
pip install -r requirements.txt
uvicorn api.app:app --reload
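
Once the server is up, the `POST /predict` endpoint documented in this README can be exercised from Python. The sketch below uses only the standard library and assumes Uvicorn's default address `http://127.0.0.1:8000`. `build_predict_request` and `parse_predict_response` are hypothetical helper names, and the three-element feature vector is a placeholder, since the real model's input length is not stated here.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:8000"):
    """Build a POST /predict request carrying the feature vector as JSON."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predict_response(raw_bytes):
    """Extract the fraud flag and probability from the JSON response."""
    payload = json.loads(raw_bytes)
    return bool(payload["fraud"]), float(payload["probability"])


# Placeholder feature vector; the real model expects its own input length.
request = build_predict_request([0.1, -1.2, 3.4])

# Sending the request requires the server to be running:
# with urllib.request.urlopen(request) as resp:
#     fraud, probability = parse_predict_response(resp.read())
```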
POST /predict
Request body:
{
  "features": [0.1, -1.2, 3.4, ...]
}
Response:
{
  "fraud": true,
  "probability": 0.92
}

- Uses SHAP (SHapley Additive Explanations)
- Provides per-transaction feature importance
- Enables:
  - Debugging model predictions
  - Auditability for financial systems
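
To show what a per-transaction attribution means, here is a brute-force Shapley value computation on a tiny hand-written scoring function. It is a sketch of the underlying idea only: the SHAP library uses much faster model-specific algorithms, and `shapley_values`, `toy_score`, and all numbers below are hypothetical. Features outside a coalition are replaced by a baseline value, which is one simple convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating feature coalitions.
    Features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical scoring function standing in for a trained model.
def toy_score(features):
    amount, hour, distance = features
    return 0.02 * amount + 0.5 * distance + 0.01 * amount * distance


x = [100.0, 3.0, 2.0]        # transaction being explained
baseline = [20.0, 12.0, 0.5]  # reference transaction
phi = shapley_values(toy_score, x, baseline)
```

The contributions in `phi` sum to `toy_score(x) - toy_score(baseline)`, and the unused `hour` feature receives zero. That additive breakdown is what makes individual predictions auditable. The enumeration grows as 2^n in the number of features, so it is only practical for toy inputs.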
- Real-time streaming (Kafka / Flink)
- Model monitoring and drift detection
- Docker containerization
- Kubernetes deployment
- Feature store integration
- CI/CD pipelines
Utsav Kashyap