This project demonstrates how different machine learning models perform on an imbalanced binary classification problem, and how to track, compare, and manage experiments using MLflow.
Real-world datasets are often imbalanced, where one class heavily dominates the other. Traditional accuracy metrics can be misleading in such cases.
This project focuses on:
- Handling class imbalance
- Comparing multiple ML models
- Evaluating performance using appropriate metrics
- Tracking experiments using MLflow
- Synthetic dataset generated using `make_classification`
- Samples: 1000
- Class distribution: 90% Class 0, 10% Class 1
- Features: 10 (2 informative, 8 redundant)
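A dataset with these properties can be generated in one call; this is a minimal sketch using scikit-learn's `make_classification` (the random seed is an assumption, not the project's exact setting):

```python
from sklearn.datasets import make_classification

# 1000 samples, 10 features (2 informative, 8 redundant), ~90/10 class split
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=2,
    n_redundant=8,
    weights=[0.9, 0.1],
    random_state=42,
)
```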
Models compared:
- Logistic Regression
- Random Forest Classifier
- XGBoost Classifier
- XGBoost + SMOTETomek (Imbalance Handling)
To properly assess imbalanced data, the following metrics were used:
- Accuracy
- Recall (Class 1 – Minority Class)
- Recall (Class 0 – Majority Class)
- Macro F1-Score
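These metrics can all be computed with scikit-learn; the arrays below are placeholder predictions for illustration only:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Toy example: 8 majority samples, 2 minority samples
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
recall_minority = recall_score(y_true, y_pred, pos_label=1)  # Class 1 recall
recall_majority = recall_score(y_true, y_pred, pos_label=0)  # Class 0 recall
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
```

Note how accuracy stays high here (0.8) even though minority recall is only 0.5 — exactly the gap this metric set is designed to expose.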
| Model | Accuracy | Recall (Class 1) | Recall (Class 0) | Macro F1 |
|---|---|---|---|---|
| Logistic Regression | 0.9167 | 0.50 | 0.963 | 0.7498 |
| Random Forest | 0.9667 | 0.70 | 0.996 | 0.8947 |
| XGBoost | 0.9767 | 0.80 | 0.996 | 0.9299 |
| XGBoost + SMOTETomek | 0.9567 | 0.8333 | 0.970 | 0.8847 |
- Accuracy alone is misleading for imbalanced datasets.
- XGBoost achieved the best overall performance.
- SMOTETomek improved minority class recall, making it suitable when false negatives are costly.
- MLflow makes experiment comparison transparent and reproducible.
MLflow was used to:
- Log parameters, metrics, and models
- Compare multiple runs visually
- Enable reproducibility and model versioning
Tech stack:
- Python
- Scikit-learn
- XGBoost
- Imbalanced-learn
- MLflow
- NumPy
To reproduce the experiments:

```bash
pip install -r requirements.txt   # install dependencies
mlflow ui                         # launch the MLflow tracking UI
python train.py                   # run all experiments
```