
Akshitha0118/MLOps-for-Imbalanced-Classification-with-MLflow

mlops-2

Imbalanced Classification with MLflow Experiment Tracking 🚀

This project demonstrates how different machine learning models perform on an imbalanced binary classification problem, and how to track, compare, and manage experiments using MLflow.

📌 Problem Statement

Real-world datasets are often imbalanced: one class heavily outnumbers the other. Accuracy can be misleading in such cases, because a model that always predicts the majority class still scores highly while missing every minority-class example.
This project focuses on:

  • Handling class imbalance
  • Comparing multiple ML models
  • Evaluating performance using appropriate metrics
  • Tracking experiments using MLflow
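To make the accuracy pitfall concrete, here is a minimal sketch in plain Python. It assumes a hypothetical "classifier" that always predicts Class 0 on a 90/10 split like the one used in this project; it is an illustration, not part of the project's code.

```python
# Hypothetical labels mirroring the 90% / 10% class split described in this README.
y_true = [0] * 900 + [1] * 100

# A trivial baseline that always predicts the majority class (Class 0).
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Minority recall: fraction of true Class 1 samples predicted as Class 1.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
minority_recall = true_positives / sum(1 for t in y_true if t == 1)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

The baseline reaches 0.90 accuracy without detecting a single minority sample, which is why the per-class metrics listed later matter.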

🧪 Dataset

  • Synthetic dataset generated using make_classification
  • Samples: 1000
  • Class distribution: 90% Class 0, 10% Class 1
  • Features: 10 (2 informative, 8 redundant)
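A dataset with these properties could be generated roughly as follows. This is a sketch only: `flip_y=0` and `random_state=42` are assumptions, since the README does not state them, and the actual call in `train.py` may differ.

```python
from collections import Counter

from sklearn.datasets import make_classification

# 1000 samples, 10 features (2 informative + 8 redundant), ~90/10 class split.
# flip_y and random_state are assumed values, not taken from the project.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=2,
    n_redundant=8,
    weights=[0.9, 0.1],
    flip_y=0,
    random_state=42,
)

print(X.shape)      # feature matrix dimensions
print(Counter(y))   # samples per class
```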

⚙️ Models Trained

  1. Logistic Regression
  2. Random Forest Classifier
  3. XGBoost Classifier
  4. XGBoost + SMOTETomek (Imbalance Handling)

📊 Evaluation Metrics

To properly assess imbalanced data, the following metrics were used:

  • Accuracy
  • Recall (Class 1 – Minority Class)
  • Recall (Class 0 – Majority Class)
  • Macro F1-Score
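These four metrics can be computed with scikit-learn as sketched below. The helper name `evaluate` and the metric keys are hypothetical and chosen for illustration; the project's own code may organize this differently.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score


def evaluate(y_true, y_pred):
    """Return the four metrics listed above for binary labels 0 and 1."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Recall for the minority class (label 1).
        "recall_class_1": recall_score(y_true, y_pred, pos_label=1),
        # Recall for the majority class (label 0).
        "recall_class_0": recall_score(y_true, y_pred, pos_label=0),
        # Unweighted mean of the per-class F1 scores.
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }


# Tiny hand-made example: 4 majority samples, 2 minority samples.
example = evaluate([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(example)
```

Macro F1 averages the per-class F1 scores without weighting by class size, so poor minority-class performance lowers it as much as poor majority-class performance would.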

📈 Model Performance Summary

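| Model                | Accuracy | Recall (Class 1) | Recall (Class 0) | Macro F1 |
|----------------------|----------|------------------|------------------|----------|
| Logistic Regression  | 0.9167   | 0.50             | 0.963            | 0.7498   |
| Random Forest        | 0.9667   | 0.70             | 0.996            | 0.8947   |
| XGBoost              | 0.9767   | 0.80             | 0.996            | 0.9299   |
| XGBoost + SMOTETomek | 0.9567   | 0.8333           | 0.970            | 0.8847   |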

🔍 Key Insights

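  • Accuracy alone is misleading for imbalanced datasets: Logistic Regression reaches 0.9167 accuracy while detecting only half of the minority class (recall 0.50).
  • XGBoost achieved the best overall performance, with the highest accuracy (0.9767) and Macro F1 (0.9299).
  • SMOTETomek raised XGBoost's minority class recall from 0.80 to 0.8333 at the cost of lower accuracy and Macro F1, making it suitable when false negatives are costly.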
  • MLflow makes experiment comparison transparent and reproducible.

🧠 Experiment Tracking with MLflow

  • Logged parameters, metrics, and models
  • Compared multiple runs visually
  • Enabled reproducibility and model versioning

🛠️ Tech Stack

  • Python
  • Scikit-learn
  • XGBoost
  • Imbalanced-learn
  • MLflow
  • NumPy

▶️ How to Run

pip install -r requirements.txt
mlflow ui    # runs in the foreground (default http://127.0.0.1:5000); start it in a separate terminal
python train.py

