This project implements a complete MLOps pipeline for a scikit-learn machine learning model in Python. The pipeline includes the following steps:
- Data Gathering: Fetches data from the UCI ML repository.
- Data Analysis: Performs exploratory data analysis (EDA) to understand the dataset.
- Data Versioning: Saves the dataset for version control. (TODO with DVC)
- Data Preparation: Prepares the data for modeling through feature engineering and data splitting.
- Model Training & Development: Trains a RandomForestClassifier on the prepared data.
- Model Validation: Validates the model using accuracy metrics and other evaluation tools.
- Model Serving: Saves the trained model for deployment. (TODO)
- Model Monitoring: Logs predictions and tracks model performance over time. (TODO)
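The preparation, training, and validation steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration rather than the project's actual code: the synthetic dataset from `make_classification` stands in for the UCI data, and the function name `run_pipeline`, the 80/20 split, and the hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_pipeline(random_state: int = 42) -> float:
    """Train a RandomForestClassifier and return its hold-out accuracy."""
    # Data gathering (assumption): synthetic data replaces the UCI dataset
    # so the sketch runs without network access.
    X, y = make_classification(
        n_samples=500, n_features=10, random_state=random_state
    )

    # Data preparation: keep 20% of the rows aside for validation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )

    # Model training.
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)

    # Model validation: accuracy on the held-out rows.
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"validation accuracy: {run_pipeline():.3f}")
```

Fixing `random_state` in every step keeps the split and the forest reproducible, which makes runs easier to compare.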
ml_pipeline/
├── data/
│ ├── raw/
│ └── prepared/
├── src/
│ ├── data/
│ └── model/
├── artifacts/
├── requirements.txt
└── README.md
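If you want to recreate this layout in a fresh checkout, a short script along these lines would do it. It is only a sketch: the helper name `scaffold` is hypothetical, and the files are created empty.

```python
from __future__ import annotations

from pathlib import Path

# Directory and file names taken from the layout above.
DIRECTORIES = ["data/raw", "data/prepared", "src/data", "src/model", "artifacts"]
FILES = ["requirements.txt", "README.md"]


def scaffold(root: Path) -> list[Path]:
    """Create the project directories under root and return them."""
    created = []
    for relative in DIRECTORIES:
        directory = root / relative
        # parents=True also creates root itself and intermediate folders.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    for relative in FILES:
        # touch() leaves existing files untouched apart from their timestamp.
        (root / relative).touch(exist_ok=True)
    return created


if __name__ == "__main__":
    for path in scaffold(Path("ml_pipeline")):
        print(path)
```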
- Clone the repository:
git clone https://github.com/burna680/MLFlow_demo.git
cd MLFlow_demo
- Create a virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate
- Install the dependencies:
pip install -r requirements.txt
- Start the MLflow server (if it is not already running):
mlflow ui
- Run the main script to execute the entire pipeline:
python main.py
This performs all steps of the pipeline, from data gathering to model training. Outputs, logs, and saved models are stored in the appropriate directories under model/ and artifacts/.
To use the MLflow model, start the model server with the serve command. The port and model URI depend on which ports are free on your local machine and which model run you want to serve:
mlflow models serve --model-uri <model_uri> --port=<available_port> --no-conda
Recent MLflow releases replace --no-conda with --env-manager local.
- Data Modules: Located under src/data/, these modules handle everything from gathering and preparing data to versioning it using MLflow.
- Model Modules: Located under src/model/, these modules are responsible for training, validating, serving, monitoring, and retraining the model.
- Utilities: Common helper functions for the MLflow project can be placed in src/utils.py to keep the code DRY.
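Once a model is being served with the serve command shown earlier, it can be queried over HTTP. The sketch below builds a request for the server's `/invocations` endpoint using only the standard library. It assumes an MLflow 2.x scoring server, which accepts a JSON body with a `dataframe_split` key; older versions expect a different payload. The helper names, the port, and the column names are hypothetical and should be replaced with values matching your model.

```python
import json
import urllib.request


def build_invocation_request(host, port, columns, rows):
    """Build a POST request for an MLflow model server's /invocations endpoint."""
    # Assumption: MLflow 2.x "dataframe_split" input format.
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(request, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical feature names and port; requires a running model server.
    request = build_invocation_request(
        "127.0.0.1", 5001, ["feature_a", "feature_b"], [[0.1, 0.2]]
    )
    print(predict(request))
```

Keeping request construction separate from the network call makes the payload easy to inspect or test without a running server.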
- CI/CD Integration: Add continuous integration and continuous deployment pipelines.
- Model Deployment: Implement model deployment using tools like Flask or FastAPI.
- Advanced Monitoring: Incorporate advanced monitoring and alerting mechanisms.
This project is licensed under the MIT License. See the LICENSE file for details.
