Complete end-to-end MLOps workflow demonstrating best practices for production ML systems: data versioning, experiment tracking, containerization, pipeline orchestration, hyperparameter optimization, CI/CD, model serving, and monitoring.
The demonstration below shows all MLOps components working together:
- ✅ Docker services orchestration
- ✅ DVC data versioning (push/pull)
- ✅ MLflow experiment tracking
- ✅ ZenML pipeline execution
- ✅ Optuna hyperparameter optimization
- ✅ Model deployment (v1 → v2 upgrade)
- ✅ API inference testing
- ✅ Model rollback demonstration
- ✅ Monitoring dashboards (Grafana)
This MLOps Mini-Project aims to build a complete end-to-end workflow for machine learning operations, demonstrating industry best practices for production ML systems. The project implements a sentiment analysis use case on Reddit comments, showcasing how to manage the entire ML lifecycle from data versioning to model deployment and monitoring.
This project demonstrates a production-ready MLOps pipeline including:
- Code Management (Git): Version control with branching strategy and release tags
- Containerization (Docker/Docker Compose): Reproducible environments for training and serving
- Data Versioning (DVC): Track datasets without storing large files in Git
- Experiment Tracking (MLflow): Log parameters, metrics, and artifacts
- Pipeline Orchestration (ZenML): Automated ML pipelines with experiment tracking
- Hyperparameter Optimization (Optuna): Bayesian optimization for model tuning
- CI/CD Pipeline (GitLab CI): Automated testing, building, and deployment
- Model Serving (FastAPI): Production-ready inference API
- Monitoring (Prometheus + Grafana): Real-time metrics and dashboards
- Automated Retraining: Scheduled and conditional model retraining
- ✅ Simple ML use case: CPU-friendly, quick to train (no GPU required)
- ✅ Reproducibility: Reproducible training and inference (code + data + config)
- ✅ Traceability: Full traceability with Git versions, data versions, MLflow runs, and pipeline executions
- ✅ Deployment: Deploy inference API with version management (v1 → v2) and rollback capability
- ✅ Public dataset: Lightweight, quickly trainable dataset
- ✅ Baseline model: Simple but measurable baseline model
- ✅ Primary metric: Defined primary metric (F1-Score for multi-class classification)
Our Implementation:
- Dataset: Reddit Sentiment Analysis (~37,000 comments)
- Baseline: RandomForest with Bag of Words (F1: 0.60)
- Production: LightGBM with TF-IDF + Optuna HPT (F1: 0.8446)
- Metric: F1-Score (Macro) as primary metric
- ✅ Clean GitLab repository: Well-structured README, clear structure
- ✅ Branching strategy: `main`/`dev` branches with feature branches
- ✅ Version tags: Tags for model versions (v1, v2)

Our Implementation:
- Protected `main` and `dev` branches
- Feature branches for development
- Version tags: `v1.0.0`, `v2.0.0`, `v2.1.0`
- Git Flow workflow
- ✅ Dockerfiles: Separate Dockerfiles for training and serving
- ✅ Docker Compose: Complete stack orchestration for local execution

Our Implementation:
- `Dockerfile.train`: Training container
- `Dockerfile.serve`: API serving container
- `frontend/Dockerfile.frontend`: Frontend container
- `docker-compose.yaml`: Full stack (MinIO, MLflow, ZenML, API, Monitoring)
- ✅ DVC tracking: Dataset tracked with DVC (no large files in Git)
- ✅ Remote storage: Functional DVC remote (push/pull)
- ✅ Reproducibility proof: Demonstration of reproducibility

Our Implementation:
- MinIO (S3-compatible) as DVC remote
- `dvc.yaml` pipeline definition
- Push/pull demonstrations with screenshots
- Full reproducibility with `dvc repro`
- ✅ Baseline run: At least 1 baseline run
- ✅ Comparable runs: Multiple comparable runs with simple variations
- ✅ Artifact logging: Log parameters, metrics, and artifacts (models, figures, etc.)
Our Implementation:
- Baseline: RandomForest (F1: 0.60)
- Multiple model comparisons: LightGBM, XGBoost, RandomForest, LogisticRegression
- Full artifact logging: models, vectorizers, confusion matrices, evaluation reports
- MLflow UI with comparison views
- ✅ ZenML pipeline: Complete pipeline (data → train → eval → export)
- ✅ Multiple executions: Baseline + variations pipeline runs
- ✅ Continuous Training (CT): Scheduled smoke tests and full training
Our Implementation:
- ZenML pipeline with 5 steps: data ingestion → preprocessing → training → evaluation → export
- Multiple pipeline runs with different model types
- GitLab CI scheduled jobs for CT (smoke test + full training)
- ✅ Optuna study: Short study (5-10 trials) on several hyperparameters
- ✅ Comparison: Simple comparison between baseline / variations / best Optuna run
Our Implementation:
- 10-trial Optuna study on LightGBM
- Comparison: Baseline (F1: 0.8306) vs Grid Search (F1: 0.8350) vs Optuna (F1: 0.8426)
- Hyperparameter importance analysis
- MLflow integration for tracking
- ✅ CI pipeline: Minimum pipeline with tests/lint + build images + push to registry
- ✅ Continuous Training (CT): Scheduled job running "smoke" test (epochs=1 or subset)
Our Implementation:
- 5-stage pipeline: test → build → push → deploy → smoke
- Linting (flake8, black, isort)
- Unit tests with coverage
- Docker image building and pushing
- Scheduled smoke test (1000 rows, n_estimators=10)
- Weekly full CT training
- ✅ Stable inference API: Independent API endpoint (e.g., `/predict`)
- ✅ Docker Compose deployment: Deploy via Docker Compose
- ✅ Version simulation: Simulate v1 → v2 update + rollback (proof with tests/captures)

Our Implementation:
- FastAPI inference service with `/predict` endpoint
- Docker Compose deployment
- Version management: v1.0.0 (F1: 0.60) → v2.1.0 (F1: 0.84)
- Rollback demonstration with screenshots
- Health checks and metrics endpoints
- ✅ Monitoring: Latency, request count, errors, custom metrics
- ✅ Retraining: Simple triggering (scheduled or conditional)
Our Implementation:
- Prometheus metrics collection
- Grafana dashboards (request rate, latency, confidence distribution)
- Retrain service with scheduled (daily) and conditional triggers
- Accuracy threshold-based retraining
- ✅ Repository link: GitHub/GitLab repository
- ✅ Dockerfiles: `Dockerfile.train`, `Dockerfile.serve`, `docker-compose.yaml`
- ✅ DVC configuration: `.dvc` files, `dvc.yaml` + push/pull proof
- ✅ MLflow captures: Experiment list + comparison screenshots
- ✅ ZenML captures: Pipeline runs + DAG visualizations
- ✅ GitLab CI: `.gitlab-ci.yml` with full pipeline
- ✅ Deployment demo: Inference testing + v1 → v2 update + rollback with screenshots
- ✅ Documentation: Complete README with execution instructions, commands, and structure
Multi-class sentiment classification of Reddit comments into three categories:
- Negative (-1): Negative sentiment
- Neutral (0): Neutral sentiment
- Positive (1): Positive sentiment
- Source: Reddit Sentiment Analysis Dataset
- Size: ~37,000 comments
- Features: Text comments with sentiment labels
- Public & Lightweight: Suitable for quick training and experimentation
| Version | Model | Features | F1 Score | Accuracy | Status |
|---|---|---|---|---|---|
| v1.0.0 | RandomForest | Bag of Words (5000 features) | 0.60 | 0.67 | Baseline |
| v2.0.0 | LightGBM | TF-IDF (1-3 ngrams) + SMOTE | 0.84 | 0.84 | Production |
| v2.1.0 | LightGBM | TF-IDF + Optuna HPT | 0.8446 | 0.8463 | Optimized |
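Much of the v1 → v2 jump comes from richer features: TF-IDF over word 1-3 grams instead of a plain bag of words. As a quick illustration of what a (1, 3) n-gram range expands a comment into (a stdlib sketch, not the scikit-learn vectorizer the project actually uses):

```python
from typing import List

def word_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    # Naive whitespace tokenization; real TF-IDF vectorizers also strip
    # punctuation and apply a token pattern.
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

print(word_ngrams("I love this product"))
# unigrams, bigrams, and trigrams — 9 features from 4 tokens
```

Each of these grams becomes a TF-IDF feature, which is why short sentiment cues like "love this" survive as signals even when individual words are ambiguous.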
- F1-Score (Macro): Primary metric for multi-class classification
- Secondary Metrics: Accuracy, Precision, Recall per class
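Macro F1 averages per-class F1 with equal weight, so the rarer Negative and Neutral classes count as much as Positive. A minimal pure-Python version for reference (illustrative; in practice this is `sklearn.metrics.f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred):
    """Macro F1: per-class F1 averaged with equal class weight."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Labels follow the dataset convention: -1 negative, 0 neutral, 1 positive
print(macro_f1([-1, 0, 1, -1], [-1, 0, 0, -1]))
```

Missing one entire class drags the macro average down hard, which is exactly the behavior we want for an imbalanced sentiment dataset.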
This project follows a Git Flow branching model with protected branches:
```
main      ─────────────────●─────────────●──────  (releases)
                          /             /
dev       ────●──────────●────●────────●────────  (integration)
               \        /      \      /
feature/* ──────●──────●────────●────●──────────  (development)
```

| Branch | Description | Protection |
|---|---|---|
| `main` | Production-ready code | ✅ Protected |
| `dev` | Development branch | ✅ Protected |
All model versions are tagged in Git for traceability:
| Tag | Description | Model | F1 Score | Commit |
|---|---|---|---|---|
| `v1.0.0` | Baseline model | RandomForest + BoW | 0.60 | abc123... |
| `v2.0.0` | Improved model | LightGBM + TF-IDF + SMOTE | 0.84 | def456... |
| `v2.1.0` | Optimized model | LightGBM + Optuna HPT | 0.8446 | ghi789... |

Creating and tagging versions:

```bash
# Tag v1.0.0 (Baseline)
git tag -a v1.0.0 -m "Baseline: RandomForest + BoW (F1=0.60)"
git push origin v1.0.0

# Tag v2.0.0 (Production)
git tag -a v2.0.0 -m "Production: LightGBM + TF-IDF (F1=0.84)"
git push origin v2.0.0

# View all tags
git tag -l
```

Branch protection:
- `main` and `dev` branches require pull requests
- All commits must pass CI/CD pipeline
- Code review required before merge
The project includes three Dockerfiles for different purposes:

Dockerfile.train (training container):

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
COPY configs/ ./configs/
CMD ["python", "src/training/trainer.py"]
```

Usage:

```bash
docker build -f Dockerfile.train -t sentiment-train .
docker run -v ${PWD}/data:/app/data -v ${PWD}/models:/app/models sentiment-train
```

Dockerfile.serve (API serving container):

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY api/ ./api/
COPY src/ ./src/
COPY models/ ./models/
EXPOSE 8000
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Usage:

```bash
docker build -f Dockerfile.serve -t sentiment-api .
docker run -p 8000:8000 -v ${PWD}/models:/app/models sentiment-api
```

frontend/Dockerfile.frontend: Nginx-based frontend for the user interface.
The docker-compose.yaml orchestrates the entire MLOps infrastructure:
Services included:
- MinIO: S3-compatible storage for DVC and MLflow artifacts
- MLflow: Experiment tracking server
- ZenML Server: Pipeline orchestration dashboard
- Sentiment API: FastAPI inference service
- Prometheus: Metrics collection
- Grafana: Monitoring dashboards
- Retrain Service: Automated retraining service
- Frontend: Web UI for predictions
Start all services:

```bash
docker-compose up -d
```

View running services:

```bash
docker-compose ps
```

Screenshot showing all Docker containers running with docker-compose
Service URLs:
| Service | URL | Credentials |
|---|---|---|
| MinIO Console | http://localhost:9003 | minio / minio12345 |
| MLflow UI | http://localhost:5001 | - |
| ZenML Dashboard | http://localhost:8080 | - |
| API | http://localhost:8000 | - |
| Grafana | http://localhost:3000 | admin / admin |
| Prometheus | http://localhost:9090 | - |
DVC (Data Version Control) tracks large data files and model artifacts without storing them in Git. This project uses MinIO (S3-compatible) as remote storage.
Initialize DVC:

```bash
dvc init
```

Configure remote storage (MinIO):

```bash
dvc remote add -d storage s3://mlops-dvc
dvc remote modify storage endpointurl http://localhost:9002
dvc remote modify storage use_ssl false
dvc remote modify storage region us-east-1
```

Set environment variables:

```bash
# Windows PowerShell
$env:AWS_ACCESS_KEY_ID="minio"
$env:AWS_SECRET_ACCESS_KEY="minio12345"

# Linux/macOS
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio12345
```

Add data to DVC:

```bash
# Download dataset
python scripts/download_data.py

# Track raw data
dvc add data/raw/reddit.csv

# Commit .dvc file to Git (not the actual data)
git add data/raw/reddit.csv.dvc data/raw/.gitignore
git commit -m "Add raw dataset with DVC"
```

DVC pipeline (`dvc.yaml`):

```yaml
stages:
  preprocess:
    cmd: python src/preprocessing.py
    deps:
      - data/raw/reddit.csv
    outs:
      - data/processed/reddit_processed.csv
  train:
    cmd: python src/train.py
    deps:
      - data/processed/reddit_processed.csv
    outs:
      - models/model.pkl
      - models/vectorizer.pkl
```

Run pipeline:

```bash
dvc repro   # Reproduce entire pipeline
dvc dag     # Visualize pipeline DAG
```
MinIO console showing bucket creation for DVC storage
DVC bucket with versioned data files
Actual data files stored in MinIO (not in Git)
Demonstration of dvc push command pushing data to MinIO remote storage
Command output:

```
Pushing to 'storage' (s3://mlops-dvc)
[####################] 100% data/raw/reddit.csv
```
Demonstration of dvc pull command retrieving data from remote storage
Command output:

```
Pulling from 'storage' (s3://mlops-dvc)
[####################] 100% data/raw/reddit.csv
```
DVC status showing tracked files and pipeline stages
Reproducibility proof:

```bash
# On a new machine
git clone <repo-url>
cd reddit-sentiment-mlops
dvc pull    # Retrieves data from MinIO
dvc repro   # Reproduces entire pipeline with exact same results
```

MLflow tracks experiments and logs parameters, metrics, and artifacts. This project uses MLflow with MinIO for artifact storage.
Start MLflow server:

```bash
# Set environment variables
$env:AWS_ACCESS_KEY_ID="minio"
$env:AWS_SECRET_ACCESS_KEY="minio12345"
$env:MLFLOW_S3_ENDPOINT_URL="http://localhost:9002"

# Start MLflow server
mlflow server --host 0.0.0.0 --port 5001 \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root s3://mlflow-artifacts/
```

Access MLflow UI: http://localhost:5001
Run experiments with MLflow:

```bash
# Baseline: Random Forest
python src/training/train_mlflow.py \
  --exp-name sentiment_analysis \
  --run-name "random_forest_baseline" \
  --model-type random_forest

# LightGBM with TF-IDF
python src/training/train_mlflow.py \
  --exp-name sentiment_analysis \
  --run-name "lightgbm_tfidf" \
  --model-type lightgbm \
  --max-features 10000 \
  --ngram-min 1 \
  --ngram-max 3 \
  --use-smote

# XGBoost
python src/training/train_mlflow.py \
  --exp-name sentiment_analysis \
  --run-name "xgboost_baseline" \
  --model-type xgboost
```
MLflow UI showing all experiments with multiple runs
Experiments tracked:
- `sentiment_analysis`: Main experiment
- `optuna_sentiment`: Optuna optimization runs
- `zenml_pipeline`: ZenML pipeline runs
Comparing multiple runs: Random Forest vs LightGBM vs XGBoost
Metrics comparison:
| Run Name | Model | F1 Score | Accuracy | Training Time |
|---|---|---|---|---|
| random_forest_baseline | RandomForest | 0.60 | 0.67 | 45s |
| lightgbm_tfidf | LightGBM | 0.84 | 0.84 | 38s |
| xgboost_baseline | XGBoost | 0.82 | 0.82 | 52s |
MLflow storing model artifacts, confusion matrices, and evaluation reports
Artifacts logged:
- `model.pkl`: Trained model
- `vectorizer.pkl`: Feature vectorizer
- `confusion_matrix.png`: Classification visualization
- `evaluation_report.json`: Detailed metrics
- `class_performance.png`: Per-class metrics
ZenML pipeline runs automatically logged to MLflow
ZenML provides ML pipeline orchestration with automatic experiment tracking integration. Our pipeline follows: data → preprocess → train → evaluate → export.
Install and connect:

```bash
pip install zenml==0.91.2

# Connect to ZenML server (Docker)
zenml login http://localhost:8080 --no-verify-ssl

# Verify connection
zenml status
```

Configure stack:

```bash
# Register MLflow experiment tracker
zenml experiment-tracker register mlflow_tracker \
  --flavor=mlflow \
  --tracking_uri=http://localhost:5001

# Create and activate stack
zenml stack register mlflow_stack \
  -o default \
  -a default \
  -e mlflow_tracker \
  --set
```

Pipeline structure:

```python
@pipeline
def sentiment_analysis_pipeline():
    # Step 1: Ingest data
    raw_data = data_ingestion_step()
    # Step 2: Preprocess
    processed_data = preprocessing_step(raw_data)
    # Step 3: Train model
    model = training_step(processed_data)
    # Step 4: Evaluate
    metrics = evaluation_step(model, processed_data)
    # Step 5: Export
    export_step(model, metrics)
```

Run pipeline:

```bash
python run_pipeline.py \
  --model-type lightgbm \
  --max-features 10000 \
  --ngram-min 1 \
  --ngram-max 3 \
  --use-smote
```
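Conceptually, the five steps are plain functions whose outputs feed the next step's inputs; ZenML infers the DAG from exactly this data flow. A stdlib-only sketch with toy stand-ins (not the project's real step implementations) makes the wiring explicit:

```python
def data_ingestion_step():
    # Stand-in for loading data/raw/reddit.csv
    return [("Great product", 1), ("Love it", 1), ("Awful service", -1)]

def preprocessing_step(raw):
    # Stand-in for text cleaning / vectorization
    return [(text.lower(), label) for text, label in raw]

def training_step(data):
    # Toy "model": the majority class (stand-in for LightGBM training)
    labels = [label for _, label in data]
    return max(sorted(set(labels)), key=labels.count)

def evaluation_step(model, data):
    # Accuracy of always predicting the majority class
    correct = sum(1 for _, label in data if label == model)
    return {"accuracy": correct / len(data)}

def export_step(model, metrics):
    # Stand-in for persisting model + metrics artifacts
    return {"model": model, "metrics": metrics}

# Same chaining the @pipeline function expresses
raw = data_ingestion_step()
processed = preprocessing_step(raw)
model = training_step(processed)
metrics = evaluation_step(model, processed)
artifact = export_step(model, metrics)
print(artifact)
```

Because each step only consumes upstream outputs, ZenML can cache unchanged steps and re-run just the ones whose inputs changed.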
ZenML dashboard showing pipeline runs and status
Visual representation of the pipeline DAG (data β preprocess β train β eval β export)
List of all pipeline runs with status, duration, and metadata
Analytics dashboard showing pipeline performance metrics
Successful pipeline execution with all steps completed
ZenML runs automatically logged to MLflow for unified tracking
Command-line execution of ZenML pipeline with parameters
 ZenML CLI showing stack configuration and pipeline management
Additional pipeline visualization views
Optuna performs Bayesian hyperparameter optimization using TPE (Tree-structured Parzen Estimator) sampler. This section demonstrates optimization results compared to baseline and grid search.
Run Optuna optimization:

```bash
# Set environment variables
$env:AWS_ACCESS_KEY_ID="minio"
$env:AWS_SECRET_ACCESS_KEY="minio12345"
$env:MLFLOW_S3_ENDPOINT_URL="http://localhost:9002"
$env:MLFLOW_TRACKING_URI="http://localhost:5001"

# Run Optuna with 10 trials
python src/training/optuna_sentiment.py \
  --n-trials 10 \
  --model-type lightgbm \
  --max-features 10000 \
  --ngram-min 1 \
  --ngram-max 3 \
  --exp-name optuna_sentiment
```

LightGBM hyperparameters:
- `n_estimators`: 100-400
- `learning_rate`: 0.01-0.2 (log scale)
- `max_depth`: 4-10
- `num_leaves`: 20-80
- `min_child_samples`: 10-50
- `colsample_bytree`: 0.6-1.0
- `subsample`: 0.6-1.0
- `reg_alpha`, `reg_lambda`: 0.001-0.5
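To make the ranges concrete, here is how one trial's configuration can be drawn from this search space. This is a stdlib illustration of the sampling only, not Optuna's TPE; in the real study each value comes from `trial.suggest_int` / `trial.suggest_float`, and note how `learning_rate` and the regularization terms use a log-uniform draw:

```python
import math
import random

def sample_lgbm_config(rng: random.Random) -> dict:
    """Draw one configuration from the search space listed above."""
    log_uniform = lambda lo, hi: math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {
        "n_estimators": rng.randint(100, 400),
        "learning_rate": log_uniform(0.01, 0.2),   # log scale, per the README
        "max_depth": rng.randint(4, 10),
        "num_leaves": rng.randint(20, 80),
        "min_child_samples": rng.randint(10, 50),
        "colsample_bytree": rng.uniform(0.6, 1.0),
        "subsample": rng.uniform(0.6, 1.0),
        "reg_alpha": log_uniform(0.001, 0.5),
        "reg_lambda": log_uniform(0.001, 0.5),
    }

cfg = sample_lgbm_config(random.Random(42))
print(cfg)
```

Unlike this uniform sketch, TPE biases later draws toward regions that produced good F1 scores, which is why 10 Optuna trials beat 16 grid-search trials here.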
| Method | Model | Trials | Best F1 | Time | Improvement |
|---|---|---|---|---|---|
| Baseline | LightGBM | 1 | 0.8306 | ~38s | - |
| Grid Search | LightGBM | 16 | 0.8350 | ~8min | +0.5% |
| Optuna TPE | LightGBM | 10 | 0.8426 | ~4.5min | +1.2% |
Improvement: Optuna achieved +1.2% F1 score with fewer trials than grid search.
Optuna dashboard showing best trial with F1 score of 0.8426
Best hyperparameters found:

```python
{
    'n_estimators': 250,
    'learning_rate': 0.08,
    'max_depth': 7,
    'num_leaves': 65,
    'min_child_samples': 25,
    'colsample_bytree': 0.85,
    'subsample': 0.9,
    'reg_alpha': 0.05,
    'reg_lambda': 0.1
}
```

Optuna study showing optimization history and parameter importance

Key insights:
- `learning_rate` and `n_estimators` are most important
- `max_depth` has moderate impact
- Optimization converged after ~8 trials
The project includes a complete CI/CD pipeline with automated testing, building, deployment, and Continuous Training (CT).
```
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│  TEST   │ →  │  BUILD  │ →  │  PUSH   │ →  │ DEPLOY  │ →  │  SMOKE  │
│lint+test│    │ images  │    │registry │    │(manual) │    │   CT    │
└─────────┘    └─────────┘    └─────────┘    └─────────┘    └─────────┘
```
| Stage | Jobs | Description |
|---|---|---|
| test | `lint`, `test`, `security_scan` | Run flake8, pytest, safety, bandit |
| build | `build_api`, `build_retrain`, `build_frontend` | Build Docker images |
| push | `push_api`, `push_retrain`, `push_frontend` | Push to GitLab Container Registry |
| deploy | `deploy_staging`, `deploy_production` | Manual deployment (optional) |
| smoke | `smoke_test_training`, `ct_full_training` | Smoke test & CT (scheduled) |
 GitLab CI pipeline showing linting stage with flake8, black, and isort checks
 Unit tests running with pytest and coverage reporting
Scheduled CT jobs running minimal training to verify pipeline integrity
Smoke test configuration:
- Uses 1000 rows of data
- Trains with `n_estimators=10`
- Validates minimum F1 threshold (0.3)
- Runs daily or on manual trigger
GitLab CI/CD pipeline integrated with repository
Key features:
- ✅ Automated testing on every commit
- ✅ Docker image building and pushing
- ✅ Scheduled CT jobs (daily smoke test, weekly full training)
- ✅ Manual deployment gates
- ✅ Artifact storage for test reports
The project includes a production-ready FastAPI inference service with model versioning, health checks, and metrics endpoints.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | API information |
| GET | `/health` | Health check |
| GET | `/model/info` | Model version & metrics |
| POST | `/predict` | Single text prediction |
| POST | `/predict/batch` | Batch predictions |
| GET | `/docs` | Swagger documentation |
| GET | `/metrics` | Prometheus metrics |
🎬 Live demonstration of model versioning, inference testing, and rollback capabilities.
```
                       DEPLOYMENT DEMO WORKFLOW

  ① START            ② DEPLOY V2           ③ ROLLBACK V1
  ┌─────────┐        ┌─────────┐          ┌─────────┐
  │  V1.0   │ ─────► │  V2.1   │ ─────►   │  V1.0   │
  │ F1=0.60 │upgrade │ F1=0.84 │rollback  │ F1=0.60 │
  └─────────┘        └─────────┘          └─────────┘
       │                  │                    │
       ▼                  ▼                    ▼
  ┌─────────┐        ┌─────────┐          ┌─────────┐
  │  Test   │        │  Test   │          │  Test   │
  │  49.8%  │        │  92.1%  │          │  49.8%  │
  └─────────┘        └─────────┘          └─────────┘
```
 Deploying version 1.0.0 (RandomForest + BoW) with F1=0.60
Deployment command:
.\scripts\deploy.ps1 -Version "v1.0.0"Expected output:
=============================================
Sentiment API Deployment Script
=============================================
Model folder: models/v1
Deploying version: v1.0.0
β
DEPLOYMENT SUCCESSFUL!
Model Info:
Version: v1
Model: RandomForestClassifier
Status: healthy
Test Prediction:
Text: 'This is great!'
Label: Positive
Confidence: 49.82% β Low confidence (baseline model)
=============================================
 Upgrading to version 2.1.0 (LightGBM + TF-IDF + Optuna) with F1=0.84
Deployment command:
.\scripts\deploy.ps1 -Version "v2.1.0"Expected output:
=============================================
Sentiment API Deployment Script
=============================================
Current version: v1
Model folder: models/v2
Deploying version: v2.1.0
β
DEPLOYMENT SUCCESSFUL!
Model Info:
Version: v2
Model: lightgbm
Status: healthy
Test Prediction:
Text: 'This is great!'
Label: Positive
Confidence: 92.12% β High confidence (optimized model)
=============================================
 API health endpoint showing service status and model information
Health check response:
{
"status": "healthy",
"model_version": "v2",
"model_type": "lightgbm",
"f1_score": 0.8446,
"accuracy": 0.8463
} FastAPI Swagger documentation showing all available endpoints
V1 Prediction (Baseline):

```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "I love this product, it'\''s amazing!"}'
```

Response (~52% confidence):

```json
{
  "text": "I love this product, it's amazing!",
  "prediction": 1,
  "label": "Positive",
  "confidence": 0.52,
  "probabilities": {
    "Negative": 0.18,
    "Neutral": 0.30,
    "Positive": 0.52
  }
}
```

V2 Prediction (Optimized, ~94% confidence):

```json
{
  "text": "I love this product, it's amazing!",
  "prediction": 1,
  "label": "Positive",
  "confidence": 0.94,
  "probabilities": {
    "Negative": 0.02,
    "Neutral": 0.04,
    "Positive": 0.94
  }
}
```

| Test Text | V1 Confidence | V2 Confidence | Improvement |
|---|---|---|---|
| "This is great!" | 49.8% | 92.1% | +85% |
| "I love this product!" | 52.0% | 94.0% | +81% |
| "Terrible experience" | 45.2% | 89.3% | +98% |
| "It's okay I guess" | 38.1% | 67.4% | +77% |
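The Improvement column is the relative gain in confidence (v2 over v1), not a percentage-point difference:

```python
def relative_gain_pct(v1_conf: float, v2_conf: float) -> int:
    """Relative confidence gain of V2 over V1, rounded to a whole percent."""
    return round((v2_conf - v1_conf) / v1_conf * 100)

print(relative_gain_pct(49.8, 92.1))  # "This is great!" row → 85
```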
Rollback to V1:

```powershell
.\scripts\deploy.ps1 -Version "v1.0.0" -Rollback
```

Expected output:

```
🔴 ROLLBACK to version: v1.0.0
Stopping current API...
Starting API with model from v1 (version v1.0.0)...

✅ DEPLOYMENT SUCCESSFUL!

Model Info:
  Version: v1
  Model: RandomForestClassifier
  Confidence: 49.82%  ← Back to baseline
```
The project includes Prometheus + Grafana for comprehensive monitoring:
- Prometheus: Metrics collection
- Grafana: Visualization dashboards
- Custom metrics: Request latency, prediction confidence, error rates
Grafana dashboard showing API metrics: request rate, latency, and prediction confidence
Metrics tracked:
- Request rate (requests/second)
- Prediction latency (p50, p95, p99)
- Prediction confidence distribution
- Error rate
- Model version in use
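In production these latency quantiles come from Prometheus histogram buckets via `histogram_quantile()`; reduced to its essence, a p50/p95/p99 summary over raw latency samples looks like this (stdlib illustration only, using the simple nearest-rank convention):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile. Prometheus instead interpolates over
    histogram buckets, but the intent is the same."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical request latencies in milliseconds
latencies_ms = [12, 14, 15, 15, 16, 18, 21, 25, 40, 95]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
print(p50, p95, p99)
```

The tail quantiles are dominated by the slowest requests, which is why dashboards track p95/p99 separately from the median.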
Retrain service configuration showing scheduled retraining triggers
Retrain triggers:
- Scheduled: Daily at 2 AM
- Conditional: When accuracy drops below threshold
- Manual: Via API endpoint
Grafana showing retraining execution time and frequency
Automatic retraining conditions:
- Scheduled: Daily/weekly retraining
- Accuracy threshold: Retrain if F1 < 0.80
- Feedback threshold: Retrain after 1000 new predictions
- Manual trigger: Via API endpoint
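The trigger logic above reduces to a small decision function. This is a hedged sketch (thresholds are the ones this README states; the function and argument names are illustrative, not the retrain service's actual code):

```python
from datetime import datetime

def should_retrain(current_f1: float, new_feedback: int,
                   now: datetime, manual: bool = False) -> bool:
    """Combine the documented retrain triggers into one decision."""
    scheduled = now.hour == 2 and now.minute == 0   # daily at 2 AM
    accuracy_drop = current_f1 < 0.80               # F1 threshold
    feedback_full = new_feedback >= 1000            # feedback threshold
    return manual or scheduled or accuracy_drop or feedback_full

print(should_retrain(0.79, 0, datetime(2025, 1, 3, 12, 30)))
```

Keeping the decision pure (no I/O) makes it trivial to unit-test each trigger in isolation.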
Retrain service logs:

```
[2025-01-03 02:00:00] Starting scheduled retraining...
[2025-01-03 02:00:05] Loading new data...
[2025-01-03 02:00:10] Training new model...
[2025-01-03 02:02:30] New model F1: 0.8450 (improvement: +0.0004)
[2025-01-03 02:02:31] Deploying new model...
[2025-01-03 02:02:35] Retraining completed successfully
```
- Python 3.10+
- Docker & Docker Compose
- Git
- (Optional) GitLab account for CI/CD
Clone the repository:

```bash
git clone <your-repo-url>
cd reddit-sentiment-mlops
```

Create a virtual environment and install dependencies:

```bash
python -m venv venv
venv\Scripts\activate       # Windows
# source venv/bin/activate  # Linux/Mac
pip install -r requirements.txt
```

Start all services:

```bash
docker-compose up -d
```

Verify services:

```bash
docker-compose ps
```

Set up DVC:

```bash
# Initialize DVC
dvc init

# Configure remote (MinIO)
dvc remote add -d storage s3://mlops-dvc
dvc remote modify storage endpointurl http://localhost:9002
dvc remote modify storage use_ssl false

# Set environment variables
$env:AWS_ACCESS_KEY_ID="minio"
$env:AWS_SECRET_ACCESS_KEY="minio12345"

# Download and track data
python scripts/download_data.py
dvc add data/raw/reddit.csv
git add data/raw/reddit.csv.dvc
git commit -m "Add data with DVC"
dvc push
```

Configure the MLflow environment:

```bash
# Set environment variables
$env:AWS_ACCESS_KEY_ID="minio"
$env:AWS_SECRET_ACCESS_KEY="minio12345"
$env:MLFLOW_S3_ENDPOINT_URL="http://localhost:9002"
$env:MLFLOW_TRACKING_URI="http://localhost:5001"
```

Set up ZenML:

```bash
# Install ZenML
pip install zenml==0.91.2

# Connect to ZenML server
zenml login http://localhost:8080 --no-verify-ssl

# Register MLflow tracker
zenml experiment-tracker register mlflow_tracker \
  --flavor=mlflow \
  --tracking_uri=http://localhost:5001

# Create stack
zenml stack register mlflow_stack \
  -o default \
  -a default \
  -e mlflow_tracker \
  --set
```

Train the baseline model:

```bash
python src/training/train_mlflow.py \
  --exp-name sentiment_analysis \
  --run-name "baseline" \
  --model-type random_forest
```

Run hyperparameter optimization:

```bash
python src/training/optuna_sentiment.py \
  --n-trials 10 \
  --model-type lightgbm
```

Run the ZenML pipeline:

```bash
python run_pipeline.py \
  --model-type lightgbm \
  --max-features 10000 \
  --ngram-min 1 \
  --ngram-max 3 \
  --use-smote
```

Deploy the API:

```powershell
# Deploy V1
.\scripts\deploy.ps1 -Version "v1.0.0"

# Deploy V2
.\scripts\deploy.ps1 -Version "v2.1.0"
```

Test the API:

```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is amazing!"}'
```

Project structure:

```
reddit-sentiment-mlops/
│
├── 📁 api/                      # FastAPI application
│   ├── __init__.py
│   └── main.py                  # API endpoints
│
├── 📁 configs/                  # Configuration files
│   ├── config.yaml              # Main configuration
│   └── optuna_config.yaml       # Hyperparameter tuning config
│
├── 📁 data/                     # Data directory (DVC tracked)
│   ├── raw/                     # Raw data
│   │   └── reddit.csv.dvc       # DVC tracking file
│   └── processed/               # Processed data
│
├── 📁 models/                   # Trained models
│   ├── v1/                      # Version 1 models
│   │   ├── model.pkl
│   │   └── vectorizer.pkl
│   └── v2/                      # Version 2 models
│       ├── model.pkl
│       └── vectorizer.pkl
│
├── 📁 notebooks/                # Jupyter notebooks (EDA & experiments)
│   ├── Preprocessing_and_EDA.ipynb
│   ├── experiment_1_baseline_model.ipynb
│   └── ...
│
├── 📁 scripts/                  # Utility scripts
│   ├── download_data.py
│   ├── deploy.ps1               # Deployment script
│   ├── run_optuna.ps1           # Optuna runner
│   └── ...
│
├── 📁 src/                      # Source code (modular structure)
│   ├── __init__.py
│   ├── 📁 data/                 # Data loading module
│   │   └── loader.py
│   ├── 📁 features/             # Feature engineering
│   │   ├── preprocessing.py
│   │   └── extraction.py
│   ├── 📁 training/             # Training module
│   │   ├── trainer.py
│   │   ├── train_mlflow.py
│   │   └── optuna_sentiment.py
│   ├── 📁 evaluation/           # Evaluation module
│   │   └── evaluator.py
│   ├── 📁 serving/              # Serving module
│   │   ├── predictor.py
│   │   └── retrain_service.py
│   ├── 📁 pipelines/            # ZenML pipelines
│   │   ├── 📁 steps/
│   │   │   ├── data_ingestion.py
│   │   │   ├── preprocessing.py
│   │   │   ├── training.py
│   │   │   ├── evaluation.py
│   │   │   └── export.py
│   │   └── 📁 zenml/
│   │       └── sentiment_pipeline.py
│   └── 📁 utils/                # Utilities
│       ├── config.py
│       └── logging_utils.py
│
├── 📁 monitoring/               # Prometheus/Grafana configs
│   ├── prometheus/
│   │   └── prometheus.yml
│   └── grafana/
│       └── provisioning/
│
├── 📁 tests/                    # Unit tests
│   ├── test_data.py
│   ├── test_features.py
│   ├── test_training.py
│   └── test_evaluation.py
│
├── 📁 img/                      # Documentation images
│   ├── demo.gif                 # Complete demo
│   ├── Docker/
│   ├── dvc/
│   ├── mlflow/
│   ├── Zenml/
│   ├── optuna/
│   ├── ci-cd/
│   ├── Serving & Rollback/
│   └── Grafana/
│
├── 📄 .env                      # Environment variables
├── 📄 .gitignore
├── 📄 .gitlab-ci.yml            # GitLab CI/CD pipeline
├── 📄 docker-compose.yaml       # Full stack orchestration
├── 📄 Dockerfile.train          # Training container
├── 📄 Dockerfile.serve          # API serving container
├── 📄 dvc.yaml                  # DVC pipeline definition
├── 📄 dvc.lock                  # DVC lock file
├── 📄 params.yaml               # DVC parameters
├── 📄 requirements.txt          # Python dependencies
├── 📄 Makefile                  # Automation commands
├── 📄 run_pipeline.py           # ZenML pipeline runner
└── 📄 README.md                 # This file
```
DVC commands:

```bash
dvc init             # Initialize DVC
dvc add <file>       # Track file with DVC
dvc push             # Push data to remote
dvc pull             # Pull data from remote
dvc status           # Check local status
dvc status --cloud   # Check cloud sync status
dvc repro            # Reproduce pipeline
dvc dag              # Show pipeline DAG
```

Docker commands:

```bash
docker-compose up -d   # Start all services
docker-compose down    # Stop all services
docker-compose ps      # List running services
docker-compose logs    # View logs
docker build -f Dockerfile.serve -t sentiment-api .
```

MLflow commands:

```bash
mlflow server --host 0.0.0.0 --port 5001 \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root s3://mlflow-artifacts/
```

ZenML commands:

```bash
zenml login http://localhost:8080 --no-verify-ssl
zenml status
zenml stack list
zenml stack describe
zenml pipeline runs list
```

Optuna commands:

```bash
python src/training/optuna_sentiment.py \
  --n-trials 10 \
  --model-type lightgbm \
  --max-features 10000
```

Make commands:

```bash
make help         # Show all commands
make lint         # Run linting
make test         # Run tests
make test-cov     # Run tests with coverage
make build        # Build Docker images
make smoke        # Run smoke test
make docker-up    # Start Docker services
make docker-down  # Stop Docker services
```
- Reproducibility
  - ✅ Git versioning for code
  - ✅ DVC for data versioning
  - ✅ MLflow for experiment tracking
  - ✅ Docker for environment consistency

- Traceability
  - ✅ Git tags for model versions
  - ✅ MLflow run tracking
  - ✅ ZenML pipeline execution logs
  - ✅ DVC pipeline provenance

- Automation
  - ✅ CI/CD pipeline for testing and deployment
  - ✅ Scheduled retraining (CT)
  - ✅ Automated model deployment

- Monitoring
  - ✅ Prometheus metrics collection
  - ✅ Grafana dashboards
  - ✅ API health checks
  - ✅ Model performance tracking

- Scalability
  - ✅ Containerized services
  - ✅ Microservices architecture
  - ✅ Stateless API design
- DVC Documentation
- MLflow Documentation
- ZenML Documentation
- Optuna Documentation
- FastAPI Documentation
- Docker Documentation
- Reddit Sentiment Analysis Dataset by Himanshu-1703
- MLOps tools: DVC, MLflow, ZenML, Optuna, Docker, FastAPI
- Open-source community
🎉 This project demonstrates a complete, production-ready MLOps workflow from data versioning to model deployment and monitoring!

