Module 1: Experiment Tracking with MLflow
Learn ML experiment tracking by training a sentiment analysis model with progressively more production features. This module uses a scaffolded approach: you fill in specific code blanks rather than writing everything from scratch.

By the end of this module, you will:
- ✅ Train NLP models using Hugging Face transformers
- ✅ Track experiments with MLflow
- ✅ Log parameters, metrics, and models
- ✅ Register models in the MLflow Model Registry
- ✅ Manage the model lifecycle with aliases and automated promotion

You'll finish with:
- ✅ A production-ready sentiment analysis model trained on the IMDB dataset
- ✅ A complete experiment tracking pipeline with MLflow
- ✅ Registered model versions in the MLflow Model Registry (optional advanced section)
- ✅ Model lifecycle management with aliases and automated promotion (optional)
- ✅ A full test suite validating your implementation
Prerequisites:
- Python 3.9+ installed
- Virtual environment activated
- Basic understanding of ML concepts
Setup:

```bash
cd modules/module-1
pip install -r requirements.txt
```

This installs:
- `transformers` - Hugging Face transformers library
- `datasets` - Hugging Face datasets library
- `mlflow` - Experiment tracking
- `scikit-learn` - Metrics computation

To watch experiments in real time, start the MLflow UI:

```bash
mlflow ui
```

Then open your browser to http://localhost:5000.
```
Exercise 1: Basic Training
        ↓
Exercise 2: MLflow Tracking & Registry
        ├─ Part 1: Basic Tracking
        └─ Part 2: Model Registry
```
Each exercise builds on the previous one, adding more capabilities.
Exercise 1: Basic Training

Train a sentiment analysis model using Hugging Face transformers. In this exercise you will:
- Load pre-trained DistilBERT model and tokenizer
- Load IMDB sentiment dataset
- Tokenize text data
- Configure training
- Train and evaluate model
- Save trained model
1. Open the starter file: `starter/train_basic.py`

2. Find and fill in 10 TODOs:
   - Look for comments like `# YOUR CODE HERE`
   - Each TODO has hints showing exactly what function to call
   - Most are 1-3 lines of code

3. Run your implementation:

   ```bash
   python starter/train_basic.py
   ```
Hints for the TODOs:

```python
# TODO 1-2: Load model and tokenizer
# Hint: Use AutoModelForSequenceClassification.from_pretrained()
# Hint: Use AutoTokenizer.from_pretrained()

# TODO 3: Load IMDB dataset
# Hint: Use load_dataset("imdb")

# TODO 4: Tokenize text
# Hint: Call tokenizer() with padding="max_length", truncation=True

# TODO 5-7: Set up training
# Hint: Create TrainingArguments, Trainer, call trainer.train()

# TODO 8: Evaluate model
# Hint: Call trainer.evaluate()

# TODO 9-10: Save model
# Hint: trainer.save_model(), tokenizer.save_pretrained()
```

If you get stuck:
- Check the hints in the TODO comments
- Review `solution/train_basic.py` for reference
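To sanity-check the overall shape of your script, here is a rough sketch of how the pieces fit together. This is not the solution file: the sample sizes and hyperparameters below are illustrative assumptions.

```python
# Sketch of the Exercise 1 flow; values are illustrative, not the official solution.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "distilbert-base-uncased"

# TODO 1-2: load the pre-trained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# TODO 3: load IMDB; subsampling keeps CPU training manageable (sizes are assumptions)
dataset = load_dataset("imdb")
train_ds = dataset["train"].shuffle(seed=42).select(range(500))
test_ds = dataset["test"].shuffle(seed=42).select(range(100))

# TODO 4: tokenize with fixed-length padding and truncation
def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

train_ds = train_ds.map(tokenize, batched=True)
test_ds = test_ds.map(tokenize, batched=True)

# TODO 5-7: configure training and run it
args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=test_ds)
trainer.train()

# TODO 8: evaluate on the held-out split
print(trainer.evaluate())

# TODO 9-10: save the model and tokenizer for later modules
trainer.save_model("./model")
tokenizer.save_pretrained("./model")
```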
Exercise 2: MLflow Tracking & Registry

Learn experiment tracking and model lifecycle management with MLflow. This exercise has two parts: basic tracking (required) and an advanced registry workflow (optional).
Part 1 (Required):
- Import MLflow and transformers integration
- Set up MLflow experiments
- Log training hyperparameters
- Log evaluation metrics
- Log trained models as artifacts
Part 2 (Advanced/Optional):
- Register multiple model versions
- Transition models through stages using MLflow 2.9+ aliases
- Load models by stage/alias
- Implement automated model promotion logic
1. Open the starter file: `starter/train_with_mlflow.py`

2. Find and fill in 8 TODOs (Part 1):
   - TODO 1: Import MLflow
   - TODO 2-4: Log parameters
   - TODO 5: Log training loss
   - TODO 6: Log evaluation metrics
   - TODO 7: Log model
   - TODO 8: Set experiment name

3. Run your implementation:

   ```bash
   python starter/train_with_mlflow.py
   ```

4. View your runs in the MLflow UI at http://localhost:5000.
Hints for the TODOs:

```python
# TODO 1: Import MLflow
# FILL IN: Import mlflow and mlflow.transformers

# TODO 2-4: Log parameters
# FILL IN: Use mlflow.log_param() to log model_name, epochs, batch_size, etc.

# TODO 5: Log training loss
# FILL IN: Use mlflow.log_metric("train_loss", value)

# TODO 6: Log evaluation metrics
# FILL IN: Use mlflow.log_metric() for eval_loss, accuracy, precision, f1

# TODO 7: Log model
# FILL IN: Use mlflow.transformers.log_model()

# TODO 8: Set experiment name
# FILL IN: Use mlflow.set_experiment()
```

With tracking in place:
- All hyperparameters are tracked automatically
- Metrics are stored for comparison
- Models are versioned as artifacts
- You can compare multiple runs in the UI
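If you want a rough picture of where the logging calls fit, here is a sketch under assumptions (not the solution file): the experiment name, parameter values, and placeholder metrics are illustrative, and the training code from Exercise 1 is elided.

```python
# Sketch of the Part 1 MLflow calls; the training/evaluation code from
# Exercise 1 is elided, and the logged values here are placeholders.
import mlflow
import mlflow.transformers
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# TODO 8: group runs under a named experiment (name is an assumption)
mlflow.set_experiment("sentiment-classification")

with mlflow.start_run():
    # TODO 2-4: log hyperparameters
    mlflow.log_params({"model_name": MODEL_NAME, "epochs": 1, "batch_size": 16})

    # ... train and evaluate exactly as in Exercise 1 ...

    # TODO 5-6: log metrics (placeholders stand in for real Trainer output)
    mlflow.log_metric("train_loss", 0.35)
    mlflow.log_metric("eval_accuracy", 0.88)

    # TODO 7: log the model and tokenizer as a versioned artifact
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        task="text-classification",
    )
```

After a run finishes, refresh http://localhost:5000 to compare it with earlier runs.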
Tackle Part 2 (Advanced/Optional) if:
- ✅ You've completed Part 1 successfully
- ✅ You have extra time
- ✅ You want to learn model lifecycle management
- ✅ You need model governance workflows
1. The file is already open (`train_with_mlflow.py`)

2. Find and fill in 4 Advanced TODOs (Part 2):
   - Advanced TODO 1: Train and register models
   - Advanced TODO 2: Transition model to stage using aliases
   - Advanced TODO 3: Load model by alias
   - Advanced TODO 4: Implement automated promotion logic

3. Run the advanced workflow:

   ```bash
   python starter/train_with_mlflow.py --advanced
   ```

4. View the registered models and their aliases in the MLflow UI at http://localhost:5000.
Model Lifecycle with Aliases (MLflow 2.9+):
Register → Set Alias → Load by Alias → Promote

Common aliases:
- champion: Production model serving live traffic
- challenger: Model being A/B tested
- staging: Model undergoing validation
- archived: Old version, no longer used
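For Advanced TODOs 1-2, registration and alias assignment might look like the following sketch. The model name matches the one used below; grabbing the run via `mlflow.last_active_run()` is an assumption about how your script is structured.

```python
# Sketch of registering a logged model and pointing an alias at it.
import mlflow
from mlflow import MlflowClient

model_name = "sentiment-classifier"
client = MlflowClient()

# Assumes the run that logged the model (Part 1) has just finished
run = mlflow.last_active_run()

# Advanced TODO 1: register the run's model artifact as a new version
version = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name=model_name,
)

# Advanced TODO 2: point the "staging" alias at the new version
client.set_registered_model_alias(model_name, "staging", version.version)
```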
Loading Models by Alias:

```python
# Load champion model
model_uri = "models:/sentiment-classifier@champion"
model = mlflow.transformers.load_model(model_uri)

# Deployment code doesn't need to know the version number!
# When you promote a new model, it's automatically used
```

Automated Promotion:
```python
# Compare staging vs champion
if staging_accuracy > champion_accuracy:
    # Set new champion
    client.set_registered_model_alias(
        name=model_name,
        alias="champion",
        version=staging_version,
    )
```
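A fuller take on Advanced TODO 4 might fetch the metric logged on each aliased version's run and compare them. This sketch assumes Part 1 logged the metric under the key `eval_accuracy`:

```python
# Sketch of automated promotion; the metric key is an assumption about
# what your tracking code logged in Part 1.
from mlflow import MlflowClient

def promote_if_better(model_name: str, metric: str = "eval_accuracy") -> None:
    client = MlflowClient()

    def metric_for(alias: str) -> float:
        # Each registered version remembers the run that produced it
        version = client.get_model_version_by_alias(model_name, alias)
        return client.get_run(version.run_id).data.metrics[metric]

    if metric_for("staging") > metric_for("champion"):
        staging = client.get_model_version_by_alias(model_name, "staging")
        client.set_registered_model_alias(model_name, "champion", staging.version)

promote_if_better("sentiment-classifier")
```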
In Module 2, you'll load models from the Registry in BentoML:

```python
import mlflow
import bentoml

@bentoml.service
class SentimentService:
    def __init__(self):
        # Load latest champion model from Registry
        model_uri = "models:/sentiment-classifier@champion"
        self.model = mlflow.transformers.load_model(model_uri)

    @bentoml.api
    def predict(self, text: str) -> dict:
        result = self.model(text)
        return {"sentiment": result[0]["label"]}
```

Now when you promote a new model to champion in MLflow, your service automatically uses it on restart!
Complete reference implementations are available in the solution/ folder:
- `solution/train_basic.py` - Exercise 1 solution
- `solution/train_with_mlflow.py` - Exercise 2 solution (both Part 1 and Part 2)
Use these if you get stuck or want to compare approaches!
Troubleshooting

Missing dependencies

Symptoms: Import errors when running training scripts

Solution:

```bash
# Install all required dependencies
pip install -r requirements.txt

# Verify installation
python -c "import transformers; print(transformers.__version__)"
```

Prevention: Always activate your virtual environment before running scripts.
Slow training

Symptoms: Each epoch takes 3-5+ minutes on CPU

Root Cause: Transformer models are computationally expensive, and CPU training is significantly slower than GPU training.

Solutions:

```bash
# Option 1: Reduce training samples
python train_production.py --train_samples 500 --test_samples 100

# Option 2: Reduce epochs
python train_production.py --epochs 1

# Option 3: Use smaller model
python train_production.py --model_name distilbert-base-uncased  # Already the default
```

MLflow UI shows no experiments

Symptoms: Browser shows "No experiments" at http://localhost:5000
Solutions:
1. Run training first: MLflow UI only shows data after runs are created

   ```bash
   python starter/train_with_mlflow.py
   ```

2. Check the MLflow directory:

   ```bash
   ls mlruns/
   ```

3. Verify the experiment name:

   ```bash
   # Check if experiment exists
   mlflow experiments search
   ```

4. Restart the MLflow UI:

   ```bash
   # Kill existing UI
   pkill -f "mlflow ui"

   # Restart
   mlflow ui
   ```
Dataset download failures

Symptoms:

```
ConnectionError: Couldn't reach the Hugging Face Hub
```

Solutions:

Option 1: Retry failed downloads

```python
from datasets import DownloadConfig, load_dataset

# Retry flaky downloads a few times before giving up
dataset = load_dataset("imdb", download_config=DownloadConfig(max_retries=5))
```

Option 2: Download once and cache

```bash
# Pre-download dataset
python -c "from datasets import load_dataset; load_dataset('imdb')"

# Check cache location
ls ~/.cache/huggingface/datasets/
```

Option 3: Use manual download

If network issues persist, download the dataset manually from https://huggingface.co/datasets/imdb.

Still stuck? Check the solutions folder.
Quick start:

```bash
# Navigate to module
cd modules/module-1

# Install dependencies
pip install -r requirements.txt

# Start MLflow UI (optional but recommended)
mlflow ui --host 0.0.0.0 --port 5000

# Run basic training
python starter/train_basic.py

# Run with MLflow tracking (Part 1 - basic tracking)
python starter/train_with_mlflow.py

# Run advanced registry workflow (Part 2 - optional)
python starter/train_with_mlflow.py --advanced
```

Training commands:

```bash
# Exercise 1: Basic training
python starter/train_basic.py

# Exercise 2 Part 1: MLflow tracking (default mode)
python starter/train_with_mlflow.py

# Exercise 2 Part 2: Advanced registry workflow
python starter/train_with_mlflow.py --advanced

# Show available options
python starter/train_with_mlflow.py --help
```

MLflow commands:

```bash
# Start MLflow UI
mlflow ui

# Start on specific host/port
mlflow ui --host 0.0.0.0 --port 5001

# List all experiments
mlflow experiments search

# Search runs in experiment
mlflow runs list --experiment-id 1

# Create new experiment
mlflow experiments create --experiment-name my-experiment

# Delete experiment
mlflow experiments delete --experiment-id 2

# View run details
mlflow runs describe --run-id <run-id>
```
Environment commands:

```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
source venv/bin/activate  # macOS/Linux

# Install requirements
pip install -r requirements.txt

# List installed packages
pip list

# Deactivate virtual environment
deactivate
```

Building on Module 1's trained model, Module 2 adds:
- ✅ REST API endpoints for predictions
- ✅ Request validation and error handling
- ✅ Docker containerization
- ✅ Kubernetes deployment
- ✅ Load testing and performance monitoring
Skills you practiced in this module:
- ✅ HuggingFace Transformers: Load and fine-tune pre-trained models
- ✅ MLflow Tracking: Track experiments, parameters, and metrics
- ✅ Model Registry: Version and manage trained models
- ✅ Production Patterns: CLI arguments, error handling, logging
Next steps:
- Module 2: Package models with BentoML for serving
- Module 3: Deploy to Kubernetes clusters
- Module 4: Build Go API gateways
| Previous | Home | Next |
|---|---|---|
| ← Module 0: Environment Setup | 🏠 Home | Module 2: Model Packaging & Serving → |
MLOps Workshop | GitHub Repository