Home
Welcome to the MLOps Workshop! This wiki provides step-by-step walkthroughs for each module.
This 6-hour hands-on workshop teaches you to build production-ready ML systems from scratch. You'll progress through 8 modules covering the complete MLOps lifecycle: from model training to production deployment with monitoring and CI/CD.
Module 0: Setup
↓
Module 1: Model Training & Experiment Tracking
↓
Module 2: Model Packaging & Serving
↓
Module 3: Kubernetes Deployment
↓
Module 4: API Gateway & Polyglot Architecture
↓
Module 5: ML Pipeline Automation
↓
Module 6: Monitoring & Observability
↓
Module 7: CI/CD Pipeline
↓
🎉 Complete MLOps Platform!
Module 0: Setup
Set up your development environment with Python, Go, Docker, Kubernetes, and all workshop dependencies.
What you'll install:
- Python 3.9+ with ML libraries (MLflow, BentoML, Transformers)
- Go 1.21+ for infrastructure services
- Docker for containerization
- kubectl and kind for local Kubernetes
- MLflow tracking server and BentoML
Module 1: Model Training & Experiment Tracking
Train a sentiment analysis model with Hugging Face transformers and track experiments using MLflow.
What you'll learn:
- ✅ Fine-tune DistilBERT for sentiment classification
- ✅ Track experiments with MLflow (parameters, metrics, models)
- ✅ Use MLflow Model Registry for version management
- ✅ Compare training runs and select best models
- ✅ Build production-ready training scripts
Exercises:
- Exercise 1: Basic Training with MLflow
- Exercise 2: Model Registry Workflow
→ Start Module 1: MLflow & Experiment Tracking
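As a preview of the tracking workflow Module 1 covers, here is a minimal sketch of logging a run to MLflow. The tracking URI, experiment name, hyperparameters, and metric values are illustrative placeholders, not the workshop's actual code:

```python
import mlflow

# Point the client at a local tracking server (the port is an assumption;
# use whatever your MLflow server from Module 0 runs on).
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("sentiment-analysis")

with mlflow.start_run(run_name="distilbert-baseline"):
    # Parameters make runs comparable side by side in the MLflow UI.
    mlflow.log_params({"learning_rate": 2e-5, "epochs": 3, "batch_size": 16})

    # ... fine-tune DistilBERT here ...

    # Metrics are recorded per run; the values below are placeholders.
    mlflow.log_metric("accuracy", 0.91)
    mlflow.log_metric("f1", 0.90)
    # The exercises go further: they log the trained model itself and
    # register it in the MLflow Model Registry for version management.
```

Runs logged this way appear side by side in the MLflow UI, which is how you compare training runs and select the best model.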
Module 2: Model Packaging & Serving
Package your trained model as a production-ready REST API using BentoML 1.4+.
What you'll learn:
- ✅ BentoML 1.4+ class-based service architecture
- ✅ Pydantic v2 validation for type-safe APIs
- ✅ Error handling and structured logging
- ✅ Batch processing for higher throughput
- ✅ Docker containerization
- ✅ OpenAPI/Swagger documentation
Exercises:
- Exercise 1: Basic BentoML Service
- Exercise 2: Production Features
→ Start Module 2: BentoML & Model Serving
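To give a feel for the class-based service style Module 2 teaches, here is a hedged sketch of a BentoML 1.x service. The service name, resource settings, and the stock sentiment pipeline are placeholders for the fine-tuned model from Module 1:

```python
import bentoml
from transformers import pipeline


@bentoml.service(resources={"cpu": "2"}, traffic={"timeout": 30})
class SentimentService:
    def __init__(self) -> None:
        # Loaded once per worker; a stock pipeline stands in for the
        # fine-tuned DistilBERT you train in Module 1.
        self.classifier = pipeline("sentiment-analysis")

    @bentoml.api
    def predict(self, text: str) -> dict:
        # The exercises harden this with Pydantic v2 request models,
        # structured logging, error handling, and batch processing.
        result = self.classifier(text)[0]
        return {"label": result["label"], "score": float(result["score"])}
```

Serving a file containing a class like this with `bentoml serve` exposes an HTTP API with generated OpenAPI/Swagger documentation.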
Module 3: Kubernetes Deployment
Deploy your containerized ML service to Kubernetes with production-grade configuration.
What you'll learn:
- ✅ Kubernetes fundamentals (Pods, Deployments, Services)
- ✅ Resource management (requests, limits, QoS)
- ✅ Health probes (startup, liveness, readiness)
- ✅ Horizontal Pod Autoscaling (HPA)
- ✅ ConfigMaps for configuration management
- ✅ High availability and security patterns
Exercises:
- Exercise 1: Basic Deployment
- Exercise 2: Production Configuration
- Exercise 3: Auto-scaling & HA
→ Start Module 3: Kubernetes Deployment
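In the module itself you write YAML manifests and apply them with kubectl; purely to illustrate the same concepts in Python, here is a sketch that creates an equivalent Deployment with the official kubernetes client. The image name, port, and probe paths are assumptions, not the workshop's real manifest:

```python
from kubernetes import client, config

# A Deployment expressed as a dict, mirroring the YAML written in the module.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "sentiment-api"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "sentiment-api"}},
        "template": {
            "metadata": {"labels": {"app": "sentiment-api"}},
            "spec": {
                "containers": [{
                    "name": "sentiment-api",
                    "image": "sentiment-api:latest",
                    "ports": [{"containerPort": 3000}],
                    # Requests/limits determine the pod's QoS class.
                    "resources": {
                        "requests": {"cpu": "250m", "memory": "512Mi"},
                        "limits": {"cpu": "1", "memory": "1Gi"},
                    },
                    # Probes let Kubernetes restart or de-route unhealthy pods.
                    "readinessProbe": {"httpGet": {"path": "/readyz", "port": 3000}},
                    "livenessProbe": {"httpGet": {"path": "/livez", "port": 3000}},
                }]
            },
        },
    },
}

config.load_kube_config()  # uses your local kubeconfig (e.g. the kind cluster)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```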
Module 4: API Gateway & Polyglot Architecture
Build a high-performance API gateway in Go to front your ML services.
What you'll learn:
- ✅ Why Go for infrastructure (67% resource reduction)
- ✅ Reverse proxy patterns
- ✅ Middleware (logging, CORS, rate limiting)
- ✅ Health checks and circuit breakers
- ✅ Prometheus metrics integration
- ✅ Polyglot architecture benefits
Exercises:
- Exercise 1: Basic Reverse Proxy
- Exercise 2: Production Middleware
→ Start Module 4: Go API Gateway
Module 5: ML Pipeline Automation
Orchestrate end-to-end ML workflows with Kubeflow Pipelines.
What you'll learn:
- ✅ Kubeflow Pipelines components and DAGs
- ✅ Artifact tracking and versioning
- ✅ Pipeline orchestration patterns
- ✅ KServe for model serving
- ✅ Multi-model deployment strategies
- ✅ Automated retraining workflows
Exercises:
- Exercise 1: Data Preparation Component
- Exercise 2: Training & Evaluation Components
- Exercise 3: Pipeline Orchestration
→ Start Module 5: Kubeflow Pipelines
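For a taste of the Kubeflow Pipelines DSL used in Module 5, here is a minimal two-component pipeline sketch in kfp v2 style. The component bodies, names, and artifact contents are placeholders, not the workshop's real data-prep and training code:

```python
from kfp import dsl, compiler


@dsl.component(base_image="python:3.9")
def prepare_data(dataset: dsl.Output[dsl.Dataset]):
    # Placeholder data-prep step: writes a tiny CSV artifact that
    # downstream components consume by path.
    with open(dataset.path, "w") as f:
        f.write("text,label\ngreat movie,positive\n")


@dsl.component(base_image="python:3.9")
def train(dataset: dsl.Input[dsl.Dataset], model: dsl.Output[dsl.Model]):
    # Placeholder training step: a real component would fine-tune the
    # model and write its weights to model.path.
    with open(model.path, "w") as f:
        f.write("trained-model-placeholder")


@dsl.pipeline(name="sentiment-training-pipeline")
def sentiment_pipeline():
    data_task = prepare_data()
    train(dataset=data_task.outputs["dataset"])


if __name__ == "__main__":
    # Compile to a pipeline spec that can be uploaded to Kubeflow Pipelines.
    compiler.Compiler().compile(sentiment_pipeline, "sentiment_pipeline.yaml")
```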
Module 6: Monitoring & Observability
Set up production monitoring with Prometheus and Grafana.
What you'll learn:
- ✅ Prometheus for metrics collection
- ✅ PromQL queries and aggregation
- ✅ Alerting rules and Alertmanager
- ✅ Grafana dashboards for visualization
- ✅ ML-specific metrics (prediction latency, model performance)
- ✅ SLO/SLA monitoring
Exercises:
- Exercise 2: Alerting Rules
- Exercise 3: Grafana Dashboard
→ Start Module 6: Prometheus & Grafana
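As a preview of the ML-specific metrics Module 6 has you expose, here is a small sketch using the Python prometheus_client; the metric names, port, and fake prediction function are illustrative assumptions:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# ML-oriented metrics: prediction counts by label, and prediction latency.
PREDICTIONS = Counter("predictions_total", "Total predictions served", ["label"])
LATENCY = Histogram("prediction_latency_seconds", "Time spent producing a prediction")


@LATENCY.time()
def predict(text: str) -> str:
    # Stand-in for a real model call; sleeps to simulate inference time.
    time.sleep(random.uniform(0.01, 0.05))
    return random.choice(["positive", "negative"])


if __name__ == "__main__":
    # Expose metrics on :8001/metrics for Prometheus to scrape.
    start_http_server(8001)
    while True:
        PREDICTIONS.labels(label=predict("sample input")).inc()
```

Once Prometheus scrapes this endpoint, a PromQL query such as `histogram_quantile(0.95, rate(prediction_latency_seconds_bucket[5m]))` returns the p95 prediction latency, the kind of query the alerting and Grafana exercises build on.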
Module 7: CI/CD Pipeline
Automate your ML deployment pipeline with GitHub Actions.
What you'll learn:
- ✅ GitHub Actions workflow syntax
- ✅ Multi-stage CI/CD (build, test, deploy)
- ✅ Security scanning (Trivy, Snyk)
- ✅ Multi-environment deployment (dev → staging → prod)
- ✅ Approval gates and notifications
- ✅ Rollback strategies
- ✅ GitOps principles
Workflows:
- Step 1: Basic Build
- Step 2: Build & Test
- Step 3: Build, Test & Deploy
- Step 4: Production-Ready Pipeline
→ Start Module 7: GitHub Actions CI/CD
This workshop uses a hands-on scaffolded approach:
✅ What you get:
- Complete file structure and imports
- 80-90% of code already written
- TODOs with inline hints
✅ What you implement:
- Specific function calls (1-3 lines per TODO)
- Key parameter values
- Critical configuration
- ~10-20% of each exercise
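Concretely, a scaffolded exercise looks roughly like the following (an illustrative sketch, not an actual workshop file): the structure and hints are provided, and you fill in the short calls marked TODO.

```python
import mlflow


def train_and_log(params: dict, accuracy: float) -> None:
    """Scaffold: the structure and hints are given, the key calls are yours."""
    with mlflow.start_run():
        # TODO: log all hyperparameters (hint: mlflow.log_params)
        mlflow.log_params(params)          # <- a typical one-line fill-in

        # TODO: log the final accuracy metric (hint: mlflow.log_metric)
        mlflow.log_metric("accuracy", accuracy)
```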
By the end of this workshop, you will be able to:
- ✅ Train ML models with experiment tracking (MLflow)
- ✅ Package models as production-ready APIs (BentoML)
- ✅ Deploy services to Kubernetes with auto-scaling
- ✅ Build high-performance infrastructure (Go)
- ✅ Orchestrate ML workflows (Kubeflow)
- ✅ Monitor model performance in production (Prometheus/Grafana)
- ✅ Automate deployments with CI/CD (GitHub Actions)
Resources:
- Setup Guide - Detailed environment setup instructions
- MLflow Documentation
- BentoML Documentation
- Kubernetes Documentation
- Kubeflow Documentation
- Prometheus Documentation
- GitHub Actions Documentation
If you encounter issues:
- Check the module's Troubleshooting section - Each module has common issues and fixes
- Review the Troubleshooting Guide - Comprehensive troubleshooting resource
- Check solution files - Located in modules/module-X/solution/
Ready to begin? Follow these steps:
- Start with Module 0 to install all required tools.
- Work through the modules in order. Each module builds on the previous, so do not skip any:
Module 0 → Module 1 → Module 2 → ... → Module 7
- Complete each module's hands-on exercises by filling in the TODOs.
By the end of this workshop, you'll have a complete, production-ready MLOps platform:
Components:
- Model Training: MLflow tracking + model registry
- Model Serving: BentoML API + Docker containers
- Orchestration: Kubernetes with auto-scaling
- API Gateway: Go reverse proxy + middleware
- ML Pipelines: Kubeflow for workflow automation
- Monitoring: Prometheus metrics + Grafana dashboards
- CI/CD: GitHub Actions for automated deployments