# MLOps Workshop Wiki

Welcome to the **MLOps Workshop**! This wiki provides step-by-step walkthroughs for each module.

## Workshop Overview

This **6-hour hands-on workshop** teaches you to build production-ready ML systems from scratch. You'll progress through 8 modules covering the complete MLOps lifecycle: from model training to production deployment with monitoring and CI/CD.

### Platform Support

| Platform | How to run |
|---|---|
| macOS (Intel & Apple Silicon) | Local: **[Module 0 → Option A](Module-0#option-a-macos-local-setup)** |
| Linux (Ubuntu 20.04+) | Local: follow Option A, using `apt` instead of `brew` |
| Windows 10/11 (native) | Local: **[Module 0 → Option D](Module-0#option-d-windows-native-powershell)** |
| Windows 10/11 (WSL 2) | Local: **[Module 0 → Option C](Module-0#option-c-windows-wsl-2)** |
| Any (browser, zero install) | **[Module 0 → Option B: Codespaces](Module-0#option-b-github-codespaces)** |

---

## Workshop Structure

### Complete Learning Path

```
Module 0: Setup
    ↓
Module 1: Model Training & Experiment Tracking
    ↓
Module 2: Model Packaging & Serving
    ↓
Module 3: Kubernetes Deployment
    ↓
Module 4: API Gateway & Polyglot Architecture
    ↓
Module 5: ML Pipeline Automation
    ↓
Module 6: Monitoring & Observability
    ↓
Module 7: CI/CD Pipeline
    ↓
🎉 Complete MLOps Platform!
```

---

## Modules {#modules}

### Module 0: Environment Setup

Set up your development environment with Python, Go, Docker, Kubernetes, and all workshop dependencies.

**What you'll install:**

- Python 3.9+ with ML libraries (MLflow, BentoML, Transformers)
- Go 1.21+ for infrastructure services
- Docker for containerization
- kubectl and kind for local Kubernetes
- MLflow tracking server and BentoML

→ **[Start Module 0: Setup Guide](Module-0)**

---

### Module 1: Model Training & Experiment Tracking

Train a sentiment analysis model with Hugging Face Transformers and track experiments using MLflow.
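Comparing training runs comes down to ranking a logged metric. The stdlib-only sketch below illustrates that selection rule; it is not the workshop's code. It assumes each run has already been pulled out of MLflow (for example with `mlflow.search_runs`) and reduced to a hypothetical dict with `run_id`, `params`, and `metrics` keys. The function name `select_best_run` is also illustrative.

```python
from typing import Any, Iterable, Optional

# Hypothetical run record shape (an assumption, not an MLflow type):
# {"run_id": str, "params": {name: value}, "metrics": {name: float}}
Run = dict[str, Any]


def select_best_run(
    runs: Iterable[Run], metric: str, higher_is_better: bool = True
) -> Optional[Run]:
    """Return the run with the best value of `metric`, or None if no run logged it."""
    # Ignore runs that never logged the metric (e.g. a run that crashed mid-training).
    candidates = [r for r in runs if metric in r.get("metrics", {})]
    if not candidates:
        return None
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r["metrics"][metric])


if __name__ == "__main__":
    # Illustrative records only; real values come from your own training runs.
    runs = [
        {"run_id": "a1", "params": {"lr": 5e-5}, "metrics": {"accuracy": 0.88, "loss": 0.25}},
        {"run_id": "b2", "params": {"lr": 3e-5}, "metrics": {"accuracy": 0.91, "loss": 0.27}},
        {"run_id": "c3", "params": {"lr": 1e-4}, "metrics": {}},  # failed run
    ]
    print(select_best_run(runs, "accuracy")["run_id"])                      # b2
    print(select_best_run(runs, "loss", higher_is_better=False)["run_id"])  # a1
```

The metric you rank by matters. In these made-up records, the most accurate run is not the lowest-loss one. MLflow can also do the sorting server-side through the `order_by` argument of `mlflow.search_runs`.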
**What you'll learn:**

- ✅ Fine-tune DistilBERT for sentiment classification
- ✅ Track experiments with MLflow (parameters, metrics, models)
- ✅ Use MLflow Model Registry for version management
- ✅ Compare training runs and select best models
- ✅ Build production-ready training scripts

**Exercises:**

1. **Exercise 1:** Basic Training with MLflow
2. **Exercise 2:** Model Registry Workflow

→ **[Start Module 1: MLflow & Experiment Tracking](Module-1)**

---

### Module 2: Model Packaging & Serving

Package your trained model as a production-ready REST API using BentoML 1.4+.

**What you'll learn:**

- ✅ BentoML 1.4+ class-based service architecture
- ✅ Pydantic v2 validation for type-safe APIs
- ✅ Error handling and structured logging
- ✅ Batch processing for higher throughput
- ✅ Docker containerization
- ✅ OpenAPI/Swagger documentation

**Exercises:**

1. **Exercise 1:** Basic BentoML Service
2. **Exercise 2:** Production Features

→ **[Start Module 2: BentoML & Model Serving](Module-2)**

---

### Module 3: Kubernetes Deployment

Deploy your containerized ML service to Kubernetes with production-grade configuration.

**What you'll learn:**

- ✅ Kubernetes fundamentals (Pods, Deployments, Services)
- ✅ Resource management (requests, limits, QoS)
- ✅ Health probes (startup, liveness, readiness)
- ✅ Horizontal Pod Autoscaling (HPA)
- ✅ ConfigMaps for configuration management
- ✅ High availability and security patterns

**Exercises:**

1. **Exercise 1:** Basic Deployment
2. **Exercise 2:** Production Configuration
3. **Exercise 3:** Auto-scaling & HA

→ **[Start Module 3: Kubernetes Deployment](Module-3)**

---

### Module 4: API Gateway & Polyglot Architecture

Build a high-performance API gateway in Go to front your ML services.
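A gateway accepts client traffic and applies cross-cutting policies before proxying requests to the ML service. Rate limiting is one of the middlewares this module covers. The idea behind it is language-neutral, so here is a minimal token-bucket sketch in Python. It is not the module's Go code: the `TokenBucket` class, its injectable clock, and the example rate and capacity are assumptions chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; False means reject (e.g. HTTP 429)."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the previous call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    # A fake clock keeps the example deterministic.
    t = [0.0]
    bucket = TokenBucket(rate=2.0, capacity=3.0, clock=lambda: t[0])
    print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
    t[0] += 0.5  # 0.5 s later, 0.5 * 2.0 = 1 token has been refilled
    print(bucket.allow())  # True
```

A real gateway would keep one bucket per client (for example per API key or IP address) and guard it with a lock, because requests arrive concurrently. In Go, the `golang.org/x/time/rate` package already provides a token-bucket `Limiter` with an `Allow` method.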
**What you'll learn:**

- ✅ Why Go for infrastructure (67% resource reduction)
- ✅ Reverse proxy patterns
- ✅ Middleware (logging, CORS, rate limiting)
- ✅ Health checks and circuit breakers
- ✅ Prometheus metrics integration
- ✅ Polyglot architecture benefits

**Exercises:**

1. **Exercise 1:** Basic Reverse Proxy
2. **Exercise 2:** Production Middleware

→ **[Start Module 4: Go API Gateway](Module-4)**

---

### Module 5: ML Pipeline Automation

Orchestrate end-to-end ML workflows with Kubeflow Pipelines.

**What you'll learn:**

- ✅ Kubeflow Pipelines components and DAGs
- ✅ Artifact tracking and versioning
- ✅ Pipeline orchestration patterns
- ✅ KServe for model serving
- ✅ Multi-model deployment strategies
- ✅ Automated retraining workflows

**Exercises:**

1. **Exercise 1:** Data Preparation Component
2. **Exercise 2:** Training & Evaluation Components
3. **Exercise 3:** Pipeline Orchestration

→ **[Start Module 5: Kubeflow Pipelines](Module-5)**

---

### Module 6: Monitoring & Observability

Set up production monitoring with Prometheus and Grafana.

**What you'll learn:**

- ✅ Prometheus for metrics collection
- ✅ PromQL queries and aggregation
- ✅ Alerting rules and Alertmanager
- ✅ Grafana dashboards for visualization
- ✅ ML-specific metrics (prediction latency, model performance)
- ✅ SLO/SLA monitoring

**Exercises:**

1. **Exercise 2:** Alerting Rules
2. **Exercise 3:** Grafana Dashboard

→ **[Start Module 6: Prometheus & Grafana](Module-6)**

---

### Module 7: CI/CD Pipeline

Automate your ML deployment pipeline with GitHub Actions.

**What you'll learn:**

- ✅ GitHub Actions workflow syntax
- ✅ Multi-stage CI/CD (build, test, deploy)
- ✅ Security scanning (Trivy, Snyk)
- ✅ Multi-environment deployment (dev → staging → prod)
- ✅ Approval gates and notifications
- ✅ Rollback strategies
- ✅ GitOps principles

**Workflows:**

1. **Step 1:** Basic Build
2. **Step 2:** Build & Test
3. **Step 3:** Build, Test & Deploy
4. **Step 4:** Production-Ready Pipeline

→ **[Start Module 7: GitHub Actions CI/CD](Module-7)**

---

## Installing on Windows (Native PowerShell)

Run the workshop natively on Windows 10/11 using PowerShell; no WSL is required.

### Prerequisites

| Tool | Install |
|---|---|
| Python 3.11+ | [python.org](https://www.python.org/downloads/) (check **"Add to PATH"** during install) |
| Docker Desktop | [docker.com](https://www.docker.com/products/docker-desktop/) |
| kind | `winget install Kubernetes.kind` or download from [kind.sigs.k8s.io](https://kind.sigs.k8s.io/) |
| kubectl | `winget install Kubernetes.kubectl` |
| Go 1.21+ | `winget install GoLang.Go` |
| Git | [git-scm.com](https://git-scm.com/download/win) |

### Setup

**1. Clone the repo**

```powershell
git clone <repo-url>
cd ml-con-workshop
```

**2. Create a Python virtual environment**

```powershell
python -m venv venv
venv\Scripts\Activate.ps1
# On Windows, pip cannot upgrade itself through pip.exe; go through the interpreter instead
python -m pip install --upgrade pip
pip install -r requirements.txt
```

**3. Verify Docker, kind, and kubectl**

```powershell
docker version
kind version
kubectl version --client
```

**4. Create a local Kubernetes cluster**

```powershell
kind create cluster --name ml-workshop
kubectl cluster-info --context kind-ml-workshop
```

> **Note:** Throughout the wiki, bash snippets like `source venv/bin/activate` become `venv\Scripts\Activate.ps1` in PowerShell, and paths use `\` instead of `/`.
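Before moving on, you can run a quick cross-platform check that the prerequisites above are installed and on your `PATH`. The sketch below is a hypothetical helper, not part of the workshop repo. It compares the interpreter against the 3.11 minimum from the table and uses `shutil.which` to look for `docker`, `kind`, `kubectl`, `go`, and `git`. It only confirms that each executable can be found; it does not check that Docker Desktop is running.

```python
import shutil
import sys

# Minimum Python version from the prerequisites table above.
MIN_PYTHON = (3, 11)
# CLI tools the Windows setup expects on PATH (names as typed in PowerShell).
REQUIRED_TOOLS = ("docker", "kind", "kubectl", "go", "git")


def check_environment(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return {check_name: bool} for the Python version check and each tool lookup."""
    results = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for tool in tools:
        # On Windows, shutil.which honours PATHEXT, so "kind" also matches kind.exe.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    report = check_environment()
    for name, ok in report.items():
        print(f"{'OK' if ok else 'MISSING':8} {name}")
    missing = [name for name, ok in report.items() if not ok]
    print("All prerequisites found." if not missing else f"Missing: {', '.join(missing)}")
```

Save the script under any name (for example `check_env.py`, a hypothetical filename) and run it with `python check_env.py` from the activated virtual environment. That way the version check applies to the interpreter you will actually use.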
---

## Learning Approach

### Scaffolded Exercises

This workshop uses a **hands-on scaffolded approach**:

✅ **What you get:**

- Complete file structure and imports
- 80-90% of code already written
- TODOs with inline hints

✅ **What you implement:**

- Specific function calls (1-3 lines per TODO)
- Key parameter values
- Critical configuration
- ~10-20% of each exercise

---

## Workshop Goals

By the end of this workshop, you will be able to:

### Technical Skills

- ✅ Train ML models with experiment tracking (MLflow)
- ✅ Package models as production-ready APIs (BentoML)
- ✅ Deploy services to Kubernetes with auto-scaling
- ✅ Build high-performance infrastructure (Go)
- ✅ Orchestrate ML workflows (Kubeflow)
- ✅ Monitor model performance in production (Prometheus/Grafana)
- ✅ Automate deployments with CI/CD (GitHub Actions)

---

## Additional Resources

### Workshop Guides

- **[Setup Guide](Module-0)** - Detailed environment setup instructions

### External Documentation

- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [BentoML Documentation](https://docs.bentoml.com/)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Kubeflow Documentation](https://www.kubeflow.org/docs/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [GitHub Actions Documentation](https://docs.github.com/en/actions)

---

## Getting Help

If you encounter issues:

1. **Check the module's Troubleshooting section** - Each module has common issues and fixes
2. **Review the [Troubleshooting Guide](Troubleshooting.md)** - Comprehensive troubleshooting resource
3. **Check solution files** - Located in `modules/module-X/solution/`

---

## Quick Start

Ready to begin? Follow these steps:

### 1. Set Up Your Environment

Start with Module 0 to install all required tools:

→ **[Module 0: Setup Guide](Module-0)**

### 2. Follow Modules in Order

Each module builds on the previous one. **Do not skip modules.**

```
Module 0 → Module 1 → Module 2 → ... → Module 7
```

### 3. Complete All Exercises

Each module has hands-on exercises with TODOs. Fill in the blanks.

### 4. Check the Solution If You Get Stuck

Reference solutions are in `modules/module-X/solution/`.

---

## What You'll Build

By the end of this workshop, you'll have a **complete, production-ready MLOps platform**. The diagram below shows how the pieces you build in each module fit together:

```mermaid
flowchart TB
    subgraph DEV["Developer workflow"]
        GH[GitHub repo]
        CI[GitHub Actions<br/>Module 7]
        GH -->|push| CI
    end

    subgraph TRAIN["Training plane: Modules 1 & 5"]
        HF[Hugging Face Hub]
        KFP[Kubeflow Pipelines<br/>Module 5]
        MLF[(MLflow<br/>tracking + registry<br/>Module 1)]
        HF --> KFP
        KFP -->|log runs & register models| MLF
    end

    subgraph SERVE["Serving plane: Modules 2, 3, 4"]
        U((User / client))
        GW[Go API gateway<br/>Module 4]
        ML[BentoML sentiment API<br/>Module 2]
        K8S[(Kubernetes / kind<br/>Module 3)]
        U -->|HTTP| GW -->|reverse proxy| ML
        ML -.runs on.-> K8S
        GW -.runs on.-> K8S
    end

    subgraph OBS["Observability: Module 6"]
        PROM[Prometheus]
        GRAF[Grafana dashboards]
        PROM --> GRAF
    end

    MLF -->|load model| ML
    CI -->|build & push images| REG[(ghcr.io)]
    REG -->|pull| K8S
    ML -->|/metrics| PROM
    GW -->|/metrics| PROM
```

**Components:**

- **Model Training:** MLflow tracking + model registry (Module 1)
- **Model Serving:** BentoML API + Docker containers (Module 2)
- **Orchestration:** Kubernetes with auto-scaling (Module 3)
- **API Gateway:** Go reverse proxy + middleware (Module 4)
- **ML Pipelines:** Kubeflow for workflow automation (Module 5)
- **Monitoring:** Prometheus metrics + Grafana dashboards (Module 6)
- **CI/CD:** GitHub Actions for automated deployments (Module 7)

---

## Let's Get Started!

→ **[Begin with Module 0: Setup Guide](Module-0)**