Module 3: Production-Ready Kubernetes Deployment
By the end of this module, you'll have:
- ✅ Production-ready Kubernetes deployment for your ML service
- ✅ Auto-scaling infrastructure that responds to traffic patterns
- ✅ High-availability setup surviving node failures
- ✅ Security-hardened containers running as non-root
- ✅ Complete health monitoring with all probe types
- ✅ Zero-downtime deployment capabilities
- ✅ Resource-optimized configuration preventing waste
Real-World Impact:
- Handle 10x traffic spikes automatically with HPA
- Maintain 99.9%+ uptime with proper health checks
- Pass security audits with hardened container configuration
- Deploy updates without service interruption
- Optimize costs by scaling down during low traffic
Learning objectives: in this module, you will:
- ✅ Deploy ML models to Kubernetes with production-ready configuration
- ✅ Configure resource limits and requests
- ✅ Implement all three health probe types
- ✅ Set up auto-scaling with Horizontal Pod Autoscaler
- ✅ Ensure high availability with Pod Disruption Budget
- ✅ Apply security best practices (non-root, read-only filesystem)
- ✅ Use ConfigMap for externalized configuration
- ✅ Implement pod anti-affinity for fault tolerance
Prerequisites:
- Completed Module 2 (BentoML service containerized)
- kind installed (Kubernetes in Docker)
- kubectl installed and configured
- Docker image from Module 2: `sentiment-api:v1`
Single Exercise: Production-Ready Deployment
```
├─ ConfigMap for configuration
├─ Deployment with proper resource management
├─ Service for network access
├─ Health probes (startup, liveness, readiness)
├─ Horizontal Pod Autoscaler (HPA)
├─ Pod Disruption Budget (PDB)
├─ Security hardening
└─ High availability (anti-affinity)
```
What does "scaffolded" mean?
- 80-90% of YAML is provided for you
- You fill in ~10-20% (20 specific configuration values)
- Each TODO has inline hints showing exactly what to use
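Each TODO in the starter manifest follows the same pattern: a numbered comment, an inline hint, and a placeholder to fill in. Here is a sketch of what one looks like (illustrative only; the exact wording in starter/deployment.yaml may differ):

```yaml
# TODO 3: Set the deployment name
# Hint: use "sentiment-api" so it matches the kubectl commands below
metadata:
  name: # YOUR CODE HERE
```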
Setup: create the kind cluster, build and load the image, and install metrics-server (required for HPA).

```bash
# Create cluster
kind create cluster --config modules/module-0/kind.yaml

# Verify
kubectl cluster-info
kubectl get nodes
```

```bash
# Build image (from Module 2)
cd ../module-2
bentoml build
bentoml containerize sentiment_service:latest -t sentiment-api:v1

# Load into kind
cd ../module-3
kind load docker-image sentiment-api:v1 --name mlops-workshop

# Verify image is loaded
docker exec -it mlops-workshop-control-plane crictl images | grep sentiment
```

```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Patch for kind (disable TLS verification)
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# Verify
kubectl get deployment metrics-server -n kube-system
```

Goal: Deploy your ML service to Kubernetes with complete production configuration.
```bash
cd starter

# Open the file
open deployment.yaml

# Find and fill in 20 TODOs
# Look for: # YOUR CODE HERE

# Apply the manifest
kubectl apply -f deployment.yaml

# Verify all resources
kubectl get deployments,pods,svc,hpa,pdb -l app=sentiment-api
```
Test the API:
```bash
# Port forward
kubectl port-forward svc/sentiment-api-service 8080:80

# In another terminal, test prediction
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"request": {"text": "Kubernetes is awesome!", "request_id": null}}'

# Expected response:
# {"sentiment": "POSITIVE", "score": 0.9998}

# Test health endpoint
curl http://localhost:8080/health
```

The 20 TODOs break down as follows:

Configuration (TODOs 1-2):
- Set BentoML port and worker count in ConfigMap
Deployment Basics (TODOs 3-6):
- Set deployment name, replicas, selector, and pod labels
Security (TODOs 7-8, 15-16):
- Configure pod and container security contexts
- Run as non-root user (UID 1000)
- Read-only root filesystem
High Availability (TODOs 9-10):
- Configure pod anti-affinity to spread across nodes
Container Configuration (TODOs 11-14):
- Set container name, image, pull policy, and port
Resources (TODOs 17-18):
- Configure CPU and memory requests (critical for HPA)
Health Probes (TODOs 19-20):
- Configure startup and readiness probes
Key concepts covered:
- Deployments: Manage replica Pods, rolling updates
- Services: Stable network endpoint, load balancing
- ConfigMaps: Externalized configuration
- Labels & Selectors: Resource organization
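To illustrate the ConfigMap item above, here is a minimal sketch; the resource name and key names are assumptions, and the starter file defines the real keys for TODOs 1-2:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sentiment-api-config   # assumed name; check the starter manifest
data:
  BENTOML_PORT: "3000"         # serving port, matching the health-check examples
  BENTOML_WORKERS: "2"         # illustrative worker count
```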
- Requests: Guaranteed minimum for scheduling
- Limits: Maximum allowed (prevents exhaustion)
- QoS Classes: Guaranteed, Burstable, BestEffort
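A minimal sketch of how requests and limits determine the QoS class (values are illustrative, not the workshop's required ones):

```yaml
resources:
  requests:          # guaranteed minimum; used by the scheduler and by HPA percentages
    cpu: "500m"
    memory: "1Gi"
  limits:            # hard ceiling; exceeding the memory limit gets the container OOM-killed
    cpu: "1"
    memory: "2Gi"
# requests < limits  => Burstable QoS
# requests == limits => Guaranteed QoS
# neither set        => BestEffort QoS
```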
- Startup Probes: Handle slow ML model loading (30-60s)
- Liveness Probes: Detect and restart dead containers
- Readiness Probes: Control traffic routing
- Self-healing: Automatic recovery
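A sketch of how the three probes typically combine for a slow-loading model, assuming the /health endpoint on port 3000 used elsewhere in this module; tune the thresholds to your model's actual load time:

```yaml
startupProbe:             # gates the other probes until the model has loaded
  httpGet: {path: /health, port: 3000}
  periodSeconds: 5
  failureThreshold: 12    # tolerates up to 60s of startup
livenessProbe:            # restarts the container if it stops responding
  httpGet: {path: /health, port: 3000}
  periodSeconds: 10
readinessProbe:           # pulls the pod out of the Service until it can serve traffic
  httpGet: {path: /health, port: 3000}
  periodSeconds: 5
```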
- HPA (Horizontal Pod Autoscaler): Scale based on CPU/memory
- Metrics: CPU 70%, Memory 80% targets
- Scaling Policies: Stabilization windows, rate limits
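Putting those targets together, a minimal autoscaling/v2 HPA could look like the following; the replica bounds are assumptions, and the starter file sets the real values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sentiment-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sentiment-api
  minReplicas: 2            # assumed floor
  maxReplicas: 10           # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}
    - type: Resource
      resource:
        name: memory
        target: {type: Utilization, averageUtilization: 80}
```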
- Pod Disruption Budget: Maintain availability during updates
- Pod Anti-affinity: Spread pods across nodes
- Rolling Updates: Zero-downtime deployments
- Fault Tolerance: Survive node failures
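As a sketch, the two pieces that provide this availability look roughly like the following; the PDB name and minAvailable value are assumptions:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sentiment-api-pdb
spec:
  minAvailable: 1            # keep at least one pod running during voluntary disruptions
  selector:
    matchLabels:
      app: sentiment-api
---
# Fragment from the Deployment's pod template: prefer spreading replicas across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: sentiment-api
          topologyKey: kubernetes.io/hostname
```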
- Non-root Containers: Reduce attack surface
- Read-only Filesystem: Prevent file modifications
- Dropped Capabilities: Minimal privileges
- Security Contexts: Pod and container hardening
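A sketch of how these settings split between the pod-level and container-level security contexts, matching the UID 1000 used in the troubleshooting section below:

```yaml
# In the Deployment's pod template (spec.template.spec):
securityContext:                      # pod-level
  runAsNonRoot: true
  runAsUser: 1000
containers:
  - name: sentiment-api
    image: sentiment-api:v1
    securityContext:                  # container-level
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```

With readOnlyRootFilesystem: true, any path the service writes to (for example /tmp) needs a writable volume, typically an emptyDir mount.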
```bash
# View all resources
kubectl get all -l app=sentiment-api

# Describe resources
kubectl describe deployment sentiment-api
kubectl describe pod <pod-name>
kubectl describe hpa sentiment-api-hpa

# Monitor resources
kubectl top pod -l app=sentiment-api
kubectl get hpa -w

# View logs
kubectl logs -l app=sentiment-api -f

# Port forwarding
kubectl port-forward svc/sentiment-api-service 8080:80

# Manual scaling (HPA will override)
kubectl scale deployment sentiment-api --replicas=5

# Delete all resources
kubectl delete -f deployment.yaml
```

Issue: Pods stuck in Pending

Symptoms:
```bash
kubectl get pods
# Shows: STATUS=Pending for extended period
```
Root Causes:
- Image not loaded into kind cluster
- Insufficient cluster resources
- Node selector/affinity constraints not met
Solutions:
Check 1: Verify image is loaded
```bash
# Check if image exists in kind
docker exec -it mlops-workshop-control-plane crictl images | grep sentiment

# If not found, load it
kind load docker-image sentiment-api:v1 --name mlops-workshop

# Verify again
docker exec -it mlops-workshop-control-plane crictl images | grep sentiment
```

Check 2: Inspect pod events
```bash
# Get detailed information
kubectl describe pod <pod-name>

# Look for events like:
# - "FailedScheduling: 0/1 nodes available"
# - "ImagePullBackOff"
# - "Insufficient cpu/memory"
```

Check 3: Verify cluster resources
```bash
# Check node resources
kubectl describe nodes

# Check resource requests
kubectl describe deployment sentiment-api

# If insufficient, reduce requests in deployment.yaml
```

Issue: ImagePullBackOff / ErrImagePull

Symptoms:
```bash
kubectl get pods
# Shows: STATUS=ImagePullBackOff or ErrImagePull
```
Root Cause: Wrong imagePullPolicy for local kind cluster
Solution:
```bash
# Check current imagePullPolicy
kubectl get deployment sentiment-api -o yaml | grep imagePullPolicy
# Should be: imagePullPolicy: Never (for kind)

# Fix in deployment.yaml TODO 13
# If set to "Always" or "IfNotPresent", change to "Never"

# Then reapply:
kubectl apply -f deployment.yaml

# Force restart pods
kubectl rollout restart deployment sentiment-api
```

Alternative: Rebuild and reload image
```bash
cd ../module-2
bentoml build
bentoml containerize sentiment_service:latest -t sentiment-api:v1
cd ../module-3
kind load docker-image sentiment-api:v1 --name mlops-workshop
kubectl delete pods -l app=sentiment-api
```

Issue: Security context errors

Symptoms:
```
Error: container has runAsNonRoot and image will run as root
Error: container has runAsNonRoot and image has non-numeric user
```
Root Cause: BentoML default image runs as root, conflicts with runAsNonRoot: true
Solutions:
Option 1: Adjust security context (for workshop)
```yaml
# In deployment.yaml, use a specific UID
securityContext:
  runAsUser: 1000
  runAsNonRoot: true
  # Remove runAsGroup if it causes issues
```

Option 2: Build custom non-root image (production)
```dockerfile
# In your BentoML project, create a custom Dockerfile
FROM bentoml/bento-server:latest

# Create non-root user
RUN useradd -m -u 1000 bentouser && \
    chown -R bentouser:bentouser /home/bentouser

USER bentouser

# Rest of your build...
```

Option 3: Relax constraints temporarily
```yaml
# For local testing only
securityContext:
  # Comment out runAsNonRoot temporarily
  # runAsNonRoot: true
  readOnlyRootFilesystem: false
```

Still stuck? Check the solution file.
Quick reference:

```bash
# Create kind cluster
kind create cluster --name mlops-workshop

# Load image
kind load docker-image sentiment-api:v1 --name mlops-workshop

# Apply deployment
kubectl apply -f deployment.yaml

# Check status
kubectl get all -l app=sentiment-api
```

```bash
# View pod logs
kubectl logs <pod-name>

# View logs from previous crashed container
kubectl logs <pod-name> --previous

# Follow logs in real-time
kubectl logs -f <pod-name>

# Logs from all pods with label
kubectl logs -l app=sentiment-api --all-containers=true

# Tail last 50 lines
kubectl logs <pod-name> --tail=50

# Logs since last 1 hour
kubectl logs <pod-name> --since=1h

# Exec into pod
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -- sh  # if bash not available

# Run command in pod
kubectl exec <pod-name> -- curl localhost:3000/health
```

```bash
# Port forward service
kubectl port-forward svc/sentiment-api-service 8080:80

# Port forward in background
kubectl port-forward svc/sentiment-api-service 8080:80 &

# Test API
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Test"}'
```

```bash
# Manual scale (HPA will override)
kubectl scale deployment sentiment-api --replicas=5

# Check HPA status
kubectl get hpa sentiment-api-hpa

# Watch HPA in real-time
kubectl get hpa sentiment-api-hpa -w
watch kubectl get hpa sentiment-api-hpa

# Disable HPA temporarily
kubectl delete hpa sentiment-api-hpa

# Re-enable HPA
kubectl apply -f deployment.yaml
```

```bash
# Install hey
go install github.com/rakyll/hey@latest

# Run load test
hey -z 2m -c 20 -m POST \
  -H "Content-Type: application/json" \
  -d '{"text":"Load test"}' \
  http://localhost:8080/predict

# Using Apache Bench
ab -n 1000 -c 10 -p request.json -T application/json \
  http://localhost:8080/predict

# Watch HPA respond
watch kubectl get hpa sentiment-api-hpa
```

If you get stuck, a complete reference implementation is available:
- `solution/deployment.yaml`: all TODOs completed with detailed comments
Note: Try to complete the exercise on your own first! The solution is heavily commented to explain every configuration.
Once you've completed the exercise and tests pass:
→ Module 4: API Gateway with Go
In Module 4, you'll build a high-performance API gateway in Go to sit in front of your ML service!
- ✅ Kubernetes Deployments: Manage containerized ML workloads
- ✅ Resource Management: Prevent resource starvation and overcommit
- ✅ Health Probes: Enable self-healing and zero-downtime updates
- ✅ Auto-scaling: Automatically adjust capacity based on load
- ✅ High Availability: Survive node failures and maintenance
- ✅ Security: Run with least privilege, harden containers
- ✅ Production Patterns: Real-world best practices for ML deployments
- Always set resource requests (required for HPA)
- Use all three probe types (startup, liveness, readiness)
- Configure PDB to maintain availability during updates
- Run containers as non-root
- Use read-only root filesystem
- Externalize configuration with ConfigMap
- Spread pods across nodes with anti-affinity
- Set appropriate HPA targets for ML workloads (70% CPU, 80% memory)
- Use conservative scale-down policies, as in the sketch below (ML models take time to load)
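For example, a conservative scale-down behavior on the HPA might look like this (values are illustrative):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes of sustained low load before shrinking
    policies:
      - type: Pods
        value: 1                      # remove at most one pod
        periodSeconds: 60             # per minute
  scaleUp:
    stabilizationWindowSeconds: 0     # react to traffic spikes immediately
```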
Having issues? Check the Troubleshooting section or review the solution file!
| Previous | Home | Next |
|---|---|---|
| ← Module 2: Model Packaging & Serving | 🏠 Home | Module 4: API Gateway & Polyglot Architecture → |
MLOps Workshop | GitHub Repository