Module 2: Model Serving with BentoML
By the end of this module, you'll have:
- ✅ REST API serving ML predictions
- ✅ Type-safe endpoints with automatic request/response validation
- ✅ Batch processing capability for high-throughput scenarios
- ✅ Comprehensive error handling and structured logging
- ✅ Health check endpoints for load balancer integration
- ✅ Swagger UI documentation auto-generated from your code
- ✅ Container-ready service deployable to Kubernetes
BentoML simplifies ML model serving by providing:
| Without BentoML | With BentoML |
|---|---|
| Manual API boilerplate | Automatic API generation |
| Custom serialization logic | Built-in model packaging |
| Manual Docker setup | One-command containerization |
| DIY health checks | Production endpoints included |
| Complex deployment configs | Simple bentofile.yaml |
| No automatic docs | Auto-generated Swagger UI |
Key Advantage: Focus on ML logic, not infrastructure plumbing.
In this module, you will:
- ✅ Package ML models as REST APIs using BentoML 1.4+ (class-based services)
- ✅ Implement input validation with Pydantic v2
- ✅ Add error handling and logging for production
- ✅ Create batch processing endpoints
- ✅ Build production-ready ML services with proper monitoring
Prerequisites:
- Completed Module 1
- Python 3.9+ installed
- Basic understanding of REST APIs
- Basic knowledge of Python classes and decorators
This module uses a scaffolded learning approach with the BentoML 1.4+ API, where you'll complete two progressive exercises (the second in two parts):
Exercise 1: Basic BentoML Service
├─ Define service class with @bentoml.service
├─ Initialize model in __init__
├─ Create prediction endpoint with @bentoml.api
└─ Use Python type hints for I/O
Exercise 2: Validation & Production Features
├─ Part 1: Pydantic Validation
└─ Part 2: Production Features
Benefits of the new API:
- ✅ Cleaner, more Pythonic class-based architecture
- ✅ Better type safety with native Python type hints
- ✅ Simpler model management (no separate save/load steps)
- ✅ Automatic OpenAPI spec generation
- ✅ Better IDE support and auto-completion
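To make the class-based style concrete, here's a minimal sketch of a sentiment service in the 1.4+ API. This is illustrative only; the starter file's resource settings and return shape may differ:

```python
import bentoml
from transformers import pipeline


@bentoml.service(resources={"cpu": "2"})  # resource values here are illustrative
class SentimentService:
    def __init__(self) -> None:
        # Runs once at startup; no separate save/load step required
        self.pipeline = pipeline("sentiment-analysis")

    @bentoml.api
    def predict(self, text: str) -> dict:
        # Native type hints drive input parsing and the OpenAPI spec
        result = self.pipeline(text)[0]  # the pipeline returns one result per input
        return {"text": text, "sentiment": result["label"], "confidence": result["score"]}
```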
cd modules/module-2/starter
# Install dependencies (includes BentoML 1.4+)
pip install -r ../requirements.txt
Goal: Create a basic sentiment analysis API with BentoML services
# Run the service
bentoml serve service_basic:SentimentService
# Test it
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This is amazing!"}'
# Visit Swagger UI
open http://localhost:3000
Key TODOs:
- Add `@bentoml.service` decorator to the class
- Define `__init__` method
- Load the pipeline in `__init__` as `self.pipeline`
- Add `@bentoml.api` decorator to the predict method
- Extract the text and run the prediction
- Return the first result from the result list
Goal: Build a production-ready service with Pydantic validation, error handling, logging, and batch processing
# Open the file
service_with_validation.py
# Find and fill in 25 TODOs (Part 1: TODOs 1-12, Part 2: TODOs 13-25)
# Run the service
bentoml serve service_with_validation:SentimentService
# Test valid input with tracking
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Amazing!", "request_id": "test-123"}'
# Test invalid input (should fail with clear validation error)
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"text": ""}'
# Test batch prediction
curl -X POST http://localhost:3000/batch_predict \
-H "Content-Type: application/json" \
-d '{"texts": ["Great!", "Terrible", "Okay"]}'
# Check health
curl http://localhost:3000/health
# Visit Swagger UI to see auto-generated docs
open http://localhost:3000
# Watch logs for request tracking
# Look for: [test-123] Prediction successful with latency metrics
Part 1: Pydantic Validation (TODOs 1-12)
- Import Pydantic (`BaseModel`, `Field`, `field_validator`)
- Import `typing`, `time`, `logging`, `uuid`, `datetime`
- Define `SentimentRequest` with text and request_id fields (see the model sketch below)
- Add `@field_validator('text')` custom validator (Pydantic v2)
- Define `SentimentResponse` with tracking fields
- Define `BatchSentimentRequest` model
- Define `BatchSentimentResponse` model
- Define `ErrorResponse` model
- Configure logging with `basicConfig()`
- Create logger instance
- Implement `generate_request_id()` function
- Implement `get_timestamp()` function
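For orientation, here's a hedged sketch of what the Part 1 models might look like. Field names follow the request/response examples in this module; the exact constraints in the starter file may differ:

```python
from typing import Optional

from pydantic import BaseModel, Field, field_validator


class SentimentRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    request_id: Optional[str] = None  # optional client-supplied tracking ID

    # Pydantic v2 validator style (replaces v1's @validator)
    @field_validator("text")
    @classmethod
    def text_not_blank(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("text must not be empty or whitespace")
        return v

    # Example payloads show up in the auto-generated Swagger UI
    model_config = {
        "json_schema_extra": {
            "examples": [{"text": "This is amazing!", "request_id": "test-123"}]
        }
    }


class SentimentResponse(BaseModel):
    text: str
    sentiment: str
    confidence: float = Field(..., ge=0.0, le=1.0)
    request_id: str
```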
Part 2: Production Features (TODOs 13-25)
- Add `@bentoml.service` decorator to the class
- Load the pipeline in `__init__`
- Log that the model is ready
- Add `@bentoml.api` decorator to predict
- Log the incoming request
- Add a try/except block around the prediction
- Run the prediction
- Log the successful prediction with metrics
- Return `SentimentResponse` with all fields
- Log errors with the stack trace
- Return an error response
- Add `@bentoml.api` for batch_predict
- Implement a production health check with a timestamp (see the endpoint sketch below)
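One way the Part 2 pieces might come together, as a sketch rather than the solution file: it assumes the models from the previous sketch and inlines the request-ID helper.

```python
import logging
import time
import uuid
from datetime import datetime, timezone
from typing import Optional

import bentoml
from transformers import pipeline

logger = logging.getLogger(__name__)


@bentoml.service
class SentimentService:
    def __init__(self) -> None:
        self.pipeline = pipeline("sentiment-analysis")
        logger.info("Model loaded and ready")

    @bentoml.api
    def predict(self, text: str, request_id: Optional[str] = None) -> SentimentResponse:
        rid = request_id or uuid.uuid4().hex[:8]
        logger.info("[%s] Received prediction request", rid)
        start = time.perf_counter()
        try:
            result = self.pipeline(text)[0]
            latency_ms = (time.perf_counter() - start) * 1000
            logger.info("[%s] Prediction successful (%.1f ms)", rid, latency_ms)
            return SentimentResponse(
                text=text,
                sentiment=result["label"],
                confidence=result["score"],
                request_id=rid,
            )
        except Exception:
            logger.exception("[%s] Prediction failed", rid)  # includes the stack trace
            # Encode the failure in the normal response shape, matching the
            # "Prediction error" example later in this module
            return SentimentResponse(
                text=text, sentiment="ERROR", confidence=0.0, request_id=rid
            )

    @bentoml.api
    def health(self) -> dict:
        return {"status": "healthy", "timestamp": datetime.now(timezone.utc).isoformat()}
```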
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This workshop is amazing!"}'Response:
{
"text": "This workshop is amazing!",
"sentiment": "POSITIVE",
"confidence": 0.9998,
"request_id": "abc123"
}
curl -X POST http://localhost:3000/batch_predict \
-H "Content-Type: application/json" \
-d '{
"texts": [
"I loved it!",
"Terrible experience.",
"Pretty good overall."
]
}'
Response:
{
"results": [
{"text": "I loved it!", "sentiment": "POSITIVE", "confidence": 0.9995, ...},
{"text": "Terrible experience.", "sentiment": "NEGATIVE", "confidence": 0.9991, ...},
{"text": "Pretty good overall.", "sentiment": "POSITIVE", "confidence": 0.8876, ...}
],
"metadata": {
"count": 3,
"latency_ms": 45.2,
"throughput_per_sec": 66.4,
"avg_latency_per_text_ms": 15.07
},
"request_id": "batch-456"
}
Prediction error:
{
"text": "test input",
"sentiment": "ERROR",
"confidence": 0.0,
"request_id": "abc123"
}
Once your service is working, package it as a Bento for deployment:
# Build Bento (creates distributable package)
bentoml build
# List available Bentos
bentoml list
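`bentoml build` reads a `bentofile.yaml` next to your service. A minimal one for this module might look like the following (the labels and include patterns are illustrative):

```yaml
service: "service_with_validation:SentimentService"
labels:
  owner: mlops-workshop
  stage: dev
include:
  - "*.py"
python:
  requirements_txt: "./requirements.txt"
```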
Convert your Bento to a Docker container:
# Containerize the latest Bento
bentoml containerize sentiment_service:latest -t sentiment-api:v1
# Or specify a specific version
bentoml containerize sentiment_service:abc123 -t sentiment-api:v1.0.0
# List Docker images
docker images | grep sentiment-api
Test your containerized service locally before deploying to Kubernetes:
# Run container
docker run -p 3000:3000 sentiment-api:v1
# Test the containerized service
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Testing containerized service!"}'
# Check health endpoint
curl http://localhost:3000/health
# View container logs
docker logs <container-id>
# Stop container
docker stop <container-id>
Next: In Module 3, you'll deploy this container to Kubernetes!
- Service Classes: Define services with the `@bentoml.service` decorator
- Initialization: Load models in the `__init__()` method (runs once at startup)
- API Endpoints: Create routes with the `@bentoml.api` decorator on methods
- Type Hints: Use Python type hints for automatic I/O handling
- Resource Config: Set CPU/memory requirements in the decorator
- Request Models: Type-safe input validation with `BaseModel`
- Response Models: Structured output format
- Field Constraints: `Field()` with `min_length`, `max_length`, `ge`, `le`
- Custom Validators: `@field_validator` decorator (Pydantic v2)
- Model Config: `model_config` dict with `json_schema_extra`
- Error Handling: Try/except with graceful error encoding
- Logging: Structured logs with request IDs for tracing
- Request Tracking: Unique IDs for debugging across services
- Performance Metrics: Latency and throughput monitoring
- Health Checks: `/health` endpoint for load balancers
- Batch Endpoints: Process multiple inputs efficiently
- Metadata: Track performance metrics per batch
- Throughput: 5-10x speedup vs individual requests
- Error Handling: Graceful degradation for batch failures
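A sketch of how a batch endpoint achieves that speedup: hand the whole list to the pipeline in one call instead of looping over N requests. This method would live inside the service class above; field names follow the batch response example earlier, and the input is assumed non-empty (enforced upstream by `BatchSentimentRequest`):

```python
@bentoml.api
def batch_predict(self, texts: list[str]) -> dict:
    start = time.perf_counter()
    raw = self.pipeline(texts)  # one batched forward pass, not N separate calls
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "results": [
            {"text": t, "sentiment": r["label"], "confidence": r["score"]}
            for t, r in zip(texts, raw)
        ],
        "metadata": {
            "count": len(texts),
            "latency_ms": round(latency_ms, 1),
            # texts is assumed non-empty, so these divisions are safe
            "throughput_per_sec": round(len(texts) / (latency_ms / 1000), 1),
            "avg_latency_per_text_ms": round(latency_ms / len(texts), 2),
        },
    }
```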
Symptoms:
Error: Address already in use
OSError: [Errno 48] Address already in use
Solutions:
Option 1: Use different port
bentoml serve service_with_validation:SentimentService --port 3001
Option 2: Kill existing process
# macOS
lsof -i :3000
kill -9 <PID>
Option 3: Find and stop BentoML service
# Kill all BentoML processes
pkill -f "bentoml serve"
# Or more targeted
ps aux | grep bentoml
kill <PID>
Symptoms:
ModuleNotFoundError: No module named 'bentoml'
ImportError: cannot import name 'field_validator' from 'pydantic'
Solutions:
# Step 1: Activate virtual environment
source venv/bin/activate
# Step 2: Reinstall dependencies
pip install -r requirements.txt
# Step 3: Verify BentoML version (should be 1.4+)
pip show bentoml
# Version should be >= 1.4.0
# Step 4: Verify Pydantic version (should be v2)
pip show pydantic
# Version should be >= 2.0.0
# Step 5: Check Python version
python --version
# Should be >= 3.9
If issues persist:
# Clean install
pip uninstall bentoml pydantic -y
pip install --no-cache-dir "bentoml>=1.4.0" "pydantic>=2.0.0"
Symptoms:
HTTPError: 404 Client Error
OSError: Can't load tokenizer for 'distilbert-base-uncased'
Solutions:
Check 1: Model downloads automatically on first run
# Just start the service
bentoml serve service_basic:SentimentService
# Model downloads to cache (may take 1-2 minutes first time)
# Location: ~/.cache/huggingface/hub/
Check 2: Verify cache location
ls ~/.cache/huggingface/hub/
# Should show model files after first run
Check 3: Manual download (if network issues)
# Pre-download model
python -c "
from transformers import pipeline
model = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
print('Model downloaded!')
"Check 4: Clear cache if corrupted
rm -rf ~/.cache/huggingface/hub/
# Then restart the service to re-download
Still stuck? Check the solution files
# Navigate to module
cd modules/module-2
# Install dependencies
pip install -r requirements.txt
# Serve basic service
cd starter
bentoml serve service_basic:SentimentService
# Serve with auto-reload (development)
bentoml serve service_with_validation:SentimentService --reload
# Serve on different port
bentoml serve service_with_validation:SentimentService --port 3001
# Serve with live reload (changes auto-reload)
bentoml serve service_with_validation:SentimentService --reload
# Serve with specific host
bentoml serve service_with_validation:SentimentService --host 0.0.0.0
# Serve in development mode (auto-reload, reachable from other machines)
bentoml serve service_with_validation:SentimentService --reload --host 0.0.0.0 --port 3000
# View all serve options
bentoml serve --help
# Single prediction
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This is amazing!"}'
# Batch prediction
curl -X POST http://localhost:3000/batch_predict \
-H "Content-Type: application/json" \
-d '{"texts": ["Great!", "Terrible", "Okay"]}'
# Health check
curl http://localhost:3000/health
# Get OpenAPI spec
curl http://localhost:3000/docs.json
# Visit Swagger UI (in browser)
open http://localhost:3000
# Build Bento from bentofile.yaml
bentoml build
# List all Bentos
bentoml list
# Get Bento details
bentoml get sentiment_service:latest
# Delete specific Bento
bentoml delete sentiment_service:abc123
# Delete all versions of a Bento
bentoml delete sentiment_service --yes
# Export Bento to file
bentoml export sentiment_service:latest -o sentiment_service.bento
# Import Bento from file
bentoml import sentiment_service.bento
# Build Docker image from Bento
bentoml containerize sentiment_service:latest
# Build with custom tag
bentoml containerize sentiment_service:latest -t sentiment-api:v1.0.0
# Build with custom Dockerfile template
bentoml containerize sentiment_service:latest --dockerfile-template ./custom.Dockerfile
# Push to registry
docker tag sentiment_service:latest myregistry.com/sentiment-service:v1
docker push myregistry.com/sentiment-service:v1
If you get stuck, reference implementations are available in solution/:
- `service_basic.py` - Exercise 1 completed
- `service_with_validation.py` - Exercise 2 completed
Note: Try to complete exercises on your own first! Learning happens when you struggle a bit.
After completing all exercises, try these:
- Add Caching: Implement response caching for repeated requests
- Async Endpoints: Convert to async/await for better concurrency
- Metrics Endpoint: Add a `/metrics` endpoint for Prometheus
- Custom Models: Replace with a different HuggingFace model
- Multiple Endpoints: Add sentiment + topic classification
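For the async extension, one possible shape (a sketch: BentoML accepts `async def` API methods, and `asyncio.to_thread`, available in Python 3.9+, offloads the blocking, CPU-bound pipeline call so the event loop stays responsive):

```python
import asyncio


# Inside the @bentoml.service class:
@bentoml.api
async def predict_async(self, text: str) -> dict:
    # Run the blocking pipeline call in a worker thread
    result = await asyncio.to_thread(self.pipeline, text)
    return result[0]
```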
Once you've completed all exercises and tests pass:
→ Module 3: Kubernetes Deployment
In Module 3, you'll deploy this BentoML service to Kubernetes!
Having issues? Check the Troubleshooting section or review the solution files!
| Previous | Home | Next |
|---|---|---|
| ← Module 1: Model Training & Experiment Tracking | 🏠 Home | Module 3: Kubernetes Deployment → |
MLOps Workshop | GitHub Repository