
Module 2: Model Packaging with BentoML 1.4+

What You'll Build

By the end of this module, you'll have:

  • ✅ REST API serving ML predictions
  • ✅ Type-safe endpoints with automatic request/response validation
  • ✅ Batch processing capability for high-throughput scenarios
  • ✅ Comprehensive error handling and structured logging
  • ✅ Health check endpoints for load balancer integration
  • ✅ Swagger UI documentation auto-generated from your code
  • ✅ Container-ready service deployable to Kubernetes

What You'll Learn

Why BentoML?

BentoML simplifies ML model serving by providing:

Without BentoML            | With BentoML
---------------------------|------------------------------
Manual API boilerplate     | Automatic API generation
Custom serialization logic | Built-in model packaging
Manual Docker setup        | One-command containerization
DIY health checks          | Production endpoints included
Complex deployment configs | Simple bentofile.yaml
No automatic docs          | Auto-generated Swagger UI

Key Advantage: Focus on ML logic, not infrastructure plumbing.

Learning Objectives

By the end of this module, you will:

  • ✅ Package ML models as REST APIs using BentoML 1.4+ (class-based services)
  • ✅ Implement input validation with Pydantic v2
  • ✅ Add error handling and logging for production
  • ✅ Create batch processing endpoints
  • ✅ Build production-ready ML services with proper monitoring

Part 1: Setup & Prerequisites

Prerequisites

  • Completed Module 1
  • Python 3.9+ installed
  • Basic understanding of REST APIs
  • Basic knowledge of Python classes and decorators

Workshop Format

This module uses a scaffolded learning approach built on the BentoML 1.4+ API, in which you'll complete two progressive exercises:

Exercise 1: Basic BentoML Service
├─ Define service class with @bentoml.service
├─ Initialize model in __init__
├─ Create prediction endpoint with @bentoml.api
└─ Use Python type hints for I/O

Exercise 2: Validation & Production Features
├─ Part 1: Pydantic Validation
└─ Part 2: Production Features

Benefits of the new API:

  • ✅ Cleaner, more Pythonic class-based architecture
  • ✅ Better type safety with native Python type hints
  • ✅ Simpler model management (no separate save/load steps)
  • ✅ Automatic OpenAPI spec generation
  • ✅ Better IDE support and auto-completion

Part 2: Hands-On Exercises

Quick Start

1. Setup

cd modules/module-2/starter

# Install dependencies (includes BentoML 1.4+)
pip install -r ../requirements.txt

2. Complete Exercises

Exercise 1: Basic Service

Goal: Create a basic sentiment analysis API as a class-based BentoML service

# Run the service
bentoml serve service_basic:SentimentService

# Test it
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "This is amazing!"}'

# Visit Swagger UI
open http://localhost:3000

Key TODOs:

  1. Add @bentoml.service decorator to class
  2. Define __init__ method
  3. Load pipeline in __init__ as self.pipeline
  4. Add @bentoml.api decorator to predict method
  5. Extract text and run prediction
  6. Return first result from result list
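Put together, the completed Exercise 1 service looks roughly like the sketch below. Treat it as illustrative rather than the official solution file: the model checkpoint is left to the pipeline default here, and the starter may pin a specific one.

import bentoml
from transformers import pipeline


@bentoml.service
class SentimentService:
    def __init__(self):
        # Runs once at startup: load the HuggingFace pipeline into memory
        self.pipeline = pipeline("sentiment-analysis")

    @bentoml.api
    def predict(self, text: str) -> dict:
        # The pipeline returns a list of results; return the first entry
        result = self.pipeline(text)
        return result[0]

Because the endpoint uses plain type hints, BentoML derives the request/response schema (and the Swagger UI page) from the method signature automatically.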

Exercise 2: Input Validation & Production Features

Goal: Build a production-ready service with Pydantic validation, error handling, logging, and batch processing

# Open the file
service_with_validation.py

# Find and fill in 25 TODOs (Part 1: TODOs 1-12, Part 2: TODOs 13-25)

# Run the service
bentoml serve service_with_validation:SentimentService

# Test valid input with tracking
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "Amazing!", "request_id": "test-123"}'

# Test invalid input (should fail with clear validation error)
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": ""}'

# Test batch prediction
curl -X POST http://localhost:3000/batch_predict \
     -H "Content-Type: application/json" \
     -d '{"texts": ["Great!", "Terrible", "Okay"]}'

# Check health
curl http://localhost:3000/health

# Visit Swagger UI to see auto-generated docs
open http://localhost:3000

# Watch logs for request tracking
# Look for: [test-123] Prediction successful with latency metrics

Part 1: Pydantic Validation (TODOs 1-12)

  1. Import Pydantic (BaseModel, Field, field_validator)
  2. Import typing, time, logging, uuid, datetime
  3. Define SentimentRequest with text and request_id fields
  4. Add @field_validator('text') custom validator (Pydantic v2)
  5. Define SentimentResponse with tracking fields
  6. Define BatchSentimentRequest model
  7. Define BatchSentimentResponse model
  8. Define ErrorResponse model
  9. Configure logging with basicConfig()
  10. Create logger instance
  11. Implement generate_request_id() function
  12. Implement get_timestamp() function
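One way those twelve TODOs could fit together is sketched below. Field names mirror the JSON examples later on this page; the max_length of 1000 and the log format are illustrative choices, and the batch and error models follow the same BaseModel pattern.

import logging
import uuid
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, Field, field_validator

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)


class SentimentRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    request_id: Optional[str] = None

    @field_validator("text")
    @classmethod
    def text_must_not_be_blank(cls, v: str) -> str:
        # min_length already rejects ""; this also rejects whitespace-only input
        if not v.strip():
            raise ValueError("text must not be blank")
        return v


class SentimentResponse(BaseModel):
    text: str
    sentiment: str
    confidence: float
    request_id: str


def generate_request_id() -> str:
    return uuid.uuid4().hex[:8]


def get_timestamp() -> str:
    return datetime.now(timezone.utc).isoformat()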

Part 2: Production Features (TODOs 13-25)

  13. Add @bentoml.service decorator to class
  14. Load pipeline in __init__
  15. Log that model is ready
  16. Add @bentoml.api decorator to predict
  17. Log incoming request
  18. Add try/except block around prediction
  19. Run prediction
  20. Log successful prediction with metrics
  21. Return SentimentResponse with all fields
  22. Log error with stack trace
  23. Return error response
  24. Add @bentoml.api for batch_predict
  25. Implement production health check with timestamp
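Sketched out (building on the models and helpers from Part 1 above, and again illustrative rather than the solution file), the production service could take this shape:

import time

import bentoml
from transformers import pipeline


@bentoml.service
class SentimentService:
    def __init__(self):
        self.pipeline = pipeline("sentiment-analysis")
        logger.info("Model loaded and ready")

    @bentoml.api
    def predict(self, text: str, request_id: Optional[str] = None) -> SentimentResponse:
        rid = request_id or generate_request_id()
        logger.info(f"[{rid}] Received prediction request")
        start = time.time()
        try:
            result = self.pipeline(text)[0]
            latency_ms = (time.time() - start) * 1000
            logger.info(f"[{rid}] Prediction successful ({latency_ms:.1f} ms)")
            return SentimentResponse(
                text=text,
                sentiment=result["label"],
                confidence=round(result["score"], 4),
                request_id=rid,
            )
        except Exception:
            # Log the full stack trace, then degrade gracefully instead of crashing
            logger.exception(f"[{rid}] Prediction failed")
            return SentimentResponse(text=text, sentiment="ERROR", confidence=0.0, request_id=rid)

    @bentoml.api
    def health(self) -> dict:
        return {"status": "healthy", "timestamp": get_timestamp()}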

Part 3: Testing & Validation

Testing Examples

Single Prediction

curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "This workshop is amazing!"}'

Response:

{
  "text": "This workshop is amazing!",
  "sentiment": "POSITIVE",
  "confidence": 0.9998,
  "request_id": "abc123"
}
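The same request from Python, if you'd rather script your tests than use curl (assumes the requests package is installed):

import requests

resp = requests.post(
    "http://localhost:3000/predict",
    json={"text": "This workshop is amazing!"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())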

Batch Prediction

curl -X POST http://localhost:3000/batch_predict \
     -H "Content-Type: application/json" \
     -d '{
       "texts": [
         "I loved it!",
         "Terrible experience.",
         "Pretty good overall."
       ]
     }'

Response:

{
  "results": [
    {"text": "I loved it!", "sentiment": "POSITIVE", "confidence": 0.9995, ...},
    {"text": "Terrible experience.", "sentiment": "NEGATIVE", "confidence": 0.9991, ...},
    {"text": "Pretty good overall.", "sentiment": "POSITIVE", "confidence": 0.8876, ...}
  ],
  "metadata": {
    "count": 3,
    "latency_ms": 45.2,
    "throughput_per_sec": 66.4,
    "avg_latency_per_text_ms": 15.07
  },
  "request_id": "batch-456"
}

Error Response

When the model raises an exception, the service degrades gracefully and returns an error-encoded response instead of an HTTP 500:

{
  "text": "test input",
  "sentiment": "ERROR",
  "confidence": 0.0,
  "request_id": "abc123"
}

Part 4: Deployment

Build Bento

Once your service is working, package it as a Bento for deployment:

# Build Bento (creates distributable package)
bentoml build

# List available Bentos
bentoml list
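bentoml build reads a bentofile.yaml in the current directory. If your starter does not already ship one, a minimal version might look like the sketch below — the labels and package list are illustrative, so align them with the module's requirements.txt:

service: "service_with_validation:SentimentService"
labels:
  owner: mlops-workshop
  stage: dev
include:
  - "*.py"
python:
  packages:
    - "bentoml>=1.4.0"
    - "pydantic>=2.0.0"
    - transformers
    - torch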

Containerization

Convert your Bento to a Docker container:

# Containerize the latest Bento
bentoml containerize sentiment_service:latest -t sentiment-api:v1

# Or specify a specific version
bentoml containerize sentiment_service:abc123 -t sentiment-api:v1.0.0

# List Docker images
docker images | grep sentiment-api

Local Docker Testing

Test your containerized service locally before deploying to Kubernetes:

# Run container
docker run -p 3000:3000 sentiment-api:v1

# Test the containerized service
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "Testing containerized service!"}'

# Check health endpoint
curl http://localhost:3000/health

# View container logs
docker logs <container-id>

# Stop container
docker stop <container-id>

Next: In Module 3, you'll deploy this container to Kubernetes!


Key Concepts Covered

BentoML Fundamentals

  • Service Classes: Define services with @bentoml.service decorator
  • Initialization: Load models in __init__() method (runs once at startup)
  • API Endpoints: Create routes with @bentoml.api decorator on methods
  • Type Hints: Use Python type hints for automatic I/O handling
  • Resource Config: Set CPU/memory requirements in decorator
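For example, resource requirements and request timeouts can be declared right in the decorator (the values here are illustrative, not recommendations):

import bentoml


@bentoml.service(
    resources={"cpu": "2", "memory": "2Gi"},
    traffic={"timeout": 30},
)
class SentimentService:
    pass  # endpoints as in the exercises above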

Pydantic v2 Validation

  • Request Models: Type-safe input validation with BaseModel
  • Response Models: Structured output format
  • Field Constraints: Field() with min_length, max_length, ge, le
  • Custom Validators: @field_validator decorator (Pydantic v2)
  • Model Config: model_config dict with json_schema_extra
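The model_config piece is what seeds Swagger UI with a ready-to-run example payload. A small sketch:

from typing import Optional

from pydantic import BaseModel, Field


class SentimentRequest(BaseModel):
    text: str = Field(..., min_length=1)
    request_id: Optional[str] = None

    model_config = {
        "json_schema_extra": {
            "examples": [{"text": "This workshop is amazing!", "request_id": "demo-1"}]
        }
    }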

Production Patterns

  • Error Handling: Try/except with graceful error encoding
  • Logging: Structured logs with request IDs for tracing
  • Request Tracking: Unique IDs for debugging across services
  • Performance Metrics: Latency and throughput monitoring
  • Health Checks: /health endpoint for load balancers

Batch Processing

  • Batch Endpoints: Process multiple inputs efficiently
  • Metadata: Track performance metrics per batch
  • Throughput: 5-10x speedup vs individual requests
  • Error Handling: Graceful degradation for batch failures
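The speedup comes from handing the whole list to the pipeline in a single call rather than looping over N requests. A sketch of such an endpoint, with metadata fields mirroring the example response earlier on this page:

import time


# Inside the @bentoml.service class from the exercises:
@bentoml.api
def batch_predict(self, texts: list[str]) -> dict:
    start = time.time()
    # One forward pass over the whole batch instead of N separate calls
    raw = self.pipeline(texts)
    latency_ms = max((time.time() - start) * 1000, 1e-6)
    return {
        "results": [
            {"text": t, "sentiment": r["label"], "confidence": round(r["score"], 4)}
            for t, r in zip(texts, raw)
        ],
        "metadata": {
            "count": len(texts),
            "latency_ms": round(latency_ms, 2),
            "throughput_per_sec": round(len(texts) / (latency_ms / 1000), 1),
            "avg_latency_per_text_ms": round(latency_ms / len(texts), 2) if texts else 0.0,
        },
    }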

Part 5: Troubleshooting

Troubleshooting

Issue 1: Port 3000 already in use

Symptoms:

Error: Address already in use
OSError: [Errno 48] Address already in use

Solutions:

Option 1: Use different port

bentoml serve service_with_validation:SentimentService --port 3001

Option 2: Kill existing process

# macOS / Linux
lsof -i :3000
kill -9 <PID>

Option 3: Find and stop BentoML service

# Kill all BentoML processes
pkill -f "bentoml serve"

# Or more targeted
ps aux | grep bentoml
kill <PID>

Issue 2: Import errors

Symptoms:

ModuleNotFoundError: No module named 'bentoml'
ImportError: cannot import name 'field_validator' from 'pydantic'

Solutions:

# Step 1: Activate virtual environment
source venv/bin/activate 

# Step 2: Reinstall dependencies
pip install -r requirements.txt

# Step 3: Verify BentoML version (should be 1.4+)
pip show bentoml
# Version should be >= 1.4.0

# Step 4: Verify Pydantic version (should be v2)
pip show pydantic
# Version should be >= 2.0.0

# Step 5: Check Python version
python --version
# Should be >= 3.9

If issues persist:

# Clean install
pip uninstall bentoml pydantic -y
pip install --no-cache-dir "bentoml>=1.4.0" "pydantic>=2.0.0"

Issue 3: Model not loading or downloading

Symptoms:

HTTPError: 404 Client Error
OSError: Can't load tokenizer for 'distilbert-base-uncased'

Solutions:

Check 1: Model downloads automatically on first run

# Just start the service
bentoml serve service_basic:SentimentService

# Model downloads to cache (may take 1-2 minutes first time)
# Location: ~/.cache/huggingface/hub/

Check 2: Verify cache location

ls ~/.cache/huggingface/hub/
# Should show model files after first run

Check 3: Manual download (if network issues)

# Pre-download model
python -c "
from transformers import pipeline
model = pipeline('sentiment-analysis', model='distilbert-base-uncased')
print('Model downloaded!')
"

Check 4: Clear cache if corrupted

rm -rf ~/.cache/huggingface/hub/
# Then restart service to re-download

Still stuck? Check the solution files


Part 6: Reference

Commands Cheat Sheet

Quick Start

# Navigate to module
cd modules/module-2

# Install dependencies
pip install -r requirements.txt

# Serve basic service
cd starter
bentoml serve service_basic:SentimentService

# Serve with auto-reload (development)
bentoml serve service_with_validation:SentimentService --reload

# Serve on different port
bentoml serve service_with_validation:SentimentService --port 3001

Development Commands

# Serve with live reload (changes auto-reload)
bentoml serve service_with_validation:SentimentService --reload

# Serve with specific host
bentoml serve service_with_validation:SentimentService --host 0.0.0.0

# Combine options for local development
bentoml serve service_with_validation:SentimentService --reload --host 0.0.0.0 --port 3000

# View all serve options
bentoml serve --help

API Testing Commands

# Single prediction
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "This is amazing!"}'

# Batch prediction
curl -X POST http://localhost:3000/batch_predict \
     -H "Content-Type: application/json" \
     -d '{"texts": ["Great!", "Terrible", "Okay"]}'

# Health check
curl http://localhost:3000/health

# Get OpenAPI spec
curl http://localhost:3000/docs.json

# Visit Swagger UI (in browser)
open http://localhost:3000

Bento Management

# Build Bento from bentofile.yaml
bentoml build

# List all Bentos
bentoml list

# Get Bento details
bentoml get sentiment_service:latest

# Delete specific Bento
bentoml delete sentiment_service:abc123

# Delete all versions of a Bento
bentoml delete sentiment_service --yes

# Export Bento to file
bentoml export sentiment_service:latest -o sentiment_service.bento

# Import Bento from file
bentoml import sentiment_service.bento

Containerization Commands

# Build Docker image from Bento
bentoml containerize sentiment_service:latest

# Build with custom tag
bentoml containerize sentiment_service:latest -t sentiment-api:v1.0.0

# Build with custom Dockerfile template
bentoml containerize sentiment_service:latest --dockerfile-template ./custom.Dockerfile

# Push to registry
docker tag sentiment_service:latest myregistry.com/sentiment-service:v1
docker push myregistry.com/sentiment-service:v1

Solution Files

If you get stuck, reference implementations are available in solution/:

  • service_basic.py - Exercise 1 completed
  • service_with_validation.py - Exercise 2 completed

Note: Try to complete exercises on your own first! Learning happens when you struggle a bit.

Advanced Challenges (Optional)

After completing all exercises, try these:

  1. Add Caching: Implement response caching for repeated requests
  2. Async Endpoints: Convert to async/await for better concurrency
  3. Metrics Endpoint: Add /metrics endpoint for Prometheus
  4. Custom Models: Replace with a different HuggingFace model
  5. Multiple Endpoints: Add sentiment + topic classification

Next Steps

Once you've completed all exercises and tests pass:

Module 3: Kubernetes Deployment

In Module 3, you'll deploy this BentoML service to Kubernetes!


Having issues? Check the Troubleshooting section or review the solution files!

