
Module 2: Model Packaging with BentoML 1.4+

What You'll Build

By the end of this module, you'll have:

  • ✅ REST API serving ML predictions
  • ✅ Type-safe endpoints with automatic request/response validation
  • ✅ Batch processing capability for high-throughput scenarios
  • ✅ Comprehensive error handling and structured logging
  • ✅ Health check endpoints for load balancer integration
  • ✅ Swagger UI documentation auto-generated from your code
  • ✅ Container-ready service deployable to Kubernetes

What You'll Learn

Why BentoML?

BentoML simplifies ML model serving by providing:

Without BentoML            | With BentoML
---------------------------|------------------------------
Manual API boilerplate     | Automatic API generation
Custom serialization logic | Built-in model packaging
Manual Docker setup        | One-command containerization
DIY health checks          | Production endpoints included
Complex deployment configs | Simple bentofile.yaml
No automatic docs          | Auto-generated Swagger UI

Key Advantage: Focus on ML logic, not infrastructure plumbing.

Learning Objectives

By the end of this module, you will:

  • ✅ Package ML models as REST APIs using BentoML 1.4+ (class-based services)
  • ✅ Implement input validation with Pydantic v2
  • ✅ Add error handling and logging for production
  • ✅ Create batch processing endpoints
  • ✅ Build production-ready ML services with proper monitoring

Part 1: Setup & Prerequisites

Prerequisites

  • Completed Module 1
  • Python 3.9+ installed
  • Basic understanding of REST APIs
  • Basic knowledge of Python classes and decorators

Workshop Format

This module uses a scaffolded learning approach built on the BentoML 1.4+ API, in which you'll complete two progressive exercises:

Exercise 1: Basic BentoML Service
├─ Define service class with @bentoml.service
├─ Initialize model in __init__
├─ Create prediction endpoint with @bentoml.api
└─ Use Python type hints for I/O

Exercise 2: Validation & Production Features
├─ Part 1: Pydantic Validation
└─ Part 2: Production Features

Benefits of the new API:

  • ✅ Cleaner, more Pythonic class-based architecture
  • ✅ Better type safety with native Python type hints
  • ✅ Simpler model management (no separate save/load steps)
  • ✅ Automatic OpenAPI spec generation
  • ✅ Better IDE support and auto-completion

Part 2: Hands-On Exercises

Quick Start

1. Setup

cd modules/module-2/starter

# Install dependencies (includes BentoML 1.4+)
pip install -r ../requirements.txt

2. Complete Exercises

Exercise 1: Basic Service

Goal: Create a basic sentiment analysis API as a class-based BentoML service

# Run the service
bentoml serve service_basic:SentimentService

# Test it
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "This is amazing!"}'

# Visit Swagger UI
open http://localhost:3000

Key TODOs:

  1. Add @bentoml.service decorator to class
  2. Define __init__ method
  3. Load pipeline in __init__ as self.pipeline
  4. Add @bentoml.api decorator to predict method
  5. Extract text and run prediction
  6. Return first result from result list
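Put together, the completed Exercise 1 service looks roughly like the sketch below. Treat it as illustrative rather than the official solution file: the model checkpoint is left to the pipeline default here, and the starter may pin a specific one.

import bentoml
from transformers import pipeline


@bentoml.service
class SentimentService:
    def __init__(self):
        # Runs once at startup: load the HuggingFace pipeline into memory
        self.pipeline = pipeline("sentiment-analysis")

    @bentoml.api
    def predict(self, text: str) -> dict:
        # The pipeline returns a list of results; return the first entry
        result = self.pipeline(text)
        return result[0]

Because the endpoint uses plain type hints, BentoML derives the request/response schema (and the Swagger UI page) from the method signature automatically.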

Exercise 2: Input Validation & Production Features

Goal: Build a production-ready service with Pydantic validation, error handling, logging, and batch processing

# Open the file
service_with_validation.py

# Find and fill in 25 TODOs (Part 1: TODOs 1-12, Part 2: TODOs 13-25)

# Run the service
bentoml serve service_with_validation:SentimentService

# Test valid input with tracking
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "Amazing!", "request_id": "test-123"}'

# Test invalid input (should fail with clear validation error)
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": ""}'

# Test batch prediction
curl -X POST http://localhost:3000/batch_predict \
     -H "Content-Type: application/json" \
     -d '{"texts": ["Great!", "Terrible", "Okay"]}'

# Check health
curl http://localhost:3000/health

# Visit Swagger UI to see auto-generated docs
open http://localhost:3000

# Watch logs for request tracking
# Look for: [test-123] Prediction successful with latency metrics

Part 1: Pydantic Validation (TODOs 1-12)

  1. Import Pydantic (BaseModel, Field, field_validator)
  2. Import typing, time, logging, uuid, datetime
  3. Define SentimentRequest with text and request_id fields
  4. Add @field_validator('text') custom validator (Pydantic v2)
  5. Define SentimentResponse with tracking fields
  6. Define BatchSentimentRequest model
  7. Define BatchSentimentResponse model
  8. Define ErrorResponse model
  9. Configure logging with basicConfig()
  10. Create logger instance
  11. Implement generate_request_id() function
  12. Implement get_timestamp() function
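One way those twelve TODOs could fit together is sketched below. Field names mirror the JSON examples later on this page; the max_length of 1000 and the log format are illustrative choices, and the batch and error models follow the same BaseModel pattern.

import logging
import uuid
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, Field, field_validator

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)


class SentimentRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    request_id: Optional[str] = None

    @field_validator("text")
    @classmethod
    def text_must_not_be_blank(cls, v: str) -> str:
        # min_length already rejects ""; this also rejects whitespace-only input
        if not v.strip():
            raise ValueError("text must not be blank")
        return v


class SentimentResponse(BaseModel):
    text: str
    sentiment: str
    confidence: float
    request_id: str


def generate_request_id() -> str:
    return uuid.uuid4().hex[:8]


def get_timestamp() -> str:
    return datetime.now(timezone.utc).isoformat()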

Part 2: Production Features (TODOs 13-25)

  13. Add @bentoml.service decorator to class
  14. Load pipeline in __init__
  15. Log that model is ready
  16. Add @bentoml.api decorator to predict
  17. Log incoming request
  18. Add try/except block around prediction
  19. Run prediction
  20. Log successful prediction with metrics
  21. Return SentimentResponse with all fields
  22. Log error with stack trace
  23. Return error response
  24. Add @bentoml.api for batch_predict
  25. Implement production health check with timestamp
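Sketched out (building on the models and helpers from Part 1 above, and again illustrative rather than the solution file), the production service could take this shape:

import time

import bentoml
from transformers import pipeline


@bentoml.service
class SentimentService:
    def __init__(self):
        self.pipeline = pipeline("sentiment-analysis")
        logger.info("Model loaded and ready")

    @bentoml.api
    def predict(self, text: str, request_id: Optional[str] = None) -> SentimentResponse:
        rid = request_id or generate_request_id()
        logger.info(f"[{rid}] Received prediction request")
        start = time.time()
        try:
            result = self.pipeline(text)[0]
            latency_ms = (time.time() - start) * 1000
            logger.info(f"[{rid}] Prediction successful ({latency_ms:.1f} ms)")
            return SentimentResponse(
                text=text,
                sentiment=result["label"],
                confidence=round(result["score"], 4),
                request_id=rid,
            )
        except Exception:
            # Log the full stack trace, then degrade gracefully instead of crashing
            logger.exception(f"[{rid}] Prediction failed")
            return SentimentResponse(text=text, sentiment="ERROR", confidence=0.0, request_id=rid)

    @bentoml.api
    def health(self) -> dict:
        return {"status": "healthy", "timestamp": get_timestamp()}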

Part 3: Testing & Validation

Testing Examples

Single Prediction

curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "This workshop is amazing!"}'

Response:

{
  "text": "This workshop is amazing!",
  "sentiment": "POSITIVE",
  "confidence": 0.9998,
  "request_id": "abc123"
}
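The same request from Python, if you'd rather script your tests than use curl (assumes the requests package is installed):

import requests

resp = requests.post(
    "http://localhost:3000/predict",
    json={"text": "This workshop is amazing!"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())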

Batch Prediction

curl -X POST http://localhost:3000/batch_predict \
     -H "Content-Type: application/json" \
     -d '{
       "texts": [
         "I loved it!",
         "Terrible experience.",
         "Pretty good overall."
       ]
     }'

Response:

{
  "results": [
    {"text": "I loved it!", "sentiment": "POSITIVE", "confidence": 0.9995, ...},
    {"text": "Terrible experience.", "sentiment": "NEGATIVE", "confidence": 0.9991, ...},
    {"text": "Pretty good overall.", "sentiment": "POSITIVE", "confidence": 0.8876, ...}
  ],
  "metadata": {
    "count": 3,
    "latency_ms": 45.2,
    "throughput_per_sec": 66.4,
    "avg_latency_per_text_ms": 15.07
  },
  "request_id": "batch-456"
}

Error Response

When the model raises an exception, the service degrades gracefully and returns an error-encoded response instead of an HTTP 500:

{
  "text": "test input",
  "sentiment": "ERROR",
  "confidence": 0.0,
  "request_id": "abc123"
}

Part 4: Deployment

Build Bento

Once your service is working, package it as a Bento for deployment:

# Build Bento (creates distributable package)
bentoml build

# List available Bentos
bentoml list
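bentoml build reads a bentofile.yaml in the current directory. If your starter does not already ship one, a minimal version might look like the sketch below — the labels and package list are illustrative, so align them with the module's requirements.txt:

service: "service_with_validation:SentimentService"
labels:
  owner: mlops-workshop
  stage: dev
include:
  - "*.py"
python:
  packages:
    - "bentoml>=1.4.0"
    - "pydantic>=2.0.0"
    - transformers
    - torch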

Containerization

Convert your Bento to a Docker container:

# Containerize the latest Bento
bentoml containerize sentiment_service:latest -t sentiment-api:v1

# Or specify a specific version
bentoml containerize sentiment_service:abc123 -t sentiment-api:v1.0.0

# List Docker images
docker images | grep sentiment-api

Local Docker Testing

Test your containerized service locally before deploying to Kubernetes:

# Run container
docker run -p 3000:3000 sentiment-api:v1

# Test the containerized service
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "Testing containerized service!"}'

# Check health endpoint
curl http://localhost:3000/health

# View container logs
docker logs <container-id>

# Stop container
docker stop <container-id>

Next: In Module 3, you'll deploy this container to Kubernetes!


Key Concepts Covered

BentoML Fundamentals

  • Service Classes: Define services with @bentoml.service decorator
  • Initialization: Load models in __init__() method (runs once at startup)
  • API Endpoints: Create routes with @bentoml.api decorator on methods
  • Type Hints: Use Python type hints for automatic I/O handling
  • Resource Config: Set CPU/memory requirements in decorator
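For example, resource requirements and request timeouts can be declared right in the decorator (the values here are illustrative, not recommendations):

import bentoml


@bentoml.service(
    resources={"cpu": "2", "memory": "2Gi"},
    traffic={"timeout": 30},
)
class SentimentService:
    pass  # endpoints as in the exercises above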

Pydantic v2 Validation

  • Request Models: Type-safe input validation with BaseModel
  • Response Models: Structured output format
  • Field Constraints: Field() with min_length, max_length, ge, le
  • Custom Validators: @field_validator decorator (Pydantic v2)
  • Model Config: model_config dict with json_schema_extra
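The model_config piece is what seeds Swagger UI with a ready-to-run example payload. A small sketch:

from typing import Optional

from pydantic import BaseModel, Field


class SentimentRequest(BaseModel):
    text: str = Field(..., min_length=1)
    request_id: Optional[str] = None

    model_config = {
        "json_schema_extra": {
            "examples": [{"text": "This workshop is amazing!", "request_id": "demo-1"}]
        }
    }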

Production Patterns

  • Error Handling: Try/except with graceful error encoding
  • Logging: Structured logs with request IDs for tracing
  • Request Tracking: Unique IDs for debugging across services
  • Performance Metrics: Latency and throughput monitoring
  • Health Checks: /health endpoint for load balancers

Batch Processing

  • Batch Endpoints: Process multiple inputs efficiently
  • Metadata: Track performance metrics per batch
  • Throughput: 5-10x speedup vs individual requests
  • Error Handling: Graceful degradation for batch failures
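The speedup comes from handing the whole list to the pipeline in a single call rather than looping over N requests. A sketch of such an endpoint, with metadata fields mirroring the example response earlier on this page:

import time


# Inside the @bentoml.service class from the exercises:
@bentoml.api
def batch_predict(self, texts: list[str]) -> dict:
    start = time.time()
    # One forward pass over the whole batch instead of N separate calls
    raw = self.pipeline(texts)
    latency_ms = max((time.time() - start) * 1000, 1e-6)
    return {
        "results": [
            {"text": t, "sentiment": r["label"], "confidence": round(r["score"], 4)}
            for t, r in zip(texts, raw)
        ],
        "metadata": {
            "count": len(texts),
            "latency_ms": round(latency_ms, 2),
            "throughput_per_sec": round(len(texts) / (latency_ms / 1000), 1),
            "avg_latency_per_text_ms": round(latency_ms / len(texts), 2) if texts else 0.0,
        },
    }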

Part 5: Troubleshooting

Troubleshooting

Issue 1: Port 3000 already in use

Symptoms:

Error: Address already in use
OSError: [Errno 48] Address already in use

Solutions:

Option 1: Use different port

bentoml serve service_with_validation:SentimentService --port 3001

Option 2: Kill existing process

# macOS / Linux
lsof -i :3000
kill -9 <PID>

Option 3: Find and stop BentoML service

# Kill all BentoML processes
pkill -f "bentoml serve"

# Or more targeted
ps aux | grep bentoml
kill <PID>

Issue 2: Import errors

Symptoms:

ModuleNotFoundError: No module named 'bentoml'
ImportError: cannot import name 'field_validator' from 'pydantic'

Solutions:

# Step 1: Activate virtual environment
source venv/bin/activate 

# Step 2: Reinstall dependencies
pip install -r requirements.txt

# Step 3: Verify BentoML version (should be 1.4+)
pip show bentoml
# Version should be >= 1.4.0

# Step 4: Verify Pydantic version (should be v2)
pip show pydantic
# Version should be >= 2.0.0

# Step 5: Check Python version
python --version
# Should be >= 3.9

If issues persist:

# Clean install
pip uninstall bentoml pydantic -y
pip install --no-cache-dir "bentoml>=1.4.0" "pydantic>=2.0.0"

Issue 3: Model not loading or downloading

Symptoms:

HTTPError: 404 Client Error
OSError: Can't load tokenizer for 'distilbert-base-uncased'

Solutions:

Check 1: Model downloads automatically on first run

# Just start the service
bentoml serve service_basic:SentimentService

# Model downloads to cache (may take 1-2 minutes first time)
# Location: ~/.cache/huggingface/hub/

Check 2: Verify cache location

ls ~/.cache/huggingface/hub/
# Should show model files after first run

Check 3: Manual download (if network issues)

# Pre-download model
python -c "
from transformers import pipeline
model = pipeline('sentiment-analysis', model='distilbert-base-uncased')
print('Model downloaded!')
"

Check 4: Clear cache if corrupted

rm -rf ~/.cache/huggingface/hub/
# Then restart service to re-download

Still stuck? Check the solution files


Part 6: Reference

Commands Cheat Sheet

Quick Start

# Navigate to module
cd modules/module-2

# Install dependencies
pip install -r requirements.txt

# Serve basic service
cd starter
bentoml serve service_basic:SentimentService

# Serve with auto-reload (development)
bentoml serve service_with_validation:SentimentService --reload

# Serve on different port
bentoml serve service_with_validation:SentimentService --port 3001

Development Commands

# Serve with live reload (changes auto-reload)
bentoml serve service_with_validation:SentimentService --reload

# Serve with specific host
bentoml serve service_with_validation:SentimentService --host 0.0.0.0

# Combine options for local development
bentoml serve service_with_validation:SentimentService --reload --host 0.0.0.0 --port 3000

# View all serve options
bentoml serve --help

API Testing Commands

# Single prediction
curl -X POST http://localhost:3000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "This is amazing!"}'

# Batch prediction
curl -X POST http://localhost:3000/batch_predict \
     -H "Content-Type: application/json" \
     -d '{"texts": ["Great!", "Terrible", "Okay"]}'

# Health check
curl http://localhost:3000/health

# Get OpenAPI spec
curl http://localhost:3000/docs.json

# Visit Swagger UI (in browser)
open http://localhost:3000

Bento Management

# Build Bento from bentofile.yaml
bentoml build

# List all Bentos
bentoml list

# Get Bento details
bentoml get sentiment_service:latest

# Delete specific Bento
bentoml delete sentiment_service:abc123

# Delete all versions of a Bento
bentoml delete sentiment_service --yes

# Export Bento to file
bentoml export sentiment_service:latest -o sentiment_service.bento

# Import Bento from file
bentoml import sentiment_service.bento

Containerization Commands

# Build Docker image from Bento
bentoml containerize sentiment_service:latest

# Build with custom tag
bentoml containerize sentiment_service:latest -t sentiment-api:v1.0.0

# Build with custom Dockerfile template
bentoml containerize sentiment_service:latest --dockerfile-template ./custom.Dockerfile

# Push to registry
docker tag sentiment_service:latest myregistry.com/sentiment-service:v1
docker push myregistry.com/sentiment-service:v1

Solution Files

If you get stuck, reference implementations are available in solution/:

  • service_basic.py - Exercise 1 completed
  • service_with_validation.py - Exercise 2 completed

Note: Try to complete exercises on your own first! Learning happens when you struggle a bit.

Advanced Challenges (Optional)

After completing all exercises, try these:

  1. Add Caching: Implement response caching for repeated requests
  2. Async Endpoints: Convert to async/await for better concurrency
  3. Metrics Endpoint: Add /metrics endpoint for Prometheus
  4. Custom Models: Replace with a different HuggingFace model
  5. Multiple Endpoints: Add sentiment + topic classification

Next Steps

Once you've completed all exercises and tests pass:

Module 3: Kubernetes Deployment

In Module 3, you'll deploy this BentoML service to Kubernetes!


Having issues? Check the Troubleshooting section or review the solution files!

