A production-ready, reusable template for deploying any ML model (forecasting, classification, etc.) with FastAPI. Just drop your model into the /app/models folder and deploy!
- ✅ Modular & Scalable Architecture - Easy to extend and maintain
- ✅ Dynamic Model Loading - Add models without code changes
- ✅ Multiple Prediction Endpoints - Path-based or body-based routing
- ✅ Model Versioning Support - Deploy multiple versions simultaneously
- ✅ Production-Ready Logging - JSON logging for production environments
- ✅ Pydantic Validation - Type-safe request/response handling
- ✅ Docker & Docker Compose - Containerized deployment
- ✅ Comprehensive Testing - Unit tests included
- ✅ CORS Enabled - Ready for frontend integration
- ✅ Auto-Generated Docs - Swagger UI & ReDoc
```text
├── app/
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py            # API endpoints
│   │   └── schemas.py           # Pydantic models
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py            # Settings & configuration
│   │   ├── logging.py           # Logging setup
│   │   └── startup.py           # Startup/shutdown events
│   ├── models/
│   │   ├── __init__.py
│   │   ├── base.py              # Base model interface
│   │   ├── forecast.py          # Example: Forecasting model
│   │   └── classifier.py        # Example: Classification model
│   ├── services/
│   │   ├── __init__.py
│   │   └── model_service.py     # Model loading & inference
│   ├── utils/
│   │   └── __init__.py          # Utility functions
│   └── main.py                  # FastAPI application
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_api.py
│   └── test_models.py
├── model_storage/               # Store trained models here
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── env.example                  # Environment variables template
└── README.md
```
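For orientation, the base interface in `app/models/base.py` might look roughly like this - a sketch inferred from how the example models subclass it (`model_name`, `version`, `is_loaded`, async `load_model`/`predict`); the template's actual file may differ:

```python
import abc
from typing import Any, Dict


class BaseMLModel(abc.ABC):
    """Minimal async interface that every concrete model implements."""

    def __init__(self, model_name: str, version: str):
        self.model_name = model_name
        self.version = version
        self.is_loaded = False

    @abc.abstractmethod
    async def load_model(self) -> None:
        """Load weights/artifacts and set self.is_loaded = True."""

    @abc.abstractmethod
    async def predict(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Run inference and return a JSON-serializable dict."""
```

Keeping the interface async lets slow I/O (disk reads, downloads from cloud storage) happen without blocking the event loop at startup.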
```bash
# No need to clone - this is your template!
cd your-project-directory

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Linux/Mac)
source venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Copy environment template
cp env.example .env
# Edit .env with your settings

# Development mode with auto-reload
python app/main.py

# Or using uvicorn directly
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health Check: http://localhost:8000/api/v1/health
```bash
# Build the image
docker build -t ml-model-api .

# Run the container
docker run -p 8000:8000 --env-file .env ml-model-api
```

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

Create a new file in `app/models/`, for example `app/models/sentiment.py`:
```python
"""Sentiment Analysis Model"""
import logging
from typing import Dict, Any

from app.models.base import BaseMLModel

logger = logging.getLogger(__name__)


class SentimentModel(BaseMLModel):
    """Sentiment analysis model"""

    def __init__(self):
        super().__init__(model_name="sentiment", version="1.0")
        self.model = None

    async def load_model(self) -> None:
        """Load your trained sentiment model"""
        logger.info(f"Loading {self.model_name} model...")

        # Option 1: Load from file
        # import joblib
        # self.model = joblib.load("model_storage/sentiment.pkl")

        # Option 2: Load from cloud storage
        # self.model = load_from_s3("bucket", "sentiment.pkl")

        # Option 3: Initialize new model (demo)
        # from transformers import pipeline
        # self.model = pipeline("sentiment-analysis")

        self.is_loaded = True
        logger.info(f"{self.model_name} model loaded")

    async def predict(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Make predictions"""
        if not self.is_loaded:
            raise RuntimeError(f"Model {self.model_name} is not loaded")

        text = input_data.get("text", "")

        # Your prediction logic here
        # result = self.model(text)

        return {
            "text": text,
            "sentiment": "positive",  # Replace with actual prediction
            "confidence": 0.95,
            "model": self.model_name,
            "version": self.version,
        }
```

Edit `env.example` or `.env`:
```env
MODEL_REGISTRY={"forecast": "forecast.py", "classifier": "classifier.py", "sentiment": "sentiment.py"}
```

```bash
# Restart the application
# The new model will be automatically loaded!

# Test it
curl -X POST "http://localhost:8000/api/v1/predict/sentiment" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is amazing!"}'
```

`GET /api/v1/health`

Response:
```json
{
  "status": "ok",
  "version": "1.0.0",
  "models_loaded": 2
}
```

`GET /api/v1/models`

Response:
```json
{
  "models": [
    {
      "name": "forecast",
      "version": "1.0",
      "loaded": true,
      "available_versions": ["1"]
    },
    {
      "name": "classifier",
      "version": "1.0",
      "loaded": true,
      "available_versions": ["1"]
    }
  ],
  "total": 2
}
```

`GET /api/v1/models/{model_name}`

Example:

```bash
curl http://localhost:8000/api/v1/models/forecast
```

`POST /api/v1/predict/{model_name}`

Example - Forecast:
```bash
curl -X POST "http://localhost:8000/api/v1/predict/forecast" \
  -H "Content-Type: application/json" \
  -d '{
    "periods": 30,
    "freq": "D"
  }'
```

Response:
```json
{
  "success": true,
  "model": "forecast",
  "version": "1.0",
  "result": {
    "predictions": [10.5, 11.2, 12.1, ...],
    "dates": ["2024-01-01", "2024-01-02", ...],
    "lower_bound": [9.5, 10.2, ...],
    "upper_bound": [11.5, 12.2, ...],
    "periods": 30
  }
}
```

Example - Classifier:
```bash
curl -X POST "http://localhost:8000/api/v1/predict/classifier" \
  -H "Content-Type: application/json" \
  -d '{
    "features": [1.2, 3.4, 5.6, 7.8]
  }'
```

Response:
```json
{
  "success": true,
  "model": "classifier",
  "version": "1.0",
  "result": {
    "prediction": "class_1",
    "probabilities": {
      "class_0": 0.1,
      "class_1": 0.7,
      "class_2": 0.2
    },
    "confidence": 0.7
  }
}
```

`POST /api/v1/predict`

Example:
```bash
curl -X POST "http://localhost:8000/api/v1/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "forecast",
    "input_data": {
      "periods": 7,
      "freq": "D"
    },
    "version": "1"
  }'
```

`POST /api/v1/predict/{model_name}?v={version}`

Example:
```bash
curl -X POST "http://localhost:8000/api/v1/predict/forecast?v=2" \
  -H "Content-Type: application/json" \
  -d '{
    "periods": 30,
    "freq": "D"
  }'
```

`POST /api/v1/models/{model_name}/reload`

Example:
```bash
curl -X POST "http://localhost:8000/api/v1/models/forecast/reload"
```

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=app

# Run specific test file
pytest tests/test_api.py

# Run with verbose output
pytest -v
```

Store your trained models in the `model_storage/` directory:
```text
model_storage/
├── forecast_model.pkl
├── classifier_model.json
├── sentiment_model.h5
└── ...
```
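Artifacts typically land in this directory from your training script. With the standard library's `pickle` (joblib follows the same dump/load pattern), the round trip looks like this - a sketch where the dict stands in for a fitted estimator and the filename is illustrative:

```python
import pickle
from pathlib import Path

STORAGE = Path("model_storage")
STORAGE.mkdir(exist_ok=True)

# Stand-in for a fitted estimator produced by your training script
model = {"weights": [0.1, 0.2], "classes": ["neg", "pos"]}

# Persist the artifact into model_storage/
with open(STORAGE / "sentiment_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, inside load_model(), read it back the same way
with open(STORAGE / "sentiment_model.pkl", "rb") as f:
    restored = pickle.load(f)
```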
Load them in your model's `load_model()` method:

```python
import joblib

from app.core.config import settings

model_path = f"{settings.model_storage_path}/your_model.pkl"
self.model = joblib.load(model_path)
```

All configuration is managed through environment variables (`.env` file):
```env
# Application
APP_NAME="ML Model API"
APP_VERSION="1.0.0"
DEBUG=false
LOG_LEVEL=INFO

# Server
HOST=0.0.0.0
PORT=8000
WORKERS=1

# CORS
CORS_ORIGINS=["http://localhost:3000","http://localhost:8000"]

# Models
MODELS_PATH=./app/models
MODEL_REGISTRY={"forecast": "forecast.py", "classifier": "classifier.py"}
MODEL_STORAGE_PATH=./model_storage
```

The template includes production-ready JSON logging:
```python
import logging

logger = logging.getLogger(__name__)

# Logs will include context
logger.info("Processing prediction", extra={
    "model": "forecast",
    "user_id": user_id,
    "request_id": request_id,
})
```

Deploy multiple versions of the same model:
```python
# In model_service.py
await model_service.load_model("forecast", "forecast.py", version="1")
await model_service.load_model("forecast", "forecast_v2.py", version="2")

# Use a specific version
result = await model_service.predict(
    model_name="forecast",
    input_data=data,
    version="2",
)
```

Create model-specific schemas in `app/api/schemas.py`:
```python
class SentimentRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=1000)
    language: str = Field(default="en", pattern="^(en|es|fr)$")
```

Models are loaded asynchronously on startup, so the API starts quickly. Models are properly unloaded during shutdown to free memory.
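This load-on-startup, unload-on-shutdown behavior can be pictured with an async lifespan context manager - a stdlib-only sketch (the template's actual wiring lives in `app/core/startup.py` and may differ):

```python
from contextlib import asynccontextmanager

# Stand-in for the in-memory model registry
models: dict = {}


async def _load_all() -> None:
    # Placeholders for loaded model objects
    models["forecast"] = object()
    models["classifier"] = object()


@asynccontextmanager
async def lifespan(app=None):
    # Startup: load every registered model before serving traffic
    await _load_all()
    try:
        yield
    finally:
        # Shutdown: drop references so memory can be reclaimed
        models.clear()
```

FastAPI accepts exactly this shape via `FastAPI(lifespan=lifespan)`, guaranteeing the `finally` block runs on shutdown.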
```env
DEBUG=false
LOG_LEVEL=WARNING
WORKERS=4
```

```bash
gunicorn app.main:app \
  -w 4 \
  -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 120
```

Create `k8s/deployment.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model-api
  template:
    metadata:
      labels:
        app: ml-model-api
    spec:
      containers:
        - name: api
          image: your-registry/ml-model-api:latest
          ports:
            - containerPort: 8000
          env:
            - name: WORKERS
              value: "1"
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
```

This is a template - customize it for your needs!
MIT License - Use freely!

For issues or questions:
- Check the Swagger UI at `/docs`
- Review the example models in `app/models/`
- Read the API schemas in `app/api/schemas.py`

Next steps:
- Replace example models with your actual trained models
- Configure the model registry in `.env`
- Add model-specific validation schemas
- Set up monitoring (Prometheus, Grafana)
- Add authentication if needed (JWT, API keys)
- Deploy to cloud (AWS, GCP, Azure)

Happy Deploying! 🚀