
Module 5: Kubeflow Pipelines & Model Serving

What You'll Build

By the end of this module, you will have:

  • ✅ Running Kubeflow Pipelines on local Kubernetes
  • ✅ Working ML pipeline (data prep → train → evaluate → deploy)
  • ✅ Model deployed as REST API with KServe
  • ✅ Understanding of production ML orchestration

What You'll Learn

  • What Kubeflow Pipelines and KServe are and when to use them
  • How to build reusable pipeline components
  • How to create end-to-end ML workflows
  • How to deploy models as scalable APIs
  • How to integrate deployed models into applications

Part 1: Setup

Prerequisites

Before starting, ensure you have:

  • Docker Desktop installed and running

    docker --version
    docker ps  # Should connect without error
  • kubectl (Kubernetes CLI)

    kubectl version --client
  • kind (Kubernetes in Docker)

    kind version
  • Python 3.9-3.13

    python3 --version
  • 8GB+ RAM available for the Kubernetes cluster

    • Check Docker Desktop > Settings > Resources

Install Kubeflow Pipelines

Step 1: Run Installation Script

cd modules/module-5

# Run automated installation
./scripts/install-kubeflow.sh

What this does:

  1. Creates a kind cluster named mlops-workshop (if it doesn't already exist)
  2. Installs cert-manager (required for Kubeflow)
  3. Installs Kubeflow Pipelines v2.14.3
  4. Patches the minio deployment with a compatible image
  5. Waits for all components to start

Expected output:

✓ kind cluster created/verified
✓ Kubeflow Pipelines installed
✓ Waiting for pods to be ready...
✓ Installation complete!

Next steps:
  1. kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
  2. Open http://localhost:8080

Step 2: Wait for Pods

Installation takes 5-10 minutes on first run. Monitor progress:

# Watch pod status (Ctrl+C to exit when all Running)
kubectl get pods -n kubeflow -w

All pods should show Running with 1/1 or 2/2 READY:

NAME                                      READY   STATUS    RESTARTS   AGE
cache-server-xxx                          2/2     Running   0          3m
metadata-envoy-deployment-xxx             1/1     Running   0          3m
metadata-grpc-deployment-xxx              2/2     Running   0          3m
minio-xxx                                 2/2     Running   0          3m
ml-pipeline-xxx                           2/2     Running   0          3m
ml-pipeline-ui-xxx                        2/2     Running   0          3m
mysql-xxx                                 2/2     Running   0          3m
workflow-controller-xxx                   2/2     Running   0          3m

Step 3: Access Kubeflow UI

# In a separate terminal, keep this running
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

Open browser: http://localhost:8080

You should see the Kubeflow Pipelines dashboard.


Set Up Python Environment

Step 1: Create Virtual Environment

cd modules/module-5

# Create virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate  # macOS/Linux

Step 2: Install Dependencies

# Navigate to starter directory
cd starter

# Install all required packages
pip install -r requirements.txt

What gets installed:

  • kfp==2.14.3 - Kubeflow Pipelines SDK
  • pandas, numpy, scikit-learn - ML libraries
  • setuptools, wheel - Build tools (for Python 3.13+)

Step 3: Verify Installation

# Check kfp version
pip show kfp
# Should show: Version: 2.14.3

# Test imports
python3 -c "from kfp.dsl import component, pipeline; print('✓ KFP imports working')"

Verify Installation

Run the verification script:

cd modules/module-5
./scripts/verify-installation.sh

Expected output:

✓ kubectl installed
✓ kind cluster running
✓ kubeflow namespace exists
✓ All pods running
✓ ml-pipeline-ui service available
✓ All checks passed!

If any checks fail, see Part 6: Troubleshooting.


Part 2: Understanding the Concepts

What is Kubeflow Pipelines?

The Problem: Manual ML Workflows

Traditional ML workflow:

python prepare_data.py
python train_model.py --data=./data/train.csv
python evaluate.py --model=./models/model.pkl
kubectl apply -f deployment.yaml  # if model is good

Problems:

  • ❌ Not reproducible (hard to rerun exactly)
  • ❌ No tracking (which data? which code?)
  • ❌ Manual (human runs each step)
  • ❌ No dependency management (what if step fails?)
  • ❌ Hard to share with team

The Solution: Automated Pipelines

Kubeflow Pipelines turns your ML workflow into an automated, reproducible graph:

┌─────────────┐
│ Data Prep   │ → train_data, test_data
└──────┬──────┘
       ↓
┌─────────────┐
│ Train Model │ → trained_model
└──────┬──────┘
       ↓
┌─────────────┐    ┌─────────────┐
│ Evaluate    │←───┤ test_data   │
└──────┬──────┘    └─────────────┘
       ↓
┌─────────────┐
│ Deploy      │ → REST API
│ (KServe)    │
└─────────────┘

Benefits:

  • Reproducible - Same code + data + parameters = same results
  • Automated - One click to run entire workflow
  • Tracked - All inputs, outputs, and metrics logged
  • Scalable - Runs on Kubernetes, auto-scales
  • Shareable - Export as YAML, anyone can run

Key Concepts

1. Components

Components are self-contained code blocks that run in isolated containers:

from kfp.dsl import component, Output, Dataset

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3"]
)
def prepare_data(output_data: Output[Dataset]):
    """Download and prepare data"""
    import pandas as pd

    data = pd.read_csv("https://example.com/data.csv")
    data.to_csv(output_data.path, index=False)

Key features:

  • Runs in own container (isolated)
  • Declares dependencies (pandas)
  • Typed outputs (Dataset)
  • Reusable across pipelines

2. Pipelines

Pipelines connect components into a workflow (DAG):

from kfp.dsl import pipeline

@pipeline(name="my-ml-pipeline")
def my_pipeline():
    # Step 1: Prepare data
    prep_task = prepare_data()

    # Step 2: Train (uses step 1 output)
    train_task = train_model(
        data=prep_task.outputs["output_data"]
    )

    # Explicit ordering (optional here; the data dependency already enforces it)
    train_task.after(prep_task)

Automatic features:

  • Runs steps in correct order
  • Passes data between steps
  • Tracks all inputs/outputs
  • Handles failures
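
A pipeline function compiles to a portable YAML definition that anyone can upload and run. A minimal sketch with the KFP SDK (the output file name is illustrative):

from kfp import compiler

# Turn the decorated pipeline function into a shareable YAML spec
compiler.Compiler().compile(
    pipeline_func=my_pipeline,
    package_path="my_ml_pipeline.yaml",
)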

3. Artifacts

Artifacts are typed data passed between components:

| Type    | Purpose             | Example            |
|---------|---------------------|--------------------|
| Dataset | Training/test data  | CSV files          |
| Model   | Trained models      | Pickle, SavedModel |
| Metrics | Performance metrics | Accuracy, RMSE     |

How it works:

import pickle
import pandas as pd
from kfp.dsl import Input, Output, Dataset, Model

def train_model(
    train_data: Input[Dataset],  # Read from previous step
    model: Output[Model]         # Write for next step
):
    df = pd.read_csv(train_data.path)
    trained = fit(df)  # fit() stands in for your training logic
    with open(model.path, 'wb') as f:
        pickle.dump(trained, f)

  • Kubeflow stores artifacts in Minio (S3-compatible storage)
  • Components read/write using .path
  • Automatic lineage tracking

What is KServe?

The Model Serving Problem

After training a model, you need to:

  • Create HTTP server for predictions
  • Handle scaling (0 to many replicas)
  • Manage deployments (blue/green, canary)
  • Monitor performance

Doing this manually is complex!

The Solution: KServe

KServe is a Kubernetes-native platform that turns your model into a production REST API:

Your Model (model.pkl)
       ↓ Deploy
┌────────────────────────────────┐
│  KServe InferenceService       │
│                                │
│  HTTP: /v1/models/NAME:predict │
│                                │
│  Auto-scaling: 0 → many pods   │
│  Monitoring: metrics, logs     │
└────────────────────────────────┘
       ↓
Your App calls API

What you get:

  • Standard API - All models use same format
  • Auto-scaling - Scale to zero (save $), scale up on traffic
  • Canary deployments - Test new versions with % of traffic
  • Monitoring - Request logs, latency, errors

API Format

All KServe models use this standard format:

Request:

POST /v1/models/MODEL_NAME:predict
{
  "instances": [
    {"user_id": 1, "n_recommendations": 5}
  ]
}

Response:

{
  "predictions": [
    {
      "user_id": 1,
      "recommendations": [
        {"movie_id": 50, "movie_name": "Star Wars", "score": 0.89}
      ]
    }
  ]
}

How They Work Together

┌──────────────────────────────────────────────┐
│       Kubeflow Pipeline                      │
│                                              │
│  Data Prep → Train → Evaluate → Deploy       │
│                          ↓                   │
│                     model.pkl                │
└──────────────────────┬───────────────────────┘
                       │ Creates
                       ↓
┌──────────────────────────────────────────────┐
│       KServe InferenceService                │
│                                              │
│  REST API: http://service:8080/predict       │
│                                              │
│  Your App → [Request] → [Model] → [Response] │
└──────────────────────────────────────────────┘

Complete workflow:

  1. Kubeflow Pipeline trains and evaluates model
  2. Deploy component creates KServe InferenceService
  3. KServe starts serving model as HTTP API
  4. Your application calls API for predictions
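
Under the hood, the deploy step just creates an InferenceService custom resource from inside the cluster. A hedged sketch using the official kubernetes Python client (the manifest mirrors the image name and patch path used later in this module; the helper name and exact fields are assumptions, and the real component lives in the solution files):

from kubernetes import client, config

def create_inference_service():
    # Inside a pipeline pod, authenticate with the in-cluster service account
    config.load_incluster_config()

    manifest = {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": "movie-recommender", "namespace": "default"},
        "spec": {
            "predictor": {
                "containers": [{
                    "name": "kserve-container",
                    "image": "kind.local/movie-recommender:latest",
                    "ports": [{"containerPort": 8080}],
                }]
            }
        },
    }

    # InferenceService is a CRD, so go through the custom-objects API
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="serving.kserve.io",
        version="v1beta1",
        namespace="default",
        plural="inferenceservices",
        body=manifest,
    )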

Part 3: Hands-On Exercises

Exercise 1: Data Preparation Component

Goal: Build a component to download and prepare the MovieLens dataset.

Step 1: Open the File

cd modules/module-5/starter

# Open in your editor
code components/data_prep.py  # VS Code
# OR
nano components/data_prep.py  # Terminal editor
# OR
open components/data_prep.py  # macOS default

Step 2: Complete the TODOs

You need to implement:

  1. @component decorator - Configure base image and packages
  2. Download dataset - Use urllib.request.urlretrieve()
  3. Load data - Read with pd.read_csv()
  4. Train/test split - Use train_test_split()
  5. Save outputs - Write to train_data.path and test_data.path
  6. Movie metadata - Extract genres from binary columns

Key concepts to use:

  • Output[Dataset] for outputs
  • .path attribute to get file path
  • pd.read_csv() and df.to_csv()
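
If you want to sanity-check the overall shape before peeking at the solution, here is a minimal sketch (the dataset URL, separator, and column names assume the MovieLens 100K format; the starter file defines the real values, and the movie-metadata output is omitted):

from kfp.dsl import component, Output, Dataset

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3", "scikit-learn==1.3.2"]
)
def prepare_data(train_data: Output[Dataset], test_data: Output[Dataset]):
    import urllib.request
    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Download the ratings file (URL is an assumption for this sketch)
    url = "https://files.grouplens.org/datasets/movielens/ml-100k/u.data"
    urllib.request.urlretrieve(url, "u.data")

    # ml-100k ratings are tab-separated: user_id, movie_id, rating, timestamp
    df = pd.read_csv("u.data", sep="\t",
                     names=["user_id", "movie_id", "rating", "timestamp"])

    train, test = train_test_split(df, test_size=0.2, random_state=42)
    train.to_csv(train_data.path, index=False)
    test.to_csv(test_data.path, index=False)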

Stuck? Look at solution/components/data_prep_solution.py


Exercise 2: Training & Evaluation

Goal: Create components for training a collaborative filtering model and evaluating it.

Part A: Training Component

cd modules/module-5/starter
open components/train.py

What you'll implement:

  • Load training data using Input[Dataset]
  • Create user-movie matrix with numpy
  • Train TruncatedSVD model
  • Calculate training RMSE
  • Log metrics with metrics.log_metric()
  • Save model with pickle to model.path

Key concepts:

  • Input[Dataset] for inputs, Output[Model] for model
  • Output[Metrics] for logging to Kubeflow
  • Matrix factorization with SVD
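
A hedged sketch of the overall structure (the pivot-table matrix construction, the hyperparameter default, and the pickled bundle format are choices of this sketch, not requirements of the exercise):

from kfp.dsl import component, Input, Output, Dataset, Model, Metrics

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3", "scikit-learn==1.3.2"]
)
def train_model(
    train_data: Input[Dataset],
    model: Output[Model],
    metrics: Output[Metrics],
    n_components: int = 20,
):
    import pickle
    import numpy as np
    import pandas as pd
    from sklearn.decomposition import TruncatedSVD

    df = pd.read_csv(train_data.path)

    # User-movie rating matrix: users as rows, movies as columns, 0 = unrated
    matrix = df.pivot_table(index="user_id", columns="movie_id",
                            values="rating", fill_value=0)

    svd = TruncatedSVD(n_components=n_components, random_state=42)
    user_factors = svd.fit_transform(matrix.values)

    # Training RMSE over the rated entries only
    reconstructed = user_factors @ svd.components_
    mask = matrix.values > 0
    rmse = float(np.sqrt(np.mean((matrix.values[mask] - reconstructed[mask]) ** 2)))
    metrics.log_metric("train_rmse", rmse)

    # Bundle the column order so evaluation can align its matrix (sketch-specific)
    with open(model.path, "wb") as f:
        pickle.dump({"svd": svd, "columns": list(matrix.columns)}, f)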

Part B: Evaluation Component

open components/evaluate.py

What you'll implement:

  • Load test data and model
  • Calculate predictions
  • Compute RMSE and MAE
  • Log evaluation metrics
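
Continuing the sketch above (and assuming the same pickled bundle format from the training sketch), evaluation might look like:

from kfp.dsl import component, Input, Output, Dataset, Model, Metrics

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3", "scikit-learn==1.3.2"]
)
def evaluate_model(
    test_data: Input[Dataset],
    model: Input[Model],
    metrics: Output[Metrics],
):
    import pickle
    import numpy as np
    import pandas as pd

    df = pd.read_csv(test_data.path)
    with open(model.path, "rb") as f:
        bundle = pickle.load(f)
    svd, columns = bundle["svd"], bundle["columns"]

    # Rebuild a matrix aligned to the training columns (unseen movies drop out)
    matrix = df.pivot_table(index="user_id", columns="movie_id",
                            values="rating", fill_value=0)
    matrix = matrix.reindex(columns=columns, fill_value=0)

    reconstructed = svd.inverse_transform(svd.transform(matrix.values))

    # Score only the entries that were actually rated in the test set
    mask = matrix.values > 0
    errors = matrix.values[mask] - reconstructed[mask]
    metrics.log_metric("rmse", float(np.sqrt(np.mean(errors ** 2))))
    metrics.log_metric("mae", float(np.mean(np.abs(errors))))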

Stuck? Look at solution/components/train_solution.py and evaluate_solution.py


Exercise 3: Pipeline Orchestration

Goal: Connect all components into an end-to-end pipeline.

Step 1: Open Pipeline File

cd modules/module-5/starter
open recommendation_pipeline.py

Step 2: Complete the TODOs

You'll create a pipeline that:

  1. Runs data preparation - Call prepare_data()
  2. Trains model - Call train_model() with outputs from step 1
  3. Evaluates model - Call evaluate_model() with test data and model
  4. Connects components - Pass outputs as inputs
  5. Sets dependencies - Use .after() to control order

Pipeline structure:

@pipeline(name="movie-recommendation-pipeline")
def recommendation_pipeline():
    # Step 1
    data_task = prepare_data()

    # Step 2 (uses data from step 1)
    train_task = train_model(
        train_data=data_task.outputs["train_data"]
    )

    # Step 3 (uses test data and model)
    eval_task = evaluate_model(
        test_data=data_task.outputs["test_data"],
        model=train_task.outputs["model"]
    )
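
With the TODOs complete, you can submit a run straight from Python while the port-forward from Part 1 is running, instead of uploading YAML by hand (a sketch; the host URL is an assumption based on the earlier port-forward):

import kfp

# Connect through the port-forwarded ml-pipeline-ui service
client = kfp.Client(host="http://localhost:8080")

# Compiles, uploads, and starts the run in one call
client.create_run_from_pipeline_func(recommendation_pipeline, arguments={})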

Part 4: Model Deployment

Build Model Serving Container

Before deploying with KServe, we need to create a Docker image that can serve predictions.

Step 1: Review Serving Code

cd modules/module-5/model
ls

Files:

  • Dockerfile - Container definition
  • serve.py - Flask server implementing the KServe v1 prediction protocol
  • recommender.py - Movie recommendation logic
  • requirements.txt - Python dependencies

How it works:

# serve.py (abridged)
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/v1/models/recommender:predict", methods=["POST"])
def predict():
    request_data = request.get_json()
    user_id = request_data["instances"][0]["user_id"]

    # model is loaded from Minio at startup; predict for this user
    recommendations = model.recommend_movies(user_id)

    return jsonify({"predictions": [recommendations]})

Step 2: Build Docker Image

cd modules/module-5/model

# Build image
docker build -t movie-recommender:latest .

What this does:

  • Uses Python 3.11 slim base image
  • Installs pandas, scikit-learn, numpy, boto3
  • Copies serve.py and recommender.py
  • Exposes port 8080
  • Starts Flask server

Step 3: Tag for kind

IMPORTANT: Knative Serving (used by KServe) tries to resolve image digests from Docker Hub. For local images, we need to use a special prefix:

# Tag with kind.local prefix (bypasses digest resolution)
docker tag movie-recommender:latest kind.local/movie-recommender:latest

Why kind.local?

  • Knative config has registries-skipping-tag-resolving: "kind.local,ko.local,dev.local"
  • Images with these prefixes skip digest resolution
  • Allows local images to work without pushing to registry

Step 4: Load to kind Cluster

# Load image into kind cluster
kind load docker-image kind.local/movie-recommender:latest --name mlops-workshop

# Verify image is loaded
docker exec mlops-workshop-control-plane crictl images | grep movie-recommender

Expected output:

kind.local/movie-recommender   latest   abc123def456   50MB

Deploy with KServe

Step 1: Apply RBAC Permissions

KServe needs permissions to create InferenceServices:

cd modules/module-5

# Apply RBAC fix
kubectl apply -f kserve/rbac-fix.yaml

Verify permissions:

kubectl auth can-i create inferenceservices \
  --as=system:serviceaccount:kubeflow:pipeline-runner \
  -n default

Should return: yes

Step 2: Compile Pipeline with Deployment

cd modules/module-5/solution

# Compile solution pipeline (includes deploy component)
python3 recommendation_pipeline_solution.py --output pipeline_with_deploy.yaml

Step 3: Upload and Run

  1. Upload pipeline_with_deploy.yaml to Kubeflow UI
  2. Create run with parameters:
    • deploy_model_flag: True (enable deployment)
    • Other parameters: use defaults
  3. Start the run

Pipeline will:

  1. Prepare data (5 min)
  2. Train model (3 min)
  3. Evaluate model (2 min)
  4. Deploy to KServe (2 min)

Step 4: Verify Deployment

# Check InferenceService
kubectl get inferenceservice -n default

# Expected output:
# NAME                URL                                    READY
# movie-recommender   http://movie-recommender.default...    True

Check predictor pods:

kubectl get pods -n default | grep movie-recommender

# Expected: 1-2 predictor pods running

View logs:

# Get pod name
kubectl get pods -n default -l serving.kserve.io/inferenceservice=movie-recommender

# View logs
kubectl logs -n default <pod-name>

Call the REST API

Step 1: Port-Forward to Service

# In a separate terminal (substitute the actual predictor pod name from kubectl get pods)
kubectl port-forward -n default pod/<movie-recommender-predictor-pod> 8080:80

Step 2: Make a Prediction

# Basic prediction
curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {"user_id": 1, "n_recommendations": 5}
    ]
  }'

Expected response:

{
  "predictions": [
    {
      "user_id": 1,
      "recommendations": [
        {
          "movie_id": 50,
          "movie_name": "Star Wars (1977)",
          "score": 0.89,
          "genres": ["Action", "Adventure", "Sci-Fi"]
        },
        {
          "movie_id": 181,
          "movie_name": "Return of the Jedi (1983)",
          "score": 0.87,
          "genres": ["Action", "Adventure", "Sci-Fi"]
        }
      ]
    }
  ]
}

If you only see movie_id (no names/genres), see Re-running Pipeline

Step 3: Test Different Users

# Try different user IDs
curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"user_id": 42, "n_recommendations": 3}]}'
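
The same call from Python, if you'd rather script your tests (a sketch using the requests library; assumes the port-forward is still running):

import requests

resp = requests.post(
    "http://localhost:8080/v1/models/recommender:predict",
    json={"instances": [{"user_id": 42, "n_recommendations": 3}]},
    timeout=10,
)
resp.raise_for_status()

for rec in resp.json()["predictions"][0]["recommendations"]:
    print(f"{rec['movie_id']}: {rec.get('movie_name', '?')} ({rec['score']:.2f})")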

Part 5: Integration & Advanced

JavaScript Client Example

class MovieRecommenderClient {
  constructor(baseUrl = 'http://localhost:8080') {
    this.baseUrl = baseUrl;
    this.predictUrl = `${baseUrl}/v1/models/recommender:predict`;
  }

  async getRecommendations(userId, nRecommendations = 10, genre = null) {
    const payload = {
      instances: [
        {
          user_id: userId,
          n_recommendations: nRecommendations
        }
      ]
    };

    // Add genre filter if specified
    if (genre) {
      payload.instances[0].genre = genre;
    }

    const response = await fetch(this.predictUrl, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const data = await response.json();
    return data.predictions[0];
  }
}

// Usage
const client = new MovieRecommenderClient();

client.getRecommendations(1, 5)
  .then(result => {
    result.recommendations.forEach(rec => {
      console.log(`${rec.movie_name} (${rec.score.toFixed(2)})`);
      console.log(`  Genres: ${rec.genres.join(', ')}`);
    });
  })
  .catch(error => console.error('Error:', error));

Genre Filtering

The service supports filtering recommendations by genre:

Available Genres

  • Action
  • Adventure
  • Animation
  • Children
  • Comedy
  • Crime
  • Documentary
  • Drama
  • Fantasy
  • Film-Noir
  • Horror
  • Musical
  • Mystery
  • Romance
  • Sci-Fi
  • Thriller
  • War
  • Western

Examples

Get only Action movies:

curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "user_id": 1,
        "n_recommendations": 5,
        "genre": "Action"
      }
    ]
  }'

Get Comedy movies:

curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "user_id": 1,
        "n_recommendations": 5,
        "genre": "Comedy"
      }
    ]
  }'

How it works:

  • Case-insensitive partial matching
  • "sci" matches "Sci-Fi"
  • "action" matches "Action"
  • Only returns movies with matching genre
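
If it helps to see the matching rule spelled out, here is a sketch of the filter logic (the real implementation lives in recommender.py and may differ):

def matches_genre(movie_genres, query):
    """Case-insensitive partial match: 'sci' matches 'Sci-Fi'."""
    q = query.lower()
    return any(q in g.lower() for g in movie_genres)

print(matches_genre(["Action", "Sci-Fi"], "sci"))  # True
print(matches_genre(["Comedy"], "action"))         # False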

Part 6: Troubleshooting

Common Issues & Solutions

Issue 1: Minio Pod CrashLoopBackOff

Error:

minio-xxx   0/2   CrashLoopBackOff

Cause: Incompatible minio image (especially on ARM/Apple Silicon)

Fix:

# Patch with compatible image
kubectl set image deployment/minio -n kubeflow \
  minio=minio/minio:RELEASE.2025-09-07T16-13-09Z-cpuv1

# Wait for rollout
kubectl rollout status deployment/minio -n kubeflow

# Verify
kubectl get pods -n kubeflow | grep minio
# Should show: minio-xxx   2/2   Running

Issue 2: Python 3.13 - Cannot import setuptools

Error:

Cannot import 'setuptools.build_meta'

Cause: Python 3.12+ virtual environments no longer include setuptools by default

Fix:

# Install setuptools first
pip install --upgrade pip setuptools wheel

# Then install requirements
pip install -r requirements.txt

# Verify
pip show setuptools  # Should show 65.0.0+

Issue 3: KServe 403 Forbidden

Error:

ApiException: (403) Forbidden
User 'system:serviceaccount:kubeflow:pipeline-runner' cannot create inferenceservices

Cause: Missing RBAC permissions

Fix:

# Apply RBAC
kubectl apply -f kserve/rbac-fix.yaml

# Verify
kubectl auth can-i create inferenceservices \
  --as=system:serviceaccount:kubeflow:pipeline-runner \
  -n default
# Should return: yes

Issue 4: KServe Image Pull Error

Error:

Unable to fetch image "movie-recommender:latest"
failed to resolve image to digest: 401 Unauthorized

Cause: Knative tries to pull from Docker Hub, but image is local

Fix:

# 1. Tag with kind.local prefix
docker tag movie-recommender:latest kind.local/movie-recommender:latest

# 2. Load to kind
kind load docker-image kind.local/movie-recommender:latest --name mlops-workshop

# 3. Verify loaded
docker exec mlops-workshop-control-plane crictl images | grep movie-recommender

# 4. Update InferenceService
kubectl patch inferenceservice movie-recommender -n default --type='json' \
  -p='[{"op": "replace", "path": "/spec/predictor/containers/0/image", "value": "kind.local/movie-recommender:latest"}]'

# OR delete and recreate
kubectl delete inferenceservice movie-recommender -n default
# Then re-run pipeline with deploy_model_flag=True

Part 7: Reference

Commands Cheat Sheet

Kubeflow Management

# Access UI
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

# View pods
kubectl get pods -n kubeflow

# View pipeline runs (KFP runs execute as Argo Workflows)
kubectl get workflows -n kubeflow

# View component logs
kubectl logs <pod-name> -n kubeflow

# Restart Kubeflow
kubectl rollout restart deployment -n kubeflow

Cleanup

# Delete a pipeline run (Argo workflow)
kubectl delete workflow <run-name> -n kubeflow

# Delete InferenceService
kubectl delete inferenceservice movie-recommender -n default

# Uninstall Kubeflow
kubectl delete namespace kubeflow
kubectl delete namespace cert-manager

# Delete kind cluster
kind delete cluster --name mlops-workshop

What You've Accomplished

  • ✅ Automated ML Workflows - End-to-end pipelines in code
  • ✅ Component Reusability - Build once, use everywhere
  • ✅ Artifact Tracking - Automatic versioning and lineage
  • ✅ Production Deployment - Models as REST APIs with KServe
  • ✅ Enterprise Orchestration - Kubernetes-native ML platform

Comparison with Manual Workflows

| Feature         | Manual         | With Kubeflow          |
|-----------------|----------------|------------------------|
| Workflow        | Manual steps   | Automated pipeline     |
| Tracking        | None           | All artifacts logged   |
| Reproducibility | Hard           | One-click rerun        |
| Deployment      | Manual kubectl | Automated with KServe  |
| Collaboration   | Hard to share  | Share YAML file        |
| Visibility      | Logs only      | Visual graph + metrics |

Next Steps

Extend your pipeline:

  1. Add hyperparameter tuning component (GridSearchCV)
  2. Implement A/B testing with canary deployments (10% traffic)
  3. Add data validation (Great Expectations)
  4. Create model monitoring dashboard (Prometheus + Grafana)

Production considerations:

  1. Set up persistent artifact storage (S3, GCS)
  2. Configure resource limits and auto-scaling
  3. Implement CI/CD for pipeline updates (GitHub Actions)
  4. Add authentication and authorization


Congratulations! 🎉

You've mastered enterprise ML workflow orchestration with Kubeflow Pipelines and model serving with KServe!
