
Module 5: Kubeflow Pipelines & Model Serving

What You'll Build

By the end of this module, you will have:

  • ✅ Running Kubeflow Pipelines on local Kubernetes
  • ✅ Working ML pipeline (data prep → train → evaluate → deploy)
  • ✅ Model deployed as REST API with KServe
  • ✅ Understanding of production ML orchestration

What You'll Learn

  • What Kubeflow Pipelines and KServe are and when to use them
  • How to build reusable pipeline components
  • How to create end-to-end ML workflows
  • How to deploy models as scalable APIs
  • How to integrate deployed models into applications

Part 1: Setup

Prerequisites

Before starting, ensure you have:

  • Docker Desktop installed and running

    docker --version
    docker ps  # Should connect without error
  • kubectl (Kubernetes CLI)

    kubectl version --client
  • kind (Kubernetes in Docker)

    kind version
  • Python 3.9-3.13

    python3 --version
  • 8GB+ RAM available for the Kubernetes cluster

    • Check Docker Desktop > Settings > Resources

Install Kubeflow Pipelines

Step 1: Run Installation Script

cd modules/module-5

# Run automated installation
./scripts/install-kubeflow.sh

What this does:

  1. Creates a kind cluster named mlops-workshop (if it doesn't already exist)
  2. Installs cert-manager (required for Kubeflow)
  3. Installs Kubeflow Pipelines v2.14.3
  4. Patches the minio deployment with a compatible image
  5. Waits for all components to start

Expected output:

✓ kind cluster created/verified
✓ Kubeflow Pipelines installed
✓ Waiting for pods to be ready...
✓ Installation complete!

Next steps:
  1. kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
  2. Open http://localhost:8080

Step 2: Wait for Pods

Installation takes 5-10 minutes on first run. Monitor progress:

# Watch pod status (Ctrl+C to exit when all Running)
kubectl get pods -n kubeflow -w

All pods should show Running with 1/1 or 2/2 READY:

NAME                                      READY   STATUS    RESTARTS   AGE
cache-server-xxx                          2/2     Running   0          3m
metadata-envoy-deployment-xxx             1/1     Running   0          3m
metadata-grpc-deployment-xxx              2/2     Running   0          3m
minio-xxx                                 2/2     Running   0          3m
ml-pipeline-xxx                           2/2     Running   0          3m
ml-pipeline-ui-xxx                        2/2     Running   0          3m
mysql-xxx                                 2/2     Running   0          3m
workflow-controller-xxx                   2/2     Running   0          3m

Step 3: Access Kubeflow UI

# In a separate terminal, keep this running
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

Open browser: http://localhost:8080

You should see the Kubeflow Pipelines dashboard.


Set Up Python Environment

Step 1: Create Virtual Environment

cd modules/module-5

# Create virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate  # macOS/Linux

Step 2: Install Dependencies

# Navigate to starter directory
cd starter

# Install all required packages
pip install -r requirements.txt

What gets installed:

  • kfp==2.14.3 - Kubeflow Pipelines SDK
  • pandas, numpy, scikit-learn - ML libraries
  • setuptools, wheel - Build tools (for Python 3.13+)

Step 3: Verify Installation

# Check kfp version
pip show kfp
# Should show: Version: 2.14.3

# Test imports
python3 -c "from kfp.dsl import component, pipeline; print('✓ KFP imports working')"

Verify Installation

Run the verification script:

cd modules/module-5
./scripts/verify-installation.sh

Expected output:

✓ kubectl installed
✓ kind cluster running
✓ kubeflow namespace exists
✓ All pods running
✓ ml-pipeline-ui service available
✓ All checks passed!

If any checks fail, see Part 6: Troubleshooting.


Part 2: Understanding the Concepts

What is Kubeflow Pipelines?

The Problem: Manual ML Workflows

Traditional ML workflow:

python prepare_data.py
python train_model.py --data=./data/train.csv
python evaluate.py --model=./models/model.pkl
kubectl apply -f deployment.yaml  # if model is good

Problems:

  • ❌ Not reproducible (hard to rerun exactly)
  • ❌ No tracking (which data? which code?)
  • ❌ Manual (human runs each step)
  • ❌ No dependency management (what if step fails?)
  • ❌ Hard to share with team

The Solution: Automated Pipelines

Kubeflow Pipelines turns your ML workflow into an automated, reproducible graph:

┌─────────────┐
│ Data Prep   │ → train_data, test_data
└──────┬──────┘
       ↓
┌─────────────┐
│ Train Model │ → trained_model
└──────┬──────┘
       ↓
┌─────────────┐    ┌─────────────┐
│ Evaluate    │←───┤ test_data   │
└──────┬──────┘    └─────────────┘
       ↓
┌─────────────┐
│ Deploy      │ → REST API
│ (KServe)    │
└─────────────┘

Benefits:

  • Reproducible - Same code + data + parameters = same results
  • Automated - One click to run entire workflow
  • Tracked - All inputs, outputs, and metrics logged
  • Scalable - Runs on Kubernetes, auto-scales
  • Shareable - Export as YAML, anyone can run

Key Concepts

1. Components

Components are self-contained code blocks that run in isolated containers:

from kfp.dsl import component, Output, Dataset

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3"]
)
def prepare_data(output_data: Output[Dataset]):
    """Download and prepare data"""
    import pandas as pd

    data = pd.read_csv("https://example.com/data.csv")
    data.to_csv(output_data.path, index=False)

Key features:

  • Runs in own container (isolated)
  • Declares dependencies (pandas)
  • Typed outputs (Dataset)
  • Reusable across pipelines

2. Pipelines

Pipelines connect components into a workflow (DAG):

from kfp.dsl import pipeline

@pipeline(name="my-ml-pipeline")
def my_pipeline():
    # Step 1: Prepare data
    prep_task = prepare_data()

    # Step 2: Train (uses step 1 output)
    train_task = train_model(
        data=prep_task.outputs["output_data"]
    )

    # Explicit ordering (optional here; the data dependency already enforces it)
    train_task.after(prep_task)

Automatic features:

  • Runs steps in correct order
  • Passes data between steps
  • Tracks all inputs/outputs
  • Handles failures
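
A pipeline function compiles to a portable YAML definition that anyone can upload and run. A minimal sketch with the KFP SDK (the output file name is illustrative):

from kfp import compiler

# Turn the decorated pipeline function into a shareable YAML spec
compiler.Compiler().compile(
    pipeline_func=my_pipeline,
    package_path="my_ml_pipeline.yaml",
)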

3. Artifacts

Artifacts are typed data passed between components:

| Type    | Purpose             | Example            |
|---------|---------------------|--------------------|
| Dataset | Training/test data  | CSV files          |
| Model   | Trained models      | Pickle, SavedModel |
| Metrics | Performance metrics | Accuracy, RMSE     |

How it works:

import pickle
import pandas as pd
from kfp.dsl import Input, Output, Dataset, Model

def train_model(
    train_data: Input[Dataset],  # Read from previous step
    model: Output[Model]         # Write for next step
):
    df = pd.read_csv(train_data.path)
    trained = fit(df)  # fit() stands in for your training logic
    with open(model.path, 'wb') as f:
        pickle.dump(trained, f)

  • Kubeflow stores artifacts in Minio (S3-compatible storage)
  • Components read/write using .path
  • Automatic lineage tracking

What is KServe?

The Model Serving Problem

After training a model, you need to:

  • Create HTTP server for predictions
  • Handle scaling (0 to many replicas)
  • Manage deployments (blue/green, canary)
  • Monitor performance

Doing this manually is complex!

The Solution: KServe

KServe is a Kubernetes-native platform that turns your model into a production REST API:

Your Model (model.pkl)
       ↓ Deploy
┌────────────────────────────────┐
│  KServe InferenceService       │
│                                │
│  HTTP: /v1/models/NAME:predict │
│                                │
│  Auto-scaling: 0 → many pods   │
│  Monitoring: metrics, logs     │
└────────────────────────────────┘
       ↓
Your App calls API

What you get:

  • Standard API - All models use same format
  • Auto-scaling - Scale to zero (save $), scale up on traffic
  • Canary deployments - Test new versions with % of traffic
  • Monitoring - Request logs, latency, errors

API Format

All KServe models use this standard format:

Request:

POST /v1/models/MODEL_NAME:predict
{
  "instances": [
    {"user_id": 1, "n_recommendations": 5}
  ]
}

Response:

{
  "predictions": [
    {
      "user_id": 1,
      "recommendations": [
        {"movie_id": 50, "movie_name": "Star Wars", "score": 0.89}
      ]
    }
  ]
}

How They Work Together

┌──────────────────────────────────────────────┐
│       Kubeflow Pipeline                      │
│                                              │
│  Data Prep → Train → Evaluate → Deploy       │
│                          ↓                   │
│                     model.pkl                │
└──────────────────────┬───────────────────────┘
                       │ Creates
                       ↓
┌──────────────────────────────────────────────┐
│       KServe InferenceService                │
│                                              │
│  REST API: http://service:8080/predict       │
│                                              │
│  Your App → [Request] → [Model] → [Response] │
└──────────────────────────────────────────────┘

Complete workflow:

  1. Kubeflow Pipeline trains and evaluates model
  2. Deploy component creates KServe InferenceService
  3. KServe starts serving model as HTTP API
  4. Your application calls API for predictions
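
Under the hood, the deploy step just creates an InferenceService custom resource from inside the cluster. A hedged sketch using the official kubernetes Python client (the manifest mirrors the image name and patch path used later in this module; the helper name and exact fields are assumptions, and the real component lives in the solution files):

from kubernetes import client, config

def create_inference_service():
    # Inside a pipeline pod, authenticate with the in-cluster service account
    config.load_incluster_config()

    manifest = {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": "movie-recommender", "namespace": "default"},
        "spec": {
            "predictor": {
                "containers": [{
                    "name": "kserve-container",
                    "image": "kind.local/movie-recommender:latest",
                    "ports": [{"containerPort": 8080}],
                }]
            }
        },
    }

    # InferenceService is a CRD, so go through the custom-objects API
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="serving.kserve.io",
        version="v1beta1",
        namespace="default",
        plural="inferenceservices",
        body=manifest,
    )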

Part 3: Hands-On Exercises

Exercise 1: Data Preparation Component

Goal: Build a component to download and prepare the MovieLens dataset.

Step 1: Open the File

cd modules/module-5/starter

# Open in your editor
code components/data_prep.py  # VS Code
# OR
nano components/data_prep.py  # Terminal editor
# OR
open components/data_prep.py  # macOS default

Step 2: Complete the TODOs

You need to implement:

  1. @component decorator - Configure base image and packages
  2. Download dataset - Use urllib.request.urlretrieve()
  3. Load data - Read with pd.read_csv()
  4. Train/test split - Use train_test_split()
  5. Save outputs - Write to train_data.path and test_data.path
  6. Movie metadata - Extract genres from binary columns

Key concepts to use:

  • Output[Dataset] for outputs
  • .path attribute to get file path
  • pd.read_csv() and df.to_csv()
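
If you want to sanity-check the overall shape before peeking at the solution, here is a minimal sketch (the dataset URL, separator, and column names assume the MovieLens 100K format; the starter file defines the real values, and the movie-metadata output is omitted):

from kfp.dsl import component, Output, Dataset

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3", "scikit-learn==1.3.2"]
)
def prepare_data(train_data: Output[Dataset], test_data: Output[Dataset]):
    import urllib.request
    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Download the ratings file (URL is an assumption for this sketch)
    url = "https://files.grouplens.org/datasets/movielens/ml-100k/u.data"
    urllib.request.urlretrieve(url, "u.data")

    # ml-100k ratings are tab-separated: user_id, movie_id, rating, timestamp
    df = pd.read_csv("u.data", sep="\t",
                     names=["user_id", "movie_id", "rating", "timestamp"])

    train, test = train_test_split(df, test_size=0.2, random_state=42)
    train.to_csv(train_data.path, index=False)
    test.to_csv(test_data.path, index=False)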

Stuck? Look at solution/components/data_prep_solution.py


Exercise 2: Training & Evaluation

Goal: Create components for training a collaborative filtering model and evaluating it.

Part A: Training Component

cd modules/module-5/starter
open components/train.py

What you'll implement:

  • Load training data using Input[Dataset]
  • Create user-movie matrix with numpy
  • Train TruncatedSVD model
  • Calculate training RMSE
  • Log metrics with metrics.log_metric()
  • Save model with pickle to model.path

Key concepts:

  • Input[Dataset] for inputs, Output[Model] for model
  • Output[Metrics] for logging to Kubeflow
  • Matrix factorization with SVD
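
A hedged sketch of the overall structure (the pivot-table matrix construction, the hyperparameter default, and the pickled bundle format are choices of this sketch, not requirements of the exercise):

from kfp.dsl import component, Input, Output, Dataset, Model, Metrics

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3", "scikit-learn==1.3.2"]
)
def train_model(
    train_data: Input[Dataset],
    model: Output[Model],
    metrics: Output[Metrics],
    n_components: int = 20,
):
    import pickle
    import numpy as np
    import pandas as pd
    from sklearn.decomposition import TruncatedSVD

    df = pd.read_csv(train_data.path)

    # User-movie rating matrix: users as rows, movies as columns, 0 = unrated
    matrix = df.pivot_table(index="user_id", columns="movie_id",
                            values="rating", fill_value=0)

    svd = TruncatedSVD(n_components=n_components, random_state=42)
    user_factors = svd.fit_transform(matrix.values)

    # Training RMSE over the rated entries only
    reconstructed = user_factors @ svd.components_
    mask = matrix.values > 0
    rmse = float(np.sqrt(np.mean((matrix.values[mask] - reconstructed[mask]) ** 2)))
    metrics.log_metric("train_rmse", rmse)

    # Bundle the column order so evaluation can align its matrix (sketch-specific)
    with open(model.path, "wb") as f:
        pickle.dump({"svd": svd, "columns": list(matrix.columns)}, f)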

Part B: Evaluation Component

open components/evaluate.py

What you'll implement:

  • Load test data and model
  • Calculate predictions
  • Compute RMSE and MAE
  • Log evaluation metrics
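
Continuing the sketch above (and assuming the same pickled bundle format from the training sketch), evaluation might look like:

from kfp.dsl import component, Input, Output, Dataset, Model, Metrics

@component(
    base_image="python:3.11-slim",
    packages_to_install=["pandas==2.0.3", "scikit-learn==1.3.2"]
)
def evaluate_model(
    test_data: Input[Dataset],
    model: Input[Model],
    metrics: Output[Metrics],
):
    import pickle
    import numpy as np
    import pandas as pd

    df = pd.read_csv(test_data.path)
    with open(model.path, "rb") as f:
        bundle = pickle.load(f)
    svd, columns = bundle["svd"], bundle["columns"]

    # Rebuild a matrix aligned to the training columns (unseen movies drop out)
    matrix = df.pivot_table(index="user_id", columns="movie_id",
                            values="rating", fill_value=0)
    matrix = matrix.reindex(columns=columns, fill_value=0)

    reconstructed = svd.inverse_transform(svd.transform(matrix.values))

    # Score only the entries that were actually rated in the test set
    mask = matrix.values > 0
    errors = matrix.values[mask] - reconstructed[mask]
    metrics.log_metric("rmse", float(np.sqrt(np.mean(errors ** 2))))
    metrics.log_metric("mae", float(np.mean(np.abs(errors))))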

Stuck? Look at solution/components/train_solution.py and evaluate_solution.py


Exercise 3: Pipeline Orchestration

Goal: Connect all components into an end-to-end pipeline.

Step 1: Open Pipeline File

cd modules/module-5/starter
open recommendation_pipeline.py

Step 2: Complete the TODOs

You'll create a pipeline that:

  1. Runs data preparation - Call prepare_data()
  2. Trains model - Call train_model() with outputs from step 1
  3. Evaluates model - Call evaluate_model() with test data and model
  4. Connects components - Pass outputs as inputs
  5. Sets dependencies - Use .after() to control order

Pipeline structure:

@pipeline(name="movie-recommendation-pipeline")
def recommendation_pipeline():
    # Step 1
    data_task = prepare_data()

    # Step 2 (uses data from step 1)
    train_task = train_model(
        train_data=data_task.outputs["train_data"]
    )

    # Step 3 (uses test data and model)
    eval_task = evaluate_model(
        test_data=data_task.outputs["test_data"],
        model=train_task.outputs["model"]
    )
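
With the TODOs complete, you can submit a run straight from Python while the port-forward from Part 1 is running, instead of uploading YAML by hand (a sketch; the host URL is an assumption based on the earlier port-forward):

import kfp

# Connect through the port-forwarded ml-pipeline-ui service
client = kfp.Client(host="http://localhost:8080")

# Compiles, uploads, and starts the run in one call
client.create_run_from_pipeline_func(recommendation_pipeline, arguments={})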

Part 4: Model Deployment

Build Model Serving Container

Before deploying with KServe, we need to create a Docker image that can serve predictions.

Step 1: Review Serving Code

cd modules/module-5/model
ls

Files:

  • Dockerfile - Container definition
  • serve.py - Flask server implementing the KServe v1 prediction protocol
  • recommender.py - Movie recommendation logic
  • requirements.txt - Python dependencies

How it works:

# serve.py (abridged)
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/v1/models/recommender:predict", methods=["POST"])
def predict():
    request_data = request.get_json()
    user_id = request_data["instances"][0]["user_id"]

    # model is loaded from Minio at startup; predict for this user
    recommendations = model.recommend_movies(user_id)

    return jsonify({"predictions": [recommendations]})

Step 2: Build Docker Image

cd modules/module-5/model

# Build image
docker build -t movie-recommender:latest .

What this does:

  • Uses Python 3.11 slim base image
  • Installs pandas, scikit-learn, numpy, boto3
  • Copies serve.py and recommender.py
  • Exposes port 8080
  • Starts Flask server

Step 3: Tag for kind

IMPORTANT: Knative Serving (used by KServe) tries to resolve image digests from Docker Hub. For local images, we need to use a special prefix:

# Tag with kind.local prefix (bypasses digest resolution)
docker tag movie-recommender:latest kind.local/movie-recommender:latest

Why kind.local?

  • Knative config has registries-skipping-tag-resolving: "kind.local,ko.local,dev.local"
  • Images with these prefixes skip digest resolution
  • Allows local images to work without pushing to registry

Step 4: Load to kind Cluster

# Load image into kind cluster
kind load docker-image kind.local/movie-recommender:latest --name mlops-workshop

# Verify image is loaded
docker exec mlops-workshop-control-plane crictl images | grep movie-recommender

Expected output:

kind.local/movie-recommender   latest   abc123def456   50MB

Deploy with KServe

Step 1: Apply RBAC Permissions

KServe needs permissions to create InferenceServices:

cd modules/module-5

# Apply RBAC fix
kubectl apply -f kserve/rbac-fix.yaml

Verify permissions:

kubectl auth can-i create inferenceservices \
  --as=system:serviceaccount:kubeflow:pipeline-runner \
  -n default

Should return: yes

Step 2: Compile Pipeline with Deployment

cd modules/module-5/solution

# Compile solution pipeline (includes deploy component)
python3 recommendation_pipeline_solution.py --output pipeline_with_deploy.yaml

Step 3: Upload and Run

  1. Upload pipeline_with_deploy.yaml to Kubeflow UI
  2. Create run with parameters:
    • deploy_model_flag: True (enable deployment)
    • Other parameters: use defaults
  3. Start the run

Pipeline will:

  1. Prepare data (5 min)
  2. Train model (3 min)
  3. Evaluate model (2 min)
  4. Deploy to KServe (2 min)

Step 4: Verify Deployment

# Check InferenceService
kubectl get inferenceservice -n default

# Expected output:
# NAME                URL                                    READY
# movie-recommender   http://movie-recommender.default...    True

Check predictor pods:

kubectl get pods -n default | grep movie-recommender

# Expected: 1-2 predictor pods running

View logs:

# Get pod name
kubectl get pods -n default -l serving.kserve.io/inferenceservice=movie-recommender

# View logs
kubectl logs -n default <pod-name>

Call the REST API

Step 1: Port-Forward to Service

# In a separate terminal (substitute the actual predictor pod name from kubectl get pods)
kubectl port-forward -n default pod/<movie-recommender-predictor-pod> 8080:80

Step 2: Make a Prediction

# Basic prediction
curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {"user_id": 1, "n_recommendations": 5}
    ]
  }'

Expected response:

{
  "predictions": [
    {
      "user_id": 1,
      "recommendations": [
        {
          "movie_id": 50,
          "movie_name": "Star Wars (1977)",
          "score": 0.89,
          "genres": ["Action", "Adventure", "Sci-Fi"]
        },
        {
          "movie_id": 181,
          "movie_name": "Return of the Jedi (1983)",
          "score": 0.87,
          "genres": ["Action", "Adventure", "Sci-Fi"]
        }
      ]
    }
  ]
}

If you only see movie_id (no names/genres), see Re-running Pipeline

Step 3: Test Different Users

# Try different user IDs
curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"user_id": 42, "n_recommendations": 3}]}'
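
The same call from Python, if you'd rather script your tests (a sketch using the requests library; assumes the port-forward is still running):

import requests

resp = requests.post(
    "http://localhost:8080/v1/models/recommender:predict",
    json={"instances": [{"user_id": 42, "n_recommendations": 3}]},
    timeout=10,
)
resp.raise_for_status()

for rec in resp.json()["predictions"][0]["recommendations"]:
    print(f"{rec['movie_id']}: {rec.get('movie_name', '?')} ({rec['score']:.2f})")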

Part 5: Integration & Advanced

JavaScript Client Example

class MovieRecommenderClient {
  constructor(baseUrl = 'http://localhost:8080') {
    this.baseUrl = baseUrl;
    this.predictUrl = `${baseUrl}/v1/models/recommender:predict`;
  }

  async getRecommendations(userId, nRecommendations = 10, genre = null) {
    const payload = {
      instances: [
        {
          user_id: userId,
          n_recommendations: nRecommendations
        }
      ]
    };

    // Add genre filter if specified
    if (genre) {
      payload.instances[0].genre = genre;
    }

    const response = await fetch(this.predictUrl, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const data = await response.json();
    return data.predictions[0];
  }
}

// Usage
const client = new MovieRecommenderClient();

client.getRecommendations(1, 5)
  .then(result => {
    result.recommendations.forEach(rec => {
      console.log(`${rec.movie_name} (${rec.score.toFixed(2)})`);
      console.log(`  Genres: ${rec.genres.join(', ')}`);
    });
  })
  .catch(error => console.error('Error:', error));

Genre Filtering

The service supports filtering recommendations by genre:

Available Genres

  • Action
  • Adventure
  • Animation
  • Children
  • Comedy
  • Crime
  • Documentary
  • Drama
  • Fantasy
  • Film-Noir
  • Horror
  • Musical
  • Mystery
  • Romance
  • Sci-Fi
  • Thriller
  • War
  • Western

Examples

Get only Action movies:

curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "user_id": 1,
        "n_recommendations": 5,
        "genre": "Action"
      }
    ]
  }'

Get Comedy movies:

curl -X POST http://localhost:8080/v1/models/recommender:predict \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "user_id": 1,
        "n_recommendations": 5,
        "genre": "Comedy"
      }
    ]
  }'

How it works:

  • Case-insensitive partial matching
  • "sci" matches "Sci-Fi"
  • "action" matches "Action"
  • Only returns movies with matching genre
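
If it helps to see the matching rule spelled out, here is a sketch of the filter logic (the real implementation lives in recommender.py and may differ):

def matches_genre(movie_genres, query):
    """Case-insensitive partial match: 'sci' matches 'Sci-Fi'."""
    q = query.lower()
    return any(q in g.lower() for g in movie_genres)

print(matches_genre(["Action", "Sci-Fi"], "sci"))  # True
print(matches_genre(["Comedy"], "action"))         # False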

Part 6: Troubleshooting

Common Issues & Solutions

Issue 1: Minio Pod CrashLoopBackOff

Error:

minio-xxx   0/2   CrashLoopBackOff

Cause: Incompatible minio image (especially on ARM/Apple Silicon)

Fix:

# Patch with compatible image
kubectl set image deployment/minio -n kubeflow \
  minio=minio/minio:RELEASE.2025-09-07T16-13-09Z-cpuv1

# Wait for rollout
kubectl rollout status deployment/minio -n kubeflow

# Verify
kubectl get pods -n kubeflow | grep minio
# Should show: minio-xxx   2/2   Running

Issue 2: Python 3.13 - Cannot import setuptools

Error:

Cannot import 'setuptools.build_meta'

Cause: Python 3.12+ virtual environments no longer include setuptools by default

Fix:

# Install setuptools first
pip install --upgrade pip setuptools wheel

# Then install requirements
pip install -r requirements.txt

# Verify
pip show setuptools  # Should show 65.0.0+

Issue 3: KServe 403 Forbidden

Error:

ApiException: (403) Forbidden
User 'system:serviceaccount:kubeflow:pipeline-runner' cannot create inferenceservices

Cause: Missing RBAC permissions

Fix:

# Apply RBAC
kubectl apply -f kserve/rbac-fix.yaml

# Verify
kubectl auth can-i create inferenceservices \
  --as=system:serviceaccount:kubeflow:pipeline-runner \
  -n default
# Should return: yes

Issue 4: KServe Image Pull Error

Error:

Unable to fetch image "movie-recommender:latest"
failed to resolve image to digest: 401 Unauthorized

Cause: Knative tries to pull from Docker Hub, but image is local

Fix:

# 1. Tag with kind.local prefix
docker tag movie-recommender:latest kind.local/movie-recommender:latest

# 2. Load to kind
kind load docker-image kind.local/movie-recommender:latest --name mlops-workshop

# 3. Verify loaded
docker exec mlops-workshop-control-plane crictl images | grep movie-recommender

# 4. Update InferenceService
kubectl patch inferenceservice movie-recommender -n default --type='json' \
  -p='[{"op": "replace", "path": "/spec/predictor/containers/0/image", "value": "kind.local/movie-recommender:latest"}]'

# OR delete and recreate
kubectl delete inferenceservice movie-recommender -n default
# Then re-run pipeline with deploy_model_flag=True

Part 7: Reference

Commands Cheat Sheet

Kubeflow Management

# Access UI
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

# View pods
kubectl get pods -n kubeflow

# View pipeline runs (KFP runs execute as Argo Workflows)
kubectl get workflows -n kubeflow

# View component logs
kubectl logs <pod-name> -n kubeflow

# Restart Kubeflow
kubectl rollout restart deployment -n kubeflow

Cleanup

# Delete a pipeline run (Argo workflow)
kubectl delete workflow <run-name> -n kubeflow

# Delete InferenceService
kubectl delete inferenceservice movie-recommender -n default

# Uninstall Kubeflow
kubectl delete namespace kubeflow
kubectl delete namespace cert-manager

# Delete kind cluster
kind delete cluster --name mlops-workshop

What You've Accomplished

  • ✅ Automated ML Workflows - End-to-end pipelines in code
  • ✅ Component Reusability - Build once, use everywhere
  • ✅ Artifact Tracking - Automatic versioning and lineage
  • ✅ Production Deployment - Models as REST APIs with KServe
  • ✅ Enterprise Orchestration - Kubernetes-native ML platform

Comparison with Manual Workflows

| Feature         | Manual         | With Kubeflow          |
|-----------------|----------------|------------------------|
| Workflow        | Manual steps   | Automated pipeline     |
| Tracking        | None           | All artifacts logged   |
| Reproducibility | Hard           | One-click rerun        |
| Deployment      | Manual kubectl | Automated with KServe  |
| Collaboration   | Hard to share  | Share YAML file        |
| Visibility      | Logs only      | Visual graph + metrics |

Next Steps

Extend your pipeline:

  1. Add hyperparameter tuning component (GridSearchCV)
  2. Implement A/B testing with canary deployments (10% traffic)
  3. Add data validation (Great Expectations)
  4. Create model monitoring dashboard (Prometheus + Grafana)

Production considerations:

  1. Set up persistent artifact storage (S3, GCS)
  2. Configure resource limits and auto-scaling
  3. Implement CI/CD for pipeline updates (GitHub Actions)
  4. Add authentication and authorization


Congratulations! 🎉

You've mastered enterprise ML workflow orchestration with Kubeflow Pipelines and model serving with KServe!
