
Module 0: Environment Setup

Overview

This workshop supports two setup options:

  • Option A: macOS Local Setup - For macOS users who want to run locally
  • Option B: GitHub Codespaces - Cloud-based, works on any platform

Choose the option that works best for you.


Option A: macOS Local Setup

Step 1: Install Homebrew (if not installed)

Homebrew is the package manager for macOS. If you don't have it:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Verify installation:

brew --version

Step 2: Install Python 3.11+

Python is the primary language for ML components.

Install via Homebrew:

brew install python@3.11

Verify installation:

python3 --version
# Expected: Python 3.11 or newer (if this still shows an older version, try python3.11 --version)

Step 3: Install Go 1.21+

Go is used for high-performance infrastructure services (API Gateway in Module 4).

Install via Homebrew:

brew install go

Verify installation:

go version
# Expected: go1.21 or newer

Step 4: Install Docker Desktop

Docker is required for containerization and running local Kubernetes.

Installation:

  1. Download Docker Desktop for Mac
    • Intel Mac: Choose Intel chip version
    • Apple Silicon (M1/M2/M3/M4): Choose Apple chip version
  2. Drag Docker.app to Applications folder
  3. Launch Docker Desktop from Applications
  4. Grant permissions when prompted (keychain access, etc.)
  5. Wait for Docker to start (whale icon in menu bar)

Verify installation:

docker --version
docker run hello-world

Expected output:

Hello from Docker!
This message shows that your installation appears to be working correctly.

Performance Configuration:

For optimal performance during the workshop:

  1. Open Docker Desktop → Settings (gear icon)
  2. Go to Resources
  3. Allocate:
    • CPUs: At least 4 (8 if you have 16GB+ RAM)
    • Memory: At least 8GB (12GB if available)
    • Disk: At least 20GB
  4. Click "Apply & Restart"

Step 5: Install kubectl

kubectl is the Kubernetes command-line tool.

Install via Homebrew:

brew install kubectl

Verify installation:

kubectl version --client
# Expected: v1.28 or newer

Step 6: Install kind

kind (Kubernetes in Docker) runs local Kubernetes clusters.

Install via Homebrew:

brew install kind

Verify installation:

kind version
# Expected: kind v0.20 or newer

Step 7: Clone the Workshop Repository

git clone https://github.com/rfashwall/ml-con-workshop.git
cd ml-con-workshop

Step 8: Set Up Python Virtual Environment

Virtual environments isolate workshop dependencies from your system Python.

Create and activate virtual environment:

python3 -m venv venv
source venv/bin/activate

Verify activation: Your prompt should show a (venv) prefix:

(venv) user@MacBook-Pro ml-con-workshop %

Install Python dependencies:

# Upgrade pip
pip install --upgrade pip

# Install all workshop dependencies
pip install -r requirements.txt

What gets installed:

  • MLflow 2.9+: Experiment tracking and model registry
  • BentoML 1.4+: Model packaging and serving
  • Transformers 4.35+: Hugging Face transformers
  • PyTorch 2.1+: Deep learning framework
  • Datasets: Hugging Face datasets library
  • scikit-learn: Traditional ML algorithms
  • FastAPI: Web framework for APIs
  • Pydantic: Data validation
  • Prometheus Client: Metrics collection
  • Kubeflow Pipelines SDK: ML workflow orchestration
  • pytest: Testing framework

Verify installation:

pip list | grep -E "mlflow|bentoml|transformers"

Expected output:

bentoml           1.4.x
mlflow            2.9.x
transformers      4.35.x

Step 9: Create Kubernetes Cluster

Create a local single-node Kubernetes cluster using kind.

kind create cluster --config modules/module-0/kind.yaml

Expected output:

Creating cluster "mlops-workshop" ...
 ✓ Ensuring node image (kindest/node:v1.34.0)
 ✓ Preparing nodes
 ✓ Writing configuration
 ✓ Starting control-plane
 ✓ Installing CNI
 ✓ Installing StorageClass
Set kubectl context to "kind-mlops-workshop"
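
The cluster is defined by modules/module-0/kind.yaml in the repository. For orientation, a minimal single-node config that produces a cluster like the one above would look roughly like the sketch below (illustrative only; use the file shipped with the workshop):

# Illustrative sketch — not the workshop's actual config file
cat <<'EOF' > /tmp/kind-single-node-example.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: mlops-workshop
nodes:
  - role: control-plane
EOF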

Verify cluster:

# Check cluster info
kubectl cluster-info --context kind-mlops-workshop

# List nodes (should show 1 node)
kubectl get nodes

# Check all system pods are running
kubectl get pods -n kube-system

Expected node output:

NAME                            STATUS   ROLES           AGE   VERSION
mlops-workshop-control-plane    Ready    control-plane   2m    v1.34.0

Untaint the Node

Kubernetes control-plane nodes typically carry a NoSchedule taint that keeps regular workloads off them. On this single-node cluster, remove the taint (if present) so workshop pods can be scheduled:

kubectl taint nodes mlops-workshop-control-plane node-role.kubernetes.io/control-plane:NoSchedule-
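
To confirm the taint is gone, inspect the node; the Taints field should read <none>:

kubectl describe node mlops-workshop-control-plane | grep -i taints
# Expected: Taints: <none>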

Step 10: Verify Setup

Test that MLflow is working correctly:

mlflow ui

Expected output:

[INFO] Starting gunicorn 20.1.0
[INFO] Listening at: http://127.0.0.1:5000

Test in browser:

  1. Open http://localhost:5000
  2. You should see the MLflow tracking UI
  3. No experiments yet (this is expected)

Stop the server: Press Ctrl+C in the terminal.
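
As an extra sanity check, you can confirm the key Python packages import cleanly from the activated virtual environment with a quick one-liner:

python -c "import mlflow, bentoml, transformers, torch; print('All imports OK')"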


macOS-Specific Troubleshooting

Apple Silicon (M1/M2/M3/M4) Considerations

If you encounter architecture issues:

Some Python packages may require Rosetta 2 for compatibility:

softwareupdate --install-rosetta

Docker and kind work natively on Apple Silicon, so you shouldn't need Rosetta for containerization.

Docker Not Starting

Symptoms:

  • docker ps returns "Cannot connect to the Docker daemon"
  • Docker Desktop shows errors

Solutions:

  1. Check Docker Desktop is running (whale icon in menu bar)
  2. Grant permissions in System Settings → Privacy & Security
  3. Restart Docker Desktop: Docker menu → Restart
  4. If still failing, quit Docker completely and relaunch

Port Already in Use

Error: "Address already in use :5000"

Find and kill the process using the port:

lsof -i :5000
kill -9 <PID>

Or use a different port (on recent macOS versions, the AirPlay Receiver often occupies port 5000; disable it in System Settings or simply pick another port):

mlflow ui --port 5001

kind Cluster Creation Fails

Error: "failed to create cluster"

# Delete existing cluster
kind delete cluster --name mlops-workshop

# Ensure Docker is running
docker ps

# Recreate cluster using the workshop config
kind create cluster --config modules/module-0/kind.yaml

Python Package Installation Issues

If PyTorch installation is very slow:

# Install CPU-only version (faster, smaller download)
pip install torch --index-url https://download.pytorch.org/whl/cpu

If you see SSL certificate errors:

# Update certificates
pip install --upgrade certifi

macOS Setup Complete!

Verification checklist:

  • Homebrew installed
  • Python 3.11+ installed
  • Go 1.21+ installed
  • Docker Desktop installed and running
  • kubectl installed
  • kind installed
  • Repository cloned
  • Virtual environment created and activated
  • All Python packages installed (mlflow, bentoml, transformers, torch)
  • kind cluster "mlops-workshop" created
  • kubectl get nodes shows 1 node in Ready status
  • MLflow UI starts at http://localhost:5000

Option B: GitHub Codespaces

What is GitHub Codespaces?

A cloud-based development environment that runs in your browser. No local installation required!

Benefits:

  • ✅ Zero setup - all tools pre-installed
  • ✅ Works on any device (Windows, Mac, Linux, even iPad!)
  • ✅ Consistent environment for everyone
  • ✅ Free tier: 120 core-hours/month (about 30 hours on a 4-core machine); GitHub Pro includes 180 core-hours/month
  • ✅ 4-core CPU, 8GB RAM, 32GB storage

Perfect for:

  • Users who don't want to install tools locally
  • Windows or Linux users (since we only support macOS locally)
  • Participants with limited disk space or older hardware
  • Anyone who wants a ready-to-go environment

Quick Start

Step 1: Fork the Repository (if you haven't already)

  1. Go to github.com/rfashwall/ml-con-workshop
  2. Click the "Fork" button in the top-right corner
  3. Wait for the fork to complete (takes ~30 seconds)
  4. You'll be redirected to your fork: github.com/YOUR_USERNAME/ml-con-workshop

Step 2: Create a Codespace

  1. On your forked repository, click the green "Code" button
  2. Select the "Codespaces" tab
  3. Click "Create codespace on main"
  4. Wait 3-5 minutes for the environment to build

What happens during setup:

  • A virtual machine is provisioned for you
  • VS Code opens in your browser
  • All tools are automatically installed (Python, Go, Docker, kubectl, kind)
  • Python dependencies are installed
  • You're ready to start!

Step 3: You're Ready!

Once the codespace opens:

  • ✅ VS Code runs in your browser
  • ✅ Terminal is available at the bottom
  • ✅ All tools are pre-installed
  • ✅ Workshop repository is cloned

Verify your environment:

python --version  # Should show Python 3.11+
go version       # Should show Go 1.21+
docker --version # Should show Docker 24+
kubectl version --client
kind version

What's Pre-Installed

Your codespace includes everything needed for the workshop:

Tool          Version  Purpose
Python        3.11+    ML model development
Go            1.21+    Infrastructure services
Docker        24+      Containerization (Docker-in-Docker)
kubectl       1.28+    Kubernetes CLI
kind          0.20+    Local Kubernetes clusters
MLflow        2.9+     Experiment tracking
BentoML       1.4+     Model serving
Transformers  4.35+    NLP models
PyTorch       2.1+     Deep learning

All Python dependencies from requirements.txt are installed automatically!


Accessing Workshop Services

When you run services in the workshop, Codespaces automatically forwards ports.

Services you'll run:

  • MLflow UI (port 5000)
  • BentoML (port 3000)
  • Kubeflow (port 8080)
  • Prometheus (port 9090)
  • Grafana (port 3001)

How port forwarding works:

  1. Start a service in the terminal:

    mlflow ui
  2. Look for the "Ports" tab at the bottom of VS Code (next to Terminal)

  3. You'll see port 5000 listed with a status

  4. Click the globe icon (🌐) next to the port to open in browser

  5. The URL will be: https://[your-codespace-name]-5000.app.github.dev

Making ports public:

By default, ports are private (only you can access). To share with others:

  1. Right-click the port in the Ports tab
  2. Select "Port Visibility" โ†’ "Public"

Using Your Codespace

File Editor

  • Click files in the sidebar to open them
  • Edit code just like VS Code desktop
  • Changes are auto-saved

Terminal

  • Access terminal at bottom of screen
  • Multiple terminals: Click "+" to create new ones
  • Run commands exactly as in local setup

Extensions

Pre-installed VS Code extensions:

  • Python extension
  • Go extension
  • Docker extension
  • Kubernetes extension

Persistence

  • Your codespace persists when you close the browser
  • Files and changes are saved
  • Returns to the same state when you reopen
  • Auto-deletes after 30 days of inactivity (free tier)

Manual Setup (if devcontainer fails)

If the automatic setup doesn't complete successfully:

Run the setup script manually:

bash .devcontainer/post-create.sh

Verify installation:

python --version  # Should show Python 3.11+
go version       # Should show Go 1.21+
docker --version # Should show Docker 24+
kubectl version --client
kind version

Install Python dependencies (if needed):

pip install -r requirements.txt

Codespaces Limitations & Considerations

Free Tier Limits

What you get:

  • 120 core-hours/month for free personal accounts (about 30 hours on a 4-core machine)
  • 180 core-hours/month for GitHub Pro
  • 4-core CPU, 8GB RAM, 32GB storage

This workshop needs:

  • ~6 hours of active use
  • You can complete the workshop multiple times on the free tier!

Performance Considerations

What works great:

  • Running Python scripts
  • Training small/medium ML models
  • Building and running containers
  • Local Kubernetes with kind

What may be slower:

  • kind clusters (compared to local Docker)
  • Large Docker image builds
  • Training very large ML models
  • Network operations (downloading large datasets)

Typical performance:

  • kind cluster creation: ~3-5 minutes (vs 2-3 minutes local)
  • Python package installation: ~5-10 minutes (same as local)
  • MLflow UI: instant (same as local)

Storage Management

You have 32GB storage:

  • Workshop uses ~5-8GB
  • Docker images can grow quickly
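
Before cleaning up, docker system df shows where the space is going (images, containers, volumes, and build cache):

docker system df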

Clean up regularly:

# Remove unused Docker resources
docker system prune -a

# Clean Python caches
find . -type d -name __pycache__ -exec rm -rf {} +
pip cache purge

Troubleshooting Codespaces

Issue: Ports not forwarding

Symptoms:

  • Can't access MLflow UI or other services
  • "Unable to connect" errors in browser

Solutions:

  1. Check the "Ports" tab at bottom of VS Code
  2. Verify the service is actually running:
    # Check if process is listening
    lsof -i :5000
  3. Right-click port → "Port Visibility" → "Public"
  4. Restart the service:
    # Stop with Ctrl+C, then restart
    mlflow ui
  5. If still failing, restart the codespace:
    • Click "Codespaces" menu (bottom-left)
    • Select "Restart Codespace"

Issue: Docker not working

Error: Cannot connect to Docker daemon

Solutions:

  1. Check Docker status:

    docker ps
  2. If Docker isn't running, the codespace may need restart:

    • Click "Codespaces" menu (bottom-left)
    • Select "Restart Codespace"
    • Wait 2-3 minutes for full restart
  3. Verify Docker-in-Docker is enabled:

    # Should see docker containers
    docker ps

Issue: Running out of storage

Symptoms:

  • "No space left on device" errors
  • Docker builds failing

Solutions:

# Check disk usage
df -h

# Clean up Docker (this can free 5-10GB!)
docker system prune -a

# Clean up Python caches
find . -type d -name __pycache__ -exec rm -rf {} +
pip cache purge

# Remove old kind clusters
kind get clusters
kind delete cluster --name old-cluster-name

Issue: Slow performance

Symptoms:

  • Commands taking very long
  • Browser feels sluggish
  • High latency

Solutions:

  1. Close unused browser tabs (the VS Code web editor uses memory and CPU in your browser)

  2. Stop unnecessary services:

    # Stop MLflow if not needed
    pkill -f mlflow
    
    # Stop kind cluster if not actively using
    kind delete cluster --name mlops-workshop
  3. Upgrade to larger machine type:

    • Click "Codespaces" menu (bottom-left)
    • Select "Change machine type"
    • Options: 8-core (roughly 2x the default CPU, ~$0.72/hour) or 16-core (~$1.44/hour)
    • Note: Larger machines consume your included core-hours faster (and cost more beyond the free quota), but can be worth it for heavy workloads
  4. Check your internet connection:

    • Codespaces requires stable internet
    • Try moving closer to WiFi router
    • Use wired connection if possible
  5. Try a different browser:

    • Chrome/Edge work best
    • Firefox can be slower
    • Safari may have compatibility issues

Issue: Terminal not responding

Solutions:

  1. Open a new terminal: Click "+" in terminal panel
  2. Restart codespace: Codespaces menu → "Restart Codespace"
  3. If completely frozen, close browser tab and reopen codespace

Issue: Python packages not installed

Symptoms:

  • ModuleNotFoundError when importing mlflow, bentoml, etc.

Solutions:

# Check if virtual environment is activated
which python
# Should show: /workspaces/ml-con-workshop/venv/bin/python

# Activate if needed
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt

# Verify installation
pip list | grep -E "mlflow|bentoml|transformers"

Issue: kubectl cannot connect to cluster

Solutions:

# Check if kind cluster exists
kind get clusters

# Create cluster if missing
kind create cluster --name mlops-workshop

# Set context
kubectl config use-context kind-mlops-workshop

# Verify
kubectl get nodes

Codespaces Best Practices

To maximize your free hours:

  1. Stop your codespace when not using it:

    • Codespaces menu → "Stop Codespace"
    • Auto-stops after 30 minutes of inactivity (default)
    • Doesn't count against your hours when stopped
  2. Delete old codespaces you no longer need:

    • Go to github.com/codespaces and remove stale codespaces (or use the gh commands sketched below)
  3. Monitor usage:

    • Check your remaining core-hours under GitHub Settings → Billing and plans
Performance tips:

  1. Use codespace-specific optimizations:

    • The .devcontainer configuration is optimized for codespaces
    • Don't modify Docker resources (managed automatically)
  2. Precompile when possible:

    • Cache Docker layers
    • Use prebuilt images from the workshop
  3. Clean up regularly:

    • Run docker system prune -a every few days
    • Remove old kind clusters you're not using

Codespaces vs Local Setup

Feature       Codespaces                       macOS Local
Storage       32GB (cloud)                     Your local disk
Performance   Good (4-core)                    Depends on your Mac
Cost          Free (120 core-hours/month)      Free (after initial setup)
Portability   Access anywhere                  Only on your Mac
Offline       Requires internet                Works offline
Persistence   Deleted after 30 days inactive   Permanent

Choose Codespaces if:

  • You don't have a Mac
  • You want zero setup time
  • You work from multiple devices
  • You have limited disk space

Choose macOS Local if:

  • You want maximum performance
  • You want permanent local environment

Verification

macOS Users

Run this complete verification:

# Check all tools
python3 --version && \
go version && \
docker --version && \
kubectl version --client && \
kind version && \
kind get clusters

# Expected output:
# Python 3.11+
# go version go1.21+
# Docker version 24.0+
# Client Version: v1.34+
# kind v0.30+
# mlops-workshop

Test Python packages:

pip list | grep -E "mlflow|bentoml|transformers|torch"

# Expected: All packages listed with correct versions

Test Kubernetes:

kubectl get nodes

# Expected: 1 node in Ready status

Codespaces Users

Run this verification:

# Check all tools
python --version && \
go version && \
docker --version && \
kubectl version --client && \
kind version

# All should show correct versions

Test Python environment:

# Should show virtual environment path
which python

# Check packages are installed
pip list | grep -E "mlflow|bentoml|transformers"

Quick Reference

Essential Commands

Command                         Purpose
python --version (or python3)   Check Python version
go version                      Check Go version
docker ps                       List running containers
kubectl get nodes               List Kubernetes nodes
kind get clusters               List kind clusters
source venv/bin/activate        Activate virtual environment (macOS)
deactivate                      Deactivate virtual environment
pip list                        List installed packages
mlflow ui                       Start MLflow tracking server
docker system prune -a          Clean up Docker (free disk space)

Next Steps

Once your environment is set up (either macOS local or Codespaces):

  1. ✅ Verify all tools are working using the verification section above
  2. ✅ Proceed to Module 1 to start training your first ML model!
  3. 💡 Bookmark this page for reference throughout the workshop
  4. 📖 Review the Troubleshooting Guide if you encounter issues


Navigation

🏠 Home | Next: Module 1: Model Training & Experiment Tracking →
