
Helm4GenAI

Helm4GenAI is a boilerplate project designed to simplify the deployment of Generative AI applications on Kubernetes. It provides a structured foundation for spinning up local clusters (Kind) or deploying to production (GKE/Terraform), managing the GenAI stack (vLLM, MCP), and deploying applications using standard Kubernetes manifests.

Prerequisites

  • uv
  • Terraform
  • Helm
  • Kind
  • Podman
  • Google Cloud SDK (gcloud)

Installation Guide

macOS (via Homebrew)

brew tap hashicorp/tap
brew install hashicorp/tap/terraform helm kind podman
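
The Homebrew command above covers Terraform, Helm, Kind, and Podman. uv and the Google Cloud SDK can also be installed via Homebrew; the formula and cask names below reflect current Homebrew packaging and may change:

brew install uv
brew install --cask google-cloud-sdk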

Quick Start

The project includes a Makefile to automate the infrastructure setup and application deployment.

⚠️ BAML Configuration

Before running or deploying the Robots agent, you can customize the LLM settings in examples/robots/baml_src/robots.baml. For example, you can define a client and bind it to a function like this:

client LocalLLMClient {
  provider openai
  options {
    base_url env.VLLM_API_URL
    api_key env.VLLM_API_KEY
    model env.OLLAMA_MODEL
    temperature 0.0
    max_tokens 10000
  }
}

function AnalyzeRobotsTxt(content: string) -> RobotsSummary {
  client LocalLLMClient
  prompt #"
    ...
  "#
}

After making any changes to .baml files, you must regenerate the client code:

make generate-baml

This requires uv to be installed.
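
At runtime the client resolves base_url, api_key, and model from the environment. If you run the agent outside the cluster against an OpenAI-compatible endpoint, export those variables first; the values below are purely illustrative:

# Illustrative values only; match them to your actual vLLM endpoint
export VLLM_API_URL=http://localhost:8001/v1
export VLLM_API_KEY=unused-locally
export OLLAMA_MODEL=Qwen/Qwen2.5-0.5B-Instruct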

⚡️ Super Quick Start

To set up, deploy, and expose the app in one command:

make minimal

Then open http://localhost:8000.

To deploy the GenAI stack (vLLM, Langfuse, MCP) and the Robots Agent:

make robots

Then open http://localhost:7860.
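
To sanity-check either endpoint from a second terminal, a plain curl against the default ports above is enough:

curl -sI http://localhost:8000   # minimal app
curl -sI http://localhost:7860   # Robots Agent UI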

Manual Steps

1. Initialize Infrastructure

To provision the local Kubernetes cluster (Kind) and install the platform stack (vLLM, MCP):

make up
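
Under the hood this roughly amounts to creating a Kind cluster and applying the local Terraform environment. The sketch below is illustrative (the cluster name is an assumption; the real steps live in the Makefile):

kind create cluster --name helm4genai
cd terraform/environments/local
terraform init
terraform apply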

2. Deploy Example Application

To deploy the minimal example application:

make deploy APP=minimal

To deploy the Robots Agent (GenAI Example):

make deploy APP=robots
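
Both targets wrap standard Kubernetes manifests, so a manual equivalent is a plain kubectl apply. The manifest path below is an assumption; check the Makefile for the real one:

kubectl apply -f examples/robots/k8s/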

3. Verification

You can verify the components using the following commands:

make verify-cluster       # Check Kind cluster status
make verify-app APP=minimal # Check minimal app status
make verify-app APP=robots  # Check robots app status
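
If you prefer raw kubectl, the same checks can be approximated directly (the label selector is an assumption; adjust it to the actual manifests):

kubectl get nodes                # cluster is up
kubectl get pods -A              # everything scheduled
kubectl get pods -l app=robots   # a specific app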

4. Access Application

To access the minimal application locally:

make serve APP=minimal

Then open http://localhost:8000.

To access the Robots Agent:

make serve APP=robots

Then open http://localhost:7860.
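
The serve targets are thin wrappers around port-forwarding. A manual equivalent, assuming the service is named robots and listens on 7860, would be:

kubectl port-forward svc/robots 7860:7860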

Tip

The serve-* commands (app, robots, langfuse) block your terminal. Open a new terminal tab or window to run make langfuse while your application is being served in the first.

5. Cleanup

To destroy the cluster and resources:

make down

6. Monitoring & Debugging

The Makefile provides several helpers to check the status of your cluster and applications:

make status                   # Overview of Nodes, Pods, Services, and Helm releases
make watch                    # Watch pods in real-time
make events                   # Show recent cluster events
make logs APP=robots          # Follow logs for a specific app
make describe APP=robots      # Describe pods for troubleshooting
make debug-pod                # Launch an ephemeral pod with network tools (curl, dig, etc.)
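
The debug-pod target amounts to launching a throwaway pod by hand. One common pattern (the image choice here is an assumption) is:

kubectl run debug --rm -it --image=nicolaka/netshoot -- bash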

Architecture

The project follows a flow where the Developer uses the Makefile to orchestrate Terraform, which in turn provisions the K8s cluster and installs the platform stack (vLLM, MCP) via Helm.

graph LR
    Developer -->|Runs| Makefile[Makefile]
    
    Makefile -->|Invokes| Terraform
    Terraform -->|Provisions| K8s[K8s Cluster]
    Terraform -->|Installs| Platform["Platform Stack (vLLM, MCP)"]
    
    K8s -.->|Hosts| Platform
    Developer -->|Deploys| App[Application]
    K8s -.->|Hosts| App

Project Structure

This project uses a modular Terraform architecture to separate local development from production configurations:

  • terraform/modules/platform: Contains the core logic (Platform stack) shared across environments.
  • terraform/environments/local: Configuration for running locally with Kind.
  • terraform/environments/prod: (Skeleton) Configuration for a production cloud environment.

GCP Deployment Guide

This guide details the steps to deploy the solution to Google Cloud Platform (GKE).

1. Prerequisites

Ensure you have the Google Cloud SDK installed and authenticated:

# Verify installation
gcloud --version

# Login to Google Cloud
gcloud auth login
gcloud auth application-default login

# Install auth plugin for kubectl
gcloud components install gke-gcloud-auth-plugin

2. Project Setup

  1. Create or select a Google Cloud Project.
  2. Enable the following APIs:
    • Compute Engine API
    • Kubernetes Engine API
    • Artifact Registry API

gcloud services enable \
  compute.googleapis.com \
  container.googleapis.com \
  artifactregistry.googleapis.com \
  --project your-gcp-project-id

To validate the production configuration:

cd terraform/environments/prod
terraform init
terraform validate

3. Infrastructure Provisioning (Terraform)

Navigate to the production environment directory and initialize Terraform (if you just ran the validation step above, you are already there and initialized):

cd terraform/environments/prod
terraform init

Create a terraform.tfvars file with your specific configuration (Project ID, Region, etc.):

project_id = "your-gcp-project-id"
region     = "us-central1"
zone       = "us-central1-a"
# Add other required variables

Apply the configuration to create the GKE cluster:

terraform apply
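
If you want to review the changes before applying, the standard Terraform workflow applies; terraform.tfvars is picked up automatically:

terraform plan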

4. Deploy Application

After Terraform completes, configure kubectl to connect to your new GKE cluster:

gcloud container clusters get-credentials helm4genai-prod --region us-central1 --project your-gcp-project-id

Deploy the Robots application:

# Ensure you are in the root directory
cd ../../..
make deploy APP=robots

Note: For production, you will likely need to build and push container images to Google Artifact Registry (GAR) instead of loading them into Kind. The current make build-images target is optimized for local Kind development.
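
A typical Artifact Registry push flow with Podman looks like the sketch below; the repository and image names are assumptions, so substitute your own:

# Hypothetical image path: REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG
export IMAGE=us-central1-docker.pkg.dev/your-gcp-project-id/genai/robots:latest
gcloud auth print-access-token | podman login -u oauth2accesstoken --password-stdin us-central1-docker.pkg.dev
podman build -t "$IMAGE" examples/robots
podman push "$IMAGE"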

5. Verification

Check the status of your GKE nodes and pods:

kubectl get nodes
kubectl get pods -n genai

6. Cleanup

To destroy the GCP resources (and avoid costs):

make down-prod

Troubleshooting

Podman

If you are using Podman, the Makefile automatically sets KIND_EXPERIMENTAL_PROVIDER=podman for Terraform commands. Ensure you have initialized and started your podman machine (podman machine init, podman machine start).
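
For example:

podman machine init
podman machine start
podman machine list   # confirm the machine is running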
