Helm4GenAI is a boilerplate project designed to simplify the deployment of Generative AI applications on Kubernetes. It provides a structured foundation for spinning up local clusters (Kind) or deploying to production (GKE/Terraform), managing the GenAI stack (vLLM, MCP), and deploying applications using standard Kubernetes manifests.
- uv
- Terraform
- Helm
- Kind
- Podman
- Google Cloud SDK (`gcloud`)
```shell
brew tap hashicorp/tap
brew install hashicorp/tap/terraform helm kind podman
```

The project includes a Makefile to automate the infrastructure setup and application deployment.
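To confirm the prerequisites are available before running any Makefile targets, a quick check like the following can help (a sketch; adjust the tool list to your setup):

```shell
# Report which required tools are on PATH; prints one line per tool.
for tool in uv terraform helm kind podman; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: missing"
  fi
done
```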
Before running or deploying the Robots agent, you can customize the LLM settings in `examples/robots/baml_src/robots.baml`. For example, you can define a client like this:
```baml
client LocalLLMClient {
  provider openai
  options {
    base_url env.VLLM_API_URL
    api_key env.VLLM_API_KEY
    model env.OLLAMA_MODEL
    temperature 0.0
    max_tokens 10000
  }
}

function AnalyzeRobotsTxt(content: string) -> RobotsSummary {
  client LocalLLMClient
  prompt #"
    ...
  "#
}
```
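The client above reads its settings from environment variables. When running the agent outside the cluster, you might export them first, along these lines (the values below are hypothetical; point them at your own vLLM endpoint and model):

```shell
# Hypothetical local values for the BAML client's environment variables.
export VLLM_API_URL="http://localhost:8000/v1"   # OpenAI-compatible vLLM endpoint
export VLLM_API_KEY="not-needed-locally"         # vLLM often ignores the key locally
export OLLAMA_MODEL="qwen2.5:7b"                 # whichever model your server loads
echo "Using $OLLAMA_MODEL at $VLLM_API_URL"
```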
After making any changes to `.baml` files, you must regenerate the client code:

```shell
make generate-baml
```

This requires `uv` to be installed.
To set up, deploy, and expose the app in one command:

```shell
make minimal
```

Then open http://localhost:8000.
To deploy the GenAI stack (vLLM, Langfuse, MCP) and the Robots Agent:

```shell
make robots
```

Then open http://localhost:7860.
To provision the local Kubernetes cluster (Kind) and install the platform stack (vLLM, MCP):

```shell
make up
```

To deploy the minimal example application:

```shell
make deploy APP=minimal
```

To deploy the Robots Agent (GenAI example):

```shell
make deploy APP=robots
```

You can verify the components using the following commands:
```shell
make verify-cluster           # Check Kind cluster status
make verify-app APP=minimal   # Check minimal app status
make verify-app APP=robots    # Check robots app status
```

To access the minimal application locally:

```shell
make serve APP=minimal
```

Then open http://localhost:8000.
To access the Robots Agent:

```shell
make serve APP=robots
```

Then open http://localhost:7860.
> **Tip:** The serve commands (for the app, robots, and Langfuse) block your terminal. Open a new terminal tab or window to run `make langfuse` while your application is running in another.
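If a second terminal is inconvenient, a blocking command can also be backgrounded with `&`. The sketch below illustrates the shell mechanics, with `sleep` standing in for a blocking target such as `make serve APP=robots`:

```shell
# `&` returns the prompt immediately; `wait` blocks until the job finishes.
sleep 1 &   # stand-in for a blocking command like `make serve APP=robots`
job=$!
echo "serving in background as PID $job"
wait "$job"
echo "done"
```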
To destroy the cluster and resources:

```shell
make down
```

The Makefile provides several helpers to check the status of your cluster and applications:
```shell
make status                # Overview of Nodes, Pods, Services, and Helm releases
make watch                 # Watch pods in real-time
make events                # Show recent cluster events
make logs APP=robots       # Follow logs for a specific app
make describe APP=robots   # Describe pods for troubleshooting
make debug-pod             # Launch an ephemeral pod with network tools (curl, dig, etc.)
```

The project follows a flow where the developer uses the Makefile to orchestrate Terraform, which in turn provisions the K8s cluster and installs the platform stack via Helm.
```mermaid
graph LR
    Developer -->|Runs| Makefile[Makefile]
    Makefile -->|Invokes| Terraform
    Terraform -->|Provisions| K8s[K8s Cluster]
    Terraform -->|Installs| Platform["Platform Stack (vLLM, MCP)"]
    K8s -.->|Hosts| Platform
    Developer -->|Deploys| App[Application]
    K8s -.->|Hosts| App
```
This project uses a modular Terraform architecture to separate local development from production configurations:
- `terraform/modules/platform`: Contains the core logic (Platform stack) shared across environments.
- `terraform/environments/local`: Configuration for running locally with Kind.
- `terraform/environments/prod`: (Skeleton) Configuration for a production cloud environment.
This guide details the steps to deploy the solution to Google Cloud Platform (GKE).
Ensure you have the Google Cloud SDK installed and authenticated:
```shell
# Verify installation
gcloud --version

# Login to Google Cloud
gcloud auth login
gcloud auth application-default login

# Install auth plugin for kubectl
gcloud components install gke-gcloud-auth-plugin
```

- Create or select a Google Cloud Project.
- Enable the following APIs:
  - Compute Engine API
  - Kubernetes Engine API
  - Artifact Registry API

```shell
gcloud services enable \
  compute.googleapis.com \
  container.googleapis.com \
  artifactregistry.googleapis.com \
  --project your-gcp-project-id
```

To validate the production configuration:
```shell
cd terraform/environments/prod
terraform init
terraform validate
```

Navigate to the production environment directory and initialize Terraform:
```shell
cd terraform/environments/prod
terraform init
```

Create a `terraform.tfvars` file with your specific configuration (Project ID, Region, etc.):
```hcl
project_id = "your-gcp-project-id"
region     = "us-central1"
zone       = "us-central1-a"
# Add other required variables
```

Apply the configuration to create the GKE cluster:

```shell
terraform apply
```

After Terraform completes, configure kubectl to connect to your new GKE cluster:
```shell
gcloud container clusters get-credentials helm4genai-prod --region us-central1 --project your-gcp-project-id
```

Deploy the Robots application:
```shell
# Ensure you are in the root directory
cd ../../..
make deploy APP=robots
```

Note: For production, you will likely need to build and push container images to Google Artifact Registry (GAR) instead of loading them into Kind. The current `make build-images` target is optimized for local Kind development.
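One way to name and push an image to Artifact Registry is sketched below. The repository name `helm4genai` and image name `robots` are assumptions for illustration; GAR Docker repositories follow the `REGION-docker.pkg.dev/PROJECT/REPO/IMAGE` path format:

```shell
# Compose a GAR image reference (hypothetical repo and app names).
PROJECT_ID="your-gcp-project-id"
REGION="us-central1"
REPO="helm4genai"     # assumed Artifact Registry repository name
APP="robots"
IMAGE="${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO}/${APP}:latest"
echo "$IMAGE"
# Then, roughly:
#   podman build -t "$IMAGE" examples/robots
#   podman push "$IMAGE"
```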
Check the status of your GKE nodes and pods:
```shell
kubectl get nodes
kubectl get pods -n genai
```

To destroy the GCP resources (and avoid costs):
```shell
make down-prod
```

If you are using Podman, the Makefile automatically sets `KIND_EXPERIMENTAL_PROVIDER=podman` for Terraform commands. Ensure you have initialized and started your Podman machine (`podman machine init`, `podman machine start`).