
feat: Kubernetes deployment support (Dockerfile, manifests, docs)#13

Open
chaosreload wants to merge 7 commits into zerobootdev:main from chaosreload:feat/kubernetes-deployment

Conversation

chaosreload commented Mar 24, 2026

Summary

Add first-class Kubernetes deployment support for Zeroboot, addressing all items in #9.

EKS validation: All manifests and scripts validated on EKS 1.31 / ap-southeast-1 / c8i.xlarge with nested virtualization.

Note: The serve bind address fix (127.0.0.1 → 0.0.0.0) required for K8s health probes is tracked separately in #14 and should be merged before or alongside this PR.

Changes

Dockerfile

Multi-stage build: Rust 1.86 compiler stage + Ubuntu 22.04 runtime. Firecracker binary bundled at build time. vmlinux and rootfs images are not baked into the image — they are mounted via PersistentVolume, keeping the image lean and allowing runtime upgrades without rebuilding.
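A minimal sketch of the multi-stage layout described above; the binary name `zeroboot` and all paths are illustrative assumptions, not the PR's actual Dockerfile:

```dockerfile
# Stage 1: compile the server with the Rust 1.86 toolchain
FROM rust:1.86 AS builder
WORKDIR /src
COPY . .
RUN cargo build --release

# Stage 2: slim Ubuntu 22.04 runtime. vmlinux and rootfs are NOT copied in;
# they are mounted from a PersistentVolume at runtime, keeping the image lean.
FROM ubuntu:22.04
COPY --from=builder /src/target/release/zeroboot /usr/local/bin/zeroboot
# Firecracker binary bundled at build time (source path is illustrative)
COPY docker/firecracker /usr/local/bin/firecracker
COPY docker/entrypoint.sh /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
```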

docker/entrypoint.sh

  • Validates /dev/kvm access on startup (fast-fail with clear error message)
  • Creates the snapshot template on first boot (~15s), skips if already present on PVC
  • Supports Python and optional Node.js runtimes via env vars
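The first two behaviors above can be sketched as shell functions. This is a hypothetical reconstruction, not the PR's actual entrypoint.sh; the variable name `SNAPSHOT_DIR` and the file names are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

SNAPSHOT_DIR="${SNAPSHOT_DIR:-/data/snapshots}"   # illustrative PVC mount path

# Fast-fail with a clear error if the KVM device is missing or inaccessible.
check_kvm() {
  if [ ! -c /dev/kvm ] || [ ! -r /dev/kvm ] || [ ! -w /dev/kvm ]; then
    echo "ERROR: /dev/kvm is missing or not accessible" >&2
    return 1
  fi
}

# Create the snapshot template only on first boot (~15s); skip when the
# snapshot already exists on the mounted PVC.
ensure_template() {
  if [ -f "$SNAPSHOT_DIR/template.snap" ]; then
    echo "snapshot template found, skipping creation"
  else
    echo "creating snapshot template"
    # the real entrypoint would invoke the template-creation command here
    mkdir -p "$SNAPSHOT_DIR"
    touch "$SNAPSHOT_DIR/template.snap"
  fi
}

# The real script would then run check_kvm, ensure_template, and exec the server.
```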

deploy/k8s/

Reference manifests ready to apply:

  • namespace.yaml — dedicated zeroboot namespace
  • pvc.yaml — 20 Gi gp3 PVC for snapshot persistence (eliminates 15s re-snapshot on restart)
  • deployment.yaml — privileged + hostPath /dev/kvm (works on plain EKS without KubeVirt); podAntiAffinity to spread Pods across Nodes; tuned readiness/liveness probes (initialDelaySeconds: 120)
  • service.yaml — ClusterIP service
  • hpa.yaml — custom-metric HPA on zeroboot_concurrent_forks
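An illustrative fragment of the Deployment's KVM and probe wiring described above; the image reference and port are assumptions, not the PR's actual manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeroboot
  namespace: zeroboot
spec:
  template:
    spec:
      containers:
        - name: zeroboot
          image: zeroboot:latest        # hypothetical image reference
          securityContext:
            privileged: true            # required for /dev/kvm on plain EKS
          volumeMounts:
            - name: kvm
              mountPath: /dev/kvm
          readinessProbe:
            httpGet:
              path: /v1/health
              port: 8080                # port is an assumption
            initialDelaySeconds: 120    # covers ~19s template creation + EBS attach
      volumes:
        - name: kvm
          hostPath:
            path: /dev/kvm
            type: CharDevice
```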

deploy/eks/

EKS-specific deployment configs:

  • eks-cluster-only.yaml — create cluster without node groups (Step 1)
  • eks-self-managed-kvm.sh — end-to-end script for self-managed ASG with CpuOptions.NestedVirtualization=enabled; see EKS note below
  • eks-with-kvm-nodegroup.yaml — eksctl managed node group config (⚠️ see caveat)
  • eks-add-kvm-nodegroup.yaml — add KVM node group to existing cluster

docs/KUBERNETES.md

Comprehensive deployment guide covering:

  • EC2 instance selection: c8i/m8i/r8i recommended
  • EKS managed vs self-managed node groups — root cause and solution for CpuOptions being silently dropped
  • KVM device access (privileged + hostPath, with note on kubevirt alternative)
  • PVC storage guidance
  • Autoscaling, Prometheus, configuration reference, known limitations

⚠️ EKS Node Group: Use Self-Managed

TL;DR: EKS managed node groups silently drop CpuOptions.NestedVirtualization — your nodes start without /dev/kvm even if the eksctl YAML looks correct.

Root cause: When you provide a Launch Template to a managed node group, EKS generates a new internal LT and merges only a subset of fields. CpuOptions is not in that subset — despite not being listed in the official blocked-fields list. This is a documentation gap.

Solution: Use deploy/eks/eks-self-managed-kvm.sh, which creates an ASG + Launch Template directly via AWS CLI, bypassing EKS's internal LT generation entirely.
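Because the self-managed path owns its Launch Template end to end, CpuOptions survives as written. An illustrative fragment of the launch-template data the script would pass (field names follow this PR's description; not verified against current AWS API docs):

```json
{
  "LaunchTemplateData": {
    "InstanceType": "c8i.xlarge",
    "CpuOptions": {
      "NestedVirtualization": "enabled"
    }
  }
}
```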

Architecture note

Kubernetes manages the lifecycle of the zeroboot server process — it does not schedule individual sandboxes. Each /v1/exec request is handled entirely within the Pod via a KVM fork (~0.8 ms). K8s's role is capacity management: health checks, rolling updates, and horizontal scaling.

Testing

Validated on EKS 1.31 / ap-southeast-1 / c8i.xlarge (self-managed node group with nested virt):

  • ✅ /dev/kvm present on nodes
  • ✅ Docker image builds successfully (rust:1.86)
  • ✅ Template creation completes (~19s)
  • ✅ Readiness probe passes after template creation
  • ✅ /v1/health reachable from Service ClusterIP
  • ✅ print(1+1) → 2 (~100ms)
  • ✅ import numpy... → result (~265ms)
  • ✅ cat /etc/os-release → content (~28ms)

Depends on: #14

Closes #9

chaosreload mentioned this pull request Mar 24, 2026
chaosreload pushed a commit to chaosreload/zeroboot that referenced this pull request Mar 25, 2026
P0 fixes:
- serve: change default bind from 127.0.0.1 to 0.0.0.0 to fix K8s
  health probes and Service routing; add --bind flag for explicit control
- entrypoint.sh: pass $ZEROBOOT_BIND (default 0.0.0.0) to serve command

P1 fixes:
- deployment.yaml: replace devices.kubevirt.io/kvm (requires kubevirt)
  with privileged: true + hostPath /dev/kvm (works on plain EKS)
- deployment.yaml: increase livenessProbe initialDelaySeconds from 60 to 120;
  template creation takes ~19s, 60s was too tight on slow EBS attach
- deployment.yaml: add /dev/kvm hostPath volume and mount

EKS self-managed node group (new file):
- deploy/eks/eks-self-managed-kvm.sh: end-to-end script to create a
  self-managed ASG + Launch Template with CpuOptions.NestedVirtualization=enabled
  EKS managed node groups silently drop CpuOptions — self-managed bypasses this
- deploy/eks/eks-with-kvm-nodegroup.yaml: add warning about CpuOptions being
  dropped by managed node groups (documented as a gap vs AWS official docs)

Docs:
- docs/KUBERNETES.md: add EKS managed vs self-managed section with root cause
  analysis and the recommended self-managed approach
- docs/KUBERNETES.md: add server bind address configuration note
- docs/KUBERNETES.md: add ZEROBOOT_BIND env var reference

Validated on: EKS 1.31 / ap-southeast-1 / c8i.xlarge (nested virt)
Ref: chaosreload/zeroboot PR zerobootdev#13
chaosreload and others added 7 commits March 25, 2026 21:29
- Dockerfile: multi-stage build (Rust compiler + Ubuntu runtime)
  Firecracker bundled; vmlinux/rootfs mounted via PVC
- docker/entrypoint.sh: handles template creation on first boot,
  skips if snapshot already exists on PVC
- deploy/k8s/: namespace, PVC (gp3 20Gi), Deployment with KVM device
  plugin resource, podAntiAffinity, health probes, HPA, Service
- docs/KUBERNETES.md: EC2 instance family requirements, KVM device
  plugin setup, PVC storage guidance, autoscaling with custom metric
  (zeroboot_concurrent_forks), Karpenter NodePool example,
  ServiceMonitor config, configuration reference

Closes zerobootdev#9
c8i/m8i/r8i support nested virtualization on regular (non-metal) sizes
via --cpu-options NestedVirtualization=enabled. Other families (c6i, m6i
etc.) require .metal sizes for KVM access. Update instance table to make
this distinction explicit.
Three files covering two scenarios:
- eks-with-kvm-nodegroup.yaml: cluster + KVM node group in one shot
- eks-cluster-only.yaml: cluster only (no node groups)
- eks-add-kvm-nodegroup.yaml: add KVM node group to existing cluster

All configs use c8i.xlarge with cpuOptions.nestedVirtualization=enabled,
AmazonLinux2023 AMI, and aws-ebs-csi-driver addon for PVC support.
src/main.rs and entrypoint.sh bind address changes belong in a dedicated
fix PR. This PR should only contain K8s deployment configs and docs.

The deployment.yaml already handles the 127.0.0.1 limitation via the
hostPath /dev/kvm approach; users can add a socat sidecar if needed
until the fix PR is merged.
chaosreload force-pushed the feat/kubernetes-deployment branch from a2898a1 to c41f21c on March 25, 2026 21:29