feat(storagebox): add Gateway API routing and replace MinIO with Garage #111

Open
adamancini wants to merge 4 commits into main from adamancini/gateway-api
adamancini (Member) commented Feb 26, 2026

Summary

Replace ingress-nginx with Envoy Gateway as the Gateway API controller and replace MinIO with Garage for S3-compatible object storage.

Gateway API (Envoy Gateway)

Each application gets its own Gateway resource. Envoy Gateway provisions an independent Envoy proxy Deployment + NodePort Service per Gateway, providing full isolation.

| Application | Protocol | Route Type | Port |
| --- | --- | --- | --- |
| Garage S3 | HTTP | HTTPRoute | 3900 |
| PostgreSQL | TCP | TCPRoute | 5432 |
| Cassandra | TCP | TCPRoute | 9042 |
| rqlite | HTTP | HTTPRoute | 4001 |
| NFS | | NodePort (no Gateway API UDP support) | multiple |

  • Shared GatewayClass + EnvoyProxy resource configures NodePort for EC environments
  • Per-service gateway enable/disable toggles in KOTS admin console, nested under each service's settings group
  • Per-service TLS termination config for HTTP gateways (Garage, rqlite)
  • Envoy Gateway installed as EC extension via OCI chart (oci://docker.io/envoyproxy/gateway-helm v1.7.0), bundles all Gateway API CRDs including experimental TCPRoute
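
As a rough illustration of the per-application pattern in the table above, the following shell sketch prints a Gateway and its matching route for one application. The resource names, the `storagebox` GatewayClass name, and the choice to use the same port for the listener and the backend are assumptions made for this example, not values taken from the chart.

```shell
#!/bin/sh
# Sketch only: print a per-application Gateway plus its route as YAML.
# Resource names and the "storagebox" GatewayClass are hypothetical.
render_gateway() {
  app="$1"; protocol="$2"; port="$3"
  case "$protocol" in
    HTTP) route_kind="HTTPRoute"; route_api="gateway.networking.k8s.io/v1" ;;
    TCP)  route_kind="TCPRoute";  route_api="gateway.networking.k8s.io/v1alpha2" ;;
    *) echo "unsupported protocol: $protocol" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ${app}
spec:
  gatewayClassName: storagebox
  listeners:
    - name: ${app}
      protocol: ${protocol}
      port: ${port}
---
apiVersion: ${route_api}
kind: ${route_kind}
metadata:
  name: ${app}
spec:
  parentRefs:
    - name: ${app}
  rules:
    - backendRefs:
        - name: ${app}
          port: ${port}
EOF
}

# Example: the Garage S3 row of the table.
render_gateway garage HTTP 3900
```

Because each application gets its own Gateway object, Envoy Gateway can provision a separate proxy for each one, which is where the isolation described above comes from.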

Garage S3 Storage (replaces MinIO)

  • Vendored subchart based on akkoma-helm's Garage implementation, upgraded to v1.3.1
  • Single StatefulSet, no operator dependency (removes MinIO operator from EC extensions)
  • Init container copies secrets to emptyDir with chmod 0600 (Kubernetes fsGroup adds group-read bits to secret volume mounts, but Garage requires exactly mode 0600)
  • Post-install/post-upgrade Helm hook Job using alpine:3.21 + curl + jq for Garage admin API calls:
    • Assigns cluster layout (1 GiB capacity)
    • Creates S3 access key and bucket
    • Stores credentials in a Kubernetes Secret
  • Helm test validates admin API health, S3 connectivity, bucket existence, and credentials Secret
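
The init-container step described above (copy secrets to an emptyDir, then force mode 0600) might look roughly like the sketch below. The directory paths and the `rpc-secret` file name are hypothetical stand-ins; the actual chart may lay these out differently.

```shell
#!/bin/sh
# Sketch of the init-container step: copy secret files into a writable
# directory and force mode 0600. Paths and file names are hypothetical.
copy_secrets() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  for f in "$src"/*; do
    # Secret volume entries are symlinks to regular files; -f follows them.
    [ -f "$f" ] || continue
    cp "$f" "$dst/"
    chmod 0600 "$dst/$(basename "$f")"
  done
}

# Local demonstration with temp directories standing in for the volumes.
demo_src=$(mktemp -d)
demo_dst=$(mktemp -d)
printf 'example-secret' > "$demo_src/rpc-secret"
chmod 0640 "$demo_src/rpc-secret"   # mimic the group-read bit fsGroup adds
copy_secrets "$demo_src" "$demo_dst"
ls -l "$demo_dst"
```

Copying is needed because the mode of a mounted secret file cannot be changed in place; the emptyDir copy is writable, so `chmod` succeeds there.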

Operational improvements

  • Support bundle: collectors and deploymentStatus/statefulsetStatus analyzers for all infrastructure (cert-manager, CNPG, Envoy Gateway, K8ssandra, cass-operator) and application components
  • Status informers: infrastructure deployments (cert-manager, cloudnative-pg, envoy-gateway, k8ssandra-operator, cass-operator) alongside conditional app informers
  • Builder key: static values enabling all components for air-gap image discovery via helm template
  • Preflights: NFS kernel module check upgraded from warn to fail
  • Images: consolidated all utility images to alpine:3.21 (removed busybox)
  • Helm timeout: 10m via helmUpgradeFlags
  • Makefile: vm-kubectl target for remote kubectl on EC VMs; removed minio-operator from test-install-operators
  • CI: updated workflow and smoke tests for Garage; gateway disabled in CI test values (no Envoy Gateway on CI clusters)

Patterns doc

New patterns/gateway-api/README.md covering per-application Gateway pattern, HTTPRoute/TCPRoute examples, EnvoyProxy/GatewayClass infrastructure, TLS termination, and KOTS integration. Notes TCPRoute experimental status is point-in-time.

Test plan

  • helm lint passes
  • helm template renders all resources with all components enabled
  • make validate-config four-way contract passes
  • EC headless install on CMX VM (v0.26.8)
  • Garage StatefulSet starts with correct secret permissions (init container)
  • Garage setup Job completes: layout assigned, bucket created, credentials stored
  • All infrastructure pods healthy (cert-manager, CNPG, Envoy Gateway, K8ssandra)
  • All application pods healthy (Garage, PostgreSQL, Cassandra, rqlite)
  • Envoy Gateway provisions per-application proxy pods
  • CI helm-install-test (pending with Garage v1.3.1 fixes)

Replace ingress-nginx with Envoy Gateway as the Gateway API controller,
installed as an EC extension via OCI chart. Each application gets its own
Gateway resource with an independent Envoy proxy instance:

- Garage S3: HTTP Gateway + HTTPRoute (port 3900)
- PostgreSQL: TCP Gateway + TCPRoute (port 5432)
- Cassandra: TCP Gateway + TCPRoute (port 9042)
- rqlite: HTTP Gateway + HTTPRoute (port 4001)
- NFS: stays on NodePort (Gateway API does not support UDP)

Replace MinIO operator + Tenant subchart with Garage v1.3.1, a
lightweight S3-compatible object storage that runs as a single
StatefulSet with no operator dependency. A post-install/post-upgrade
Helm hook Job handles cluster layout assignment, bucket creation, and
S3 credential provisioning via the Garage admin API. An init container
copies secrets to an emptyDir with mode 0600 to satisfy Garage's strict
file permission requirements.

Also includes:
- Per-service gateway and TLS settings in KOTS admin console config
- Helm test for Garage connectivity and S3 round-trip verification
- Support bundle collectors and deployment health analyzers for all
  infrastructure (cert-manager, CNPG, Envoy Gateway, K8ssandra)
- Status informers for infrastructure deployments
- Builder key for air-gap image discovery
- NFS kernel module preflight upgraded to hard fail
- Consolidated all utility images to alpine:3.21 (removed busybox)
- vm-kubectl Makefile target for remote kubectl on EC VMs
- Updated CI workflow and smoke tests for Garage

Covers per-application Gateway pattern with Envoy Gateway, HTTPRoute
for S3/HTTP services, TCPRoute for databases, GatewayClass/EnvoyProxy
infrastructure, TLS termination, and KOTS config integration. All
examples drawn from the storagebox application. Notes that TCPRoute's
experimental status is point-in-time (February 2026) and that Traefik
supports TCPRoute when experimental CRDs are installed separately.
@adamancini adamancini force-pushed the adamancini/gateway-api branch from 7ffd5b6 to afc198b Compare February 27, 2026 18:51
scottrigby (Contributor) left a comment

Glad to see a Gateway API pattern!

This PR looks great, except for one question (below).

CA_CERT="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
SECRET_NAME="{{ include "storagebox.fullname" . }}-garage-s3"

# ---- Test 1: Admin API health check ----

I like that you added helm tests 💯

My only question: $GARAGE_ADMIN is a local k8s service (I'm assuming that from the following):

    env:
    - name: GARAGE_ADMIN
      value: "http://{{ .Release.Name }}-garage:3903"

why do we need to call it with a cert / bearer token? This isn't a pattern I see normally for service communication within the same chart.

adamancini (Member, Author)

Good catch — the cert/bearer token usage was confusing because two separate auth contexts were mixed together:

  1. Garage admin token (Authorization: Bearer ${ADMIN_TOKEN}) — this is Garage's own application-level auth for its admin API on port 3903. Required for management endpoints like listing buckets, not related to K8s auth.

  2. K8s SA token + CA cert (SA_TOKEN, CA_CERT) — these were used to call the Kubernetes API server (not Garage) to verify the S3 credentials Secret exists and to read its contents for the round-trip test. That's where the cert/bearer pattern came from.

Simplified in the latest push: the test now mounts the S3 credentials Secret directly as a volume instead of fetching it via the K8s API at runtime. This removes the serviceAccountName, K8s API calls, SA token, and CA cert entirely. The only Bearer token left is Garage's admin token, which is clearly commented as application-level auth.
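
A minimal sketch of what reading the mounted credentials could look like inside the test script, in place of a Kubernetes API call. The mount path `/etc/garage-s3` and the key file names are assumptions for illustration; the real Secret keys may be named differently.

```shell
#!/bin/sh
# Sketch: read S3 credentials from a Secret mounted as a volume.
# The default mount path and the key file names are assumptions.
CREDS_DIR="${CREDS_DIR:-/etc/garage-s3}"

read_cred() {
  key="$1"
  if [ ! -r "$CREDS_DIR/$key" ]; then
    echo "missing credential file: $CREDS_DIR/$key" >&2
    return 1
  fi
  cat "$CREDS_DIR/$key"
}

# Intended use inside the test pod (hypothetical key names):
#   ACCESS_KEY=$(read_cred access-key-id) || exit 1
#   SECRET_KEY=$(read_cred secret-access-key) || exit 1
```

With the Secret mounted this way, the pod needs no service account permissions on Secrets, and a missing key surfaces as a plain file-not-readable error.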

adamancini and others added 2 commits March 2, 2026 12:35
…ctly

Remove Kubernetes API calls from the helm test pod. Instead of fetching
the S3 credentials Secret via the K8s API with SA token + CA cert, mount
it directly as a volume. This eliminates the serviceAccountName, KUBE_API,
SA_TOKEN, and CA_CERT plumbing that was confusing two auth contexts
(Garage app-level auth vs K8s API auth).