Gatewayapi Namespaced Mode #4690
Conversation
electricjesus
left a comment
Good work on the runtime Helm rendering migration — dropping 53K lines of pre-rendered YAML is a huge win. The GatewayNamespace mode looks solid with good test coverage. A few observations below.
CurrentGatewayClasses: set.New[string](),
}

if gatewayAPI.Spec.GatewayDeploymentMode == nil {
Nit: The CRD already has +kubebuilder:default=ControllerNamespace, so any persisted GatewayAPI resource will have this field populated by the API server. This runtime defaulting only matters for in-memory objects that were never persisted (tests?). Not a problem, just noting the redundancy — if the CRD default is the source of truth, a comment here explaining why you also default in code would help future readers.
It is used by the tests
All good catches man, all sorted
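For future readers, the in-code default discussed in this thread could be documented roughly as follows. This is a minimal sketch, not the operator's code: `GatewayAPISpec`, the constant names, and `defaultDeploymentMode` are simplified stand-ins; only the `ControllerNamespace` default and the nil check come from the thread.

```go
package main

// GatewayDeploymentMode is a simplified stand-in for the enum discussed in the
// review; it is not the operator's real API type.
type GatewayDeploymentMode string

const (
	ControllerNamespace GatewayDeploymentMode = "ControllerNamespace"
	GatewayNamespace    GatewayDeploymentMode = "GatewayNamespace"
)

// GatewayAPISpec is a hypothetical, trimmed-down spec holding only the field
// under discussion.
type GatewayAPISpec struct {
	GatewayDeploymentMode *GatewayDeploymentMode
}

// defaultDeploymentMode fills in the mode when it is unset.
//
// The CRD default (+kubebuilder:default=ControllerNamespace) already populates
// this field for persisted objects, so this branch only matters for in-memory
// objects that never went through the API server, such as ones built in tests.
func defaultDeploymentMode(spec *GatewayAPISpec) {
	if spec.GatewayDeploymentMode == nil {
		mode := ControllerNamespace
		spec.GatewayDeploymentMode = &mode
	}
}
```

An explicitly set mode is left untouched, so the helper is safe to call on objects that already carry the API-server default.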
// Gateway resources using operator-managed GatewayClasses. These namespaces need
// per-namespace Enterprise resources (SA, CRB, pull secrets).
if *gatewayAPI.Spec.GatewayDeploymentMode == operatorv1.GatewayDeploymentModeGatewayNamespace &&
	variant.IsEnterprise() {
Why does the Variant matter here at all?
For resources that are rendered only on EE license, like WAF.
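To make the gating in this thread concrete, here is a rough sketch of how the set of namespaces needing per-namespace Enterprise resources might be computed. The `Gateway` struct, the boolean parameters, and the function name are assumptions for illustration; the real code works on the operator's own types and the `variant.IsEnterprise()` check shown in the diff.

```go
package main

import "sort"

// Gateway is a hypothetical, trimmed-down view of a Gateway resource.
type Gateway struct {
	Namespace string
	Name      string
	ClassName string
}

// namespacesNeedingEnterpriseResources returns the sorted, de-duplicated
// namespaces that host a Gateway using an operator-managed GatewayClass.
// It returns nil unless both conditions from the review thread hold:
// GatewayNamespace deployment mode and an Enterprise variant.
func namespacesNeedingEnterpriseResources(
	namespacedMode, enterprise bool,
	managedClasses map[string]bool,
	gateways []Gateway,
) []string {
	if !namespacedMode || !enterprise {
		return nil
	}
	seen := map[string]bool{}
	var out []string
	for _, gw := range gateways {
		if !managedClasses[gw.ClassName] || seen[gw.Namespace] {
			continue
		}
		seen[gw.Namespace] = true
		out = append(out, gw.Namespace)
	}
	sort.Strings(out)
	return out
}
```

Sorting the result keeps the output stable across reconciles, which avoids spurious diffs when the list feeds a rendered object.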
04d49c6 to 8c6f64c
- Swap the checked-in gateway_api_resources.yaml for the embedded gateway-helm.tgz rendered via the helm SDK at startup; K8SGatewayAPICRDs/GatewayAPICRDs now take a runtime.Scheme and return an error (istio_controller updated for the new signature)
- Deploy two envoy-gateway controllers: legacy in tigera-gateway (user-declared classes via Spec.GatewayClasses) and a new one in calico-system with deploy.type=GatewayNamespace; auto-provision the tigera-gateway-class-ns GatewayClass bound to the new controller
- Group the tigera-gateway install behind legacyObjects/legacyTeardownObjects so the eventual deprecation is a single delete
- HasLegacyGateways classifier in the controller: build a className -> controllerName map seeded from Spec.GatewayClasses + existing GatewayClass resources, classify every live Gateway; when no Gateway targets the tigera-gateway controller, the install is torn down; during the teardown-then-redeploy race the legacy render is deferred to avoid a "Namespace is terminating, skipping creation" log flood
- Legacy teardown queues only the Namespace + cluster-scoped objects + the Deployment (for status.RemoveDeployments); in-namespace RBAC/Secrets ride the cascade to avoid the tigera-operator-secrets RoleBinding race
- Move the shared waf-http-filter ClusterRoles out of the legacy bundle so the calico-system-side proxies keep their cluster-scoped perms after tigera-gateway is retired
- Per-namespace Enterprise resources (SA, RoleBindings, pull secret, shared CRB subject) for namespaces hosting a namespaced-class Gateway; reserved namespaces skip shared resource create/delete; Secret goes before RoleBinding on cleanup to avoid 403
- Gate v3 NetworkPolicies on the calico-system Tier; render calico-system.envoy-gateway allow for the controller and certgen
- Update unit tests and Makefile/docs accordingly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
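The HasLegacyGateways classification described in this commit message can be pictured with a small sketch. It assumes that classes declared in Spec.GatewayClasses are seeded as belonging to the legacy controller and that existing GatewayClass resources override the seed with their actual controllerName; the struct types, the function signature, and any controller name passed in are hypothetical stand-ins rather than the operator's code.

```go
package main

// GatewayClass and Gateway are hypothetical, trimmed-down views of the
// Gateway API resources.
type GatewayClass struct {
	Name           string
	ControllerName string
}

type Gateway struct {
	Namespace string
	Name      string
	ClassName string
}

// hasLegacyGateways reports whether any live Gateway resolves to the legacy
// controller. The className -> controllerName map is seeded from the declared
// class names (assumed here to belong to the legacy controller) and then
// overridden by GatewayClass resources that already exist in the cluster.
func hasLegacyGateways(
	legacyController string,
	declaredClasses []string,
	existing []GatewayClass,
	gateways []Gateway,
) bool {
	classToController := make(map[string]string, len(declaredClasses)+len(existing))
	for _, name := range declaredClasses {
		classToController[name] = legacyController
	}
	for _, gc := range existing {
		classToController[gc.Name] = gc.ControllerName
	}
	for _, gw := range gateways {
		// Gateways whose class is unknown are not counted as legacy.
		if c, ok := classToController[gw.ClassName]; ok && c == legacyController {
			return true
		}
	}
	return false
}
```

When this returns false, no Gateway targets the legacy controller, which is the condition the commit message gives for tearing down the tigera-gateway install.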
- Cover the calico-system envoy-gateway controller lifecycle, per-namespace resource provisioning and cleanup, custom EnvoyProxy and EnvoyGateway ConfigMap watches, owning-gateway env vars in l7-log-collector, and the legacy-class teardown path
- Teardown sequencing for tigera-gateway cascading

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lico-system
- Render one envoy-gateway controller in calico-system with deploy.type=GatewayNamespace
- Auto-provision tigera-gateway-class; honour user overrides if redeclared in Spec.GatewayClasses
- Enumerate every operator-owned object from the legacy tigera-gateway install for cleanup (pull Secrets before tigera-operator-secrets); keep the Namespace itself in case users placed their own resources there
- Point GatewayAPI finalizer at the calico-system envoy-gateway Deployment
- Drop dual-controller fixtures and the legacy-undeploy test; consolidate FV tests to the calico-system layout
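The ordering constraint in the cleanup bullet (pull Secrets before tigera-operator-secrets) could be enforced along these lines. This is only a sketch of one possible approach: the `Object` type, `rank`, and `orderForCleanup` are hypothetical, and the real operator may simply build its delete list in the right order by hand.

```go
package main

import "sort"

// Object is a hypothetical, minimal description of something queued for deletion.
type Object struct {
	Kind      string
	Namespace string
	Name      string
}

// rank assigns Secrets the lowest rank and the tigera-operator-secrets
// RoleBinding the highest, so Secrets are deleted while the RoleBinding that
// grants Secret-delete permission still exists.
func rank(o Object) int {
	switch {
	case o.Kind == "Secret":
		return 0
	case o.Kind == "RoleBinding" && o.Name == "tigera-operator-secrets":
		return 2
	default:
		return 1
	}
}

// orderForCleanup returns a copy of objs sorted by rank. The sort is stable,
// so objects with the same rank keep their original relative order.
func orderForCleanup(objs []Object) []Object {
	ordered := append([]Object(nil), objs...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return rank(ordered[i]) < rank(ordered[j])
	})
	return ordered
}
```

Copying the slice before sorting leaves the caller's list untouched, which keeps the helper easy to use in tests that compare input and output.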
0d63b8f to d3ef961
Description
Replace the previous Gateway-API install, which ran an Envoy Gateway controller in tigera-gateway and deployed all proxy workloads in that same namespace, with a single envoy-gateway controller in calico-system running with deploy.type=GatewayNamespace, so proxy workloads land in each Gateway's own namespace.

This is a breaking change for clusters running the legacy install. Existing Gateway CRs do not need edits (tigera-gateway-class and its controllerName are preserved), but proxy Pods, their Services and LoadBalancer addresses are recreated in each Gateway's own namespace on first reconcile after upgrade. Anything pinned to tigera-gateway (NetworkPolicies, monitoring, RBAC, external DNS) must follow.

- Single controller in calico-system with controllerName=gateway.envoyproxy.io/gatewayclass-controller (chart default) and deploy.type=GatewayNamespace. ControllerName + GatewayClass name are deliberately reused from the legacy install so existing Gateway CRs continue to be claimed.
- Auto-provision the tigera-gateway-class GatewayClass + EnvoyProxy. Users can declare additional classes via GatewayAPI.Spec.GatewayClasses; all classes target the single controller.
- Embed gateway-helm.tgz and render at runtime via the Helm SDK; result is cached per process via sync.Once. Replaces the previous pre-rendered YAML.
- Per-namespace Enterprise resources: waf-http-filter SA + per-namespace waf-http-filter-gateway-resources RoleBinding (least-privilege Gateway-API reads), tigera-operator-secrets RoleBinding, tigera-pull-secret copy. Cluster-scoped perms (licensekeys, tokenreviews) go through a single shared waf-http-filter-gateway-namespaces ClusterRoleBinding whose Subjects list is recomputed each reconcile.
- Reserved namespaces (calico-system, tigera-operator): the operator does not create or delete the shared tigera-operator-secrets RoleBinding or tigera-pull-secret copy in those namespaces; the core Installation controller owns them.
- Cleanup order: Secrets are deleted before the tigera-operator-secrets RoleBinding. That RoleBinding is what grants the operator Secret-delete perms, so reversing the order yields a 403 and aborts the reconcile.
- Legacy cleanup enumerates the operator-owned objects from the tigera-gateway install: controller Deployment/Service/SAs/ConfigMap, certgen Job + RBAC, namespaced Role/RoleBinding, copied pull Secrets, tigera-operator-secrets RoleBinding, envoy-gateway-topology-injector.tigera-gateway MWC, the orphaned waf-http-filter-cluster-scoped and waf-http-filter-gateway-resources ClusterRoleBindings, and the deprecated combined waf-http-filter ClusterRole/ClusterRoleBinding. Pull Secrets are queued before tigera-operator-secrets. The tigera-gateway Namespace itself is intentionally not queued; users may have placed their own resources in it.
- Allow NetworkPolicy (calico-system.envoy-gateway) under the calico-system tier to keep the controller + certgen Job working under default-deny. Selector covers both app.kubernetes.io/name=gateway-helm and app=certgen (the chart applies different labels to the certgen Job vs its pod template). Egress: DNS + kube-apiserver, then Pass. Ingress: 9443 (topology-injector webhook), 18000-18002 (xDS), 19001 (metrics).
- Unit tests (tigera-gateway Namespace asserted not in the delete list). FV coverage: deploys the controller in calico-system and asserts nothing lands in tigera-gateway, provisions and cleans up per-namespace resources, GatewayClass + EnvoyProxy cleanup, custom EnvoyProxy watch, l7-log-collector owning-gateway env wiring, custom EnvoyGateway ConfigMap.

Security
The Enterprise per-namespace render copies tigera-pull-secret into every namespace that hosts a Gateway, so permissive RBAC on those namespaces can expose the pull secret. Reserved namespaces are excluded from create and delete of the shared resources, so the operator does not clobber core-Installation-owned secrets.

Upgrade / compatibility
In-place upgrade. The single controller in calico-system claims all tigera-gateway-class Gateways unchanged. Proxy Pods, their Services, and LoadBalancer addresses are recreated in each Gateway's own namespace on first reconcile, so any cluster setting pinned to tigera-gateway (NetworkPolicies, monitoring, RBAC, external DNS) must be repointed to the Gateway's own namespace. The legacy controller install is removed automatically; the tigera-gateway Namespace itself is preserved in case it holds user resources.

Calico-private operator RBAC update required: envoy-gateway-topology-injector.calico-system added to the mutatingwebhookconfigurations resourceNames list with update and delete verbs (the legacy .tigera-gateway entry is retained with delete so the upgrade-cleanup can reap it).

Release Note
For PR author
- make gen-files
- make gen-versions

For PR reviewers
A note for code reviewers - all pull requests must have the following:
- kind/bug if this is a bugfix.
- kind/enhancement if this is a new feature.
- enterprise if this PR applies to Calico Enterprise only.