gatewayapi: render namespace before EnvoyProxyRef resolution#4756

Open
electricjesus wants to merge 1 commit into tigera:master from electricjesus:seth/gatewayapi-ns-first

@electricjesus electricjesus commented Apr 28, 2026

Description

Bug fix in the gatewayapi controller. Components affected: pkg/controller/gatewayapi, pkg/render/gatewayapi.

When a GatewayAPI CR is created on a fresh install with envoyProxyRef set on a GatewayClass pointing at an EnvoyProxy in the operator-managed tigera-gateway namespace, reconcile early-returns at pkg/controller/gatewayapi/gatewayapi_controller.go:350 on the missing EnvoyProxy before reaching the non-CRD render at gatewayapi_controller.go:419 that creates the namespace. Users can't create the EnvoyProxy because the namespace doesn't exist, so the controller deadlocks in Degraded state with:

gatewayapi  Degraded=True  "Error reading EnvoyProxyRef: EnvoyProxy.gateway.envoyproxy.io 'gateway-conformance-base' not found"

This PR splits the namespace render out of the non-CRD component into a new GatewayAPINamespaceComponent and applies it early, alongside CRDs and before EnvoyProxyRef resolution. The tigera-gateway namespace is part of the operator's contract for any GatewayAPI CR, so it should always exist; otherwise users can't pre-create custom EnvoyProxy resources in it.
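The ordering problem can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `cluster` type, `reconcile`, `createEnvoyProxy`, and the `namespaceFirst` switch are hypothetical names invented for illustration and do not exist in the operator; the real logic lives in pkg/controller/gatewayapi and pkg/render/gatewayapi.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical in-memory stand-in for the API server.
type cluster struct {
	namespaces   map[string]bool
	envoyProxies map[string]bool // keyed by "namespace/name"
}

func newCluster() *cluster {
	return &cluster{namespaces: map[string]bool{}, envoyProxies: map[string]bool{}}
}

const gatewayNamespace = "tigera-gateway"

var errEnvoyProxyNotFound = errors.New("EnvoyProxy not found")

// createEnvoyProxy mimics the API server rejecting an object whose
// namespace does not exist yet.
func (c *cluster) createEnvoyProxy(ns, name string) error {
	if !c.namespaces[ns] {
		return fmt.Errorf("namespace %q not found", ns)
	}
	c.envoyProxies[ns+"/"+name] = true
	return nil
}

// reconcile is a simplified model of one reconcile pass (assumption, not
// the controller's code). With namespaceFirst=false the namespace is only
// rendered after EnvoyProxyRef resolution succeeds (the old ordering);
// with namespaceFirst=true it is rendered up front (the new ordering).
func reconcile(c *cluster, envoyProxyName string, namespaceFirst bool) error {
	if namespaceFirst {
		c.namespaces[gatewayNamespace] = true
	}
	if !c.envoyProxies[gatewayNamespace+"/"+envoyProxyName] {
		// Early return: nothing below this point runs.
		return fmt.Errorf("error reading EnvoyProxyRef: %w", errEnvoyProxyNotFound)
	}
	// Stand-in for the later non-CRD render, which also ensures the namespace.
	c.namespaces[gatewayNamespace] = true
	return nil
}

func main() {
	for _, namespaceFirst := range []bool{false, true} {
		c := newCluster()
		first := reconcile(c, "gateway-conformance-base", namespaceFirst)
		createErr := c.createEnvoyProxy(gatewayNamespace, "gateway-conformance-base")
		second := reconcile(c, "gateway-conformance-base", namespaceFirst)
		fmt.Printf("namespaceFirst=%v first=%v create=%v second=%v\n",
			namespaceFirst, first, createErr, second)
	}
}
```

In this model the old ordering never recovers, because the user's create call fails for lack of a namespace, while the new ordering stays Degraded only until the EnvoyProxy is applied.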

The new component is owned by the GatewayAPI CR (same as before), so the namespace is still cleaned up when the CR is removed. CreateOrUpdateOrDelete is idempotent on the existing namespace, so existing clusters are unaffected.
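The idempotency and ownership claims can be sketched with a toy object store. Everything here is an assumption made for illustration: `store`, `createOrUpdate`, `deleteOwner`, and the owner string `GatewayAPI/default` are hypothetical and are not the operator's CreateOrUpdateOrDelete or Kubernetes garbage collection, only a model of the behavior described above.

```go
package main

import "fmt"

// object is a hypothetical, minimal model of a namespaced-or-cluster object
// with a single owner reference.
type object struct {
	name   string
	owner  string
	labels map[string]string
}

// store maps object name to the currently stored object.
type store map[string]object

func sameLabels(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// createOrUpdate applies the desired object and reports what happened.
// Applying an object that already matches is a no-op, which is what makes
// an early namespace render harmless on clusters that already have it.
func (s store) createOrUpdate(desired object) string {
	existing, found := s[desired.name]
	if !found {
		s[desired.name] = desired
		return "created"
	}
	if existing.owner == desired.owner && sameLabels(existing.labels, desired.labels) {
		return "unchanged"
	}
	s[desired.name] = desired
	return "updated"
}

// deleteOwner models owner-based cleanup: removing the owner removes
// every object that references it.
func (s store) deleteOwner(owner string) {
	for name, obj := range s {
		if obj.owner == owner {
			delete(s, name)
		}
	}
}

func main() {
	s := store{}
	ns := object{name: "tigera-gateway", owner: "GatewayAPI/default"} // owner name is hypothetical

	fmt.Println(s.createOrUpdate(ns)) // early render on a fresh install
	fmt.Println(s.createOrUpdate(ns)) // repeated render of the same namespace
	s.deleteOwner("GatewayAPI/default")
	fmt.Println(len(s))
}
```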

Testing

  • make ut UT_DIR=./pkg/controller/gatewayapi — 14/14 pass; added 1 regression test covering the bug case (EnvoyProxyRef configured but EnvoyProxy missing → namespace still rendered)
  • make ut UT_DIR=./pkg/render/gatewayapi
  • Manual verification on a single-node Calico OSS kind cluster (master): apply GatewayAPI with envoyProxyRef pointing at a not-yet-existent EnvoyProxy in tigera-gateway → tigera-gateway namespace created within seconds, tigerastatus gatewayapi Degraded with the expected Error reading EnvoyProxyRef; then apply the EnvoyProxy → reconcile completes, gatewayapi Available=True, envoy-gateway pod Running, GatewayClass tigera-gateway-class Accepted

Release Note

Fix gatewayapi controller deadlock on fresh install when a GatewayClass envoyProxyRef points at an EnvoyProxy in tigera-gateway: the tigera-gateway namespace is now rendered before EnvoyProxyRef resolution, so users can pre-create custom EnvoyProxy resources without the controller stalling.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

When a GatewayAPI CR is created on a fresh install with envoyProxyRef
set on a GatewayClass pointing at an EnvoyProxy in the operator-managed
tigera-gateway namespace, reconcile would early-return on the missing
EnvoyProxy before reaching the non-CRD render that creates the
namespace. Users could not create the EnvoyProxy because the namespace
did not exist, deadlocking the controller in Degraded state.

Render the namespace alongside CRDs, before EnvoyProxyRef resolution,
so the namespace is always present as part of the operator's contract
for any GatewayAPI CR. The existing non-CRD pass continues to apply
the namespace later (idempotent), keeping the GatewayAPI CR as owner
so the namespace is still cleaned up on CR removal.
@electricjesus electricjesus requested a review from a team as a code owner April 28, 2026 09:08
@marvin-tigera marvin-tigera added this to the v1.43.0 milestone Apr 28, 2026
@electricjesus electricjesus added the kind/bug Something isn't working label Apr 28, 2026