From 076a69889e79bbd6832b385f3ce1df4509d2ce17 Mon Sep 17 00:00:00 2001
From: gaurav0107
Date: Sun, 7 Jun 2026 16:57:26 +0530
Subject: [PATCH] fix(gke): point web backend health check at /login (no healthy upstream)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The GKE Gateway-provisioned GCP LB was health-checking the web backend
at the default path '/'. Next.js returns 307 → /login at '/' (signed-in
state lives in a session cookie); GCP LBs treat anything other than 200
as UNHEALTHY, so the backend NEG was perma-UNHEALTHY and the public
domain served "no healthy upstream" 503s on every deploy.

The earlier fix (31d8f81 "point web probes at /login") covered the
kubelet-side livenessProbe / readinessProbe but not the LB-side health
check, which is configured separately via a HealthCheckPolicy on the
Service.

Verified live:
- Backend service health flips UNHEALTHY → HEALTHY ~45s after apply.
- `curl -sI https://langprobe.daz.co.in/login` returns 200.
- `curl -sI https://langprobe.daz.co.in/` returns 307 (the correct
  Next.js redirect, not the LB error).

Signed-off-by: Gaurav Dubey
Signed-off-by: gaurav0107
---
 deploy/k8s/gke-gateway/README.md              |  8 +++++
 deploy/k8s/gke-gateway/healthcheckpolicy.yaml | 34 +++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 deploy/k8s/gke-gateway/healthcheckpolicy.yaml

diff --git a/deploy/k8s/gke-gateway/README.md b/deploy/k8s/gke-gateway/README.md
index a5a16c7..11160e4 100644
--- a/deploy/k8s/gke-gateway/README.md
+++ b/deploy/k8s/gke-gateway/README.md
@@ -41,6 +41,14 @@ supports Gateway API).
 kubectl apply -n tracebility -f deploy/k8s/gke-gateway/
 ```
 
+The directory contains:
+
+- `gateway.yaml` — the GKE-managed L7 LB (HTTP+HTTPS listeners on the static IP).
+- `httproute.yaml` — routes `langprobe.daz.co.in/*` to the `tracebility-web` Service.
+- `healthcheckpolicy.yaml` — points the GCP backend health check at `/login`
+  instead of the default `/`. Without it, `/` returns 307 → LB marks the
+  backend UNHEALTHY → public domain serves "no healthy upstream" 503s.
+
 ## Wait for provisioning
 
 ```bash
diff --git a/deploy/k8s/gke-gateway/healthcheckpolicy.yaml b/deploy/k8s/gke-gateway/healthcheckpolicy.yaml
new file mode 100644
index 0000000..6dac90b
--- /dev/null
+++ b/deploy/k8s/gke-gateway/healthcheckpolicy.yaml
@@ -0,0 +1,34 @@
+---
+# Tells the GCP L7 LB (provisioned by the Gateway resource in this dir) to
+# health-check the tracebility-web Service at /login instead of the default
+# /. The web app responds to / with a 307 redirect to /login (signed-in
+# state lives in a session cookie); GCP LBs treat anything other than 200
+# as UNHEALTHY, so without this policy the backend NEG goes UNHEALTHY on
+# every deploy and the LB returns "no healthy upstream" 503s.
+#
+# kubelet probes (livenessProbe / readinessProbe on the Deployment) are
+# already configured to hit /login — see the chart at
+# deploy/helm/tracebility/templates/web-deployment.yaml. This policy is the
+# LB-side counterpart so the GCP NEG health matches kubelet's view.
+apiVersion: networking.gke.io/v1
+kind: HealthCheckPolicy
+metadata:
+  name: tracebility-web
+  namespace: tracebility
+spec:
+  default:
+    config:
+      type: HTTP
+      httpHealthCheck:
+        port: 7090
+        requestPath: /login
+    checkIntervalSec: 15
+    timeoutSec: 15
+    healthyThreshold: 1
+    unhealthyThreshold: 2
+    logConfig:
+      enabled: true
+  targetRef:
+    group: ""
+    kind: Service
+    name: tracebility-web
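
The sketch below reproduces the failure mode locally. It shows why `/` fails the LB check while `/login` passes, and it does not talk to GCP. It relies on two assumptions taken from GCP's documented health-check behaviour: an HTTP probe succeeds only on a 200 response, and the prober does not follow redirects. The stand-in server (`FakeWebApp`), `probe()`, `HealthState` and `run_checks()` are all hypothetical helpers. The thresholds and timeout mirror `healthyThreshold: 1`, `unhealthyThreshold: 2` and `timeoutSec: 15` from the policy above.

```python
"""Local sketch of GCP-style HTTP health-check semantics (hypothetical helpers)."""
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeWebApp(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Next.js app: '/' redirects, '/login' renders."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(307)
            self.send_header("Location", "/login")
            self.end_headers()
        elif self.path == "/login":
            body = b"<html>login</html>"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def probe(host, port, path, timeout_sec=15.0):
    """One probe: success only on status 200. http.client never follows redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_sec)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False  # connection errors and timeouts count as failed probes
    finally:
        conn.close()


class HealthState:
    """Flip state only after N consecutive successes (or failures)."""

    def __init__(self, healthy_threshold=1, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = False  # assumption: a fresh backend starts UNHEALTHY
        self._successes = 0
        self._failures = 0

    def record(self, ok):
        if ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


def run_checks(path, rounds=3):
    """Probe `path` on a throwaway local server `rounds` times; return final state."""
    server = HTTPServer(("127.0.0.1", 0), FakeWebApp)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        state = HealthState(healthy_threshold=1, unhealthy_threshold=2)
        for _ in range(rounds):
            state.record(probe("127.0.0.1", server.server_port, path))
        return "HEALTHY" if state.healthy else "UNHEALTHY"
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for path in ("/", "/login"):
        print(path, run_checks(path))  # "/ UNHEALTHY", then "/login HEALTHY"
```

Every probe of `/` sees the 307, so the backend never leaves UNHEALTHY. `/login` turns HEALTHY on its first 200 because `healthyThreshold` is 1. The same counting gives a rough detection bound. With `checkIntervalSec: 15` and `unhealthyThreshold: 2`, a backend that starts failing should be marked UNHEALTHY after about two intervals, roughly 30s.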