Fix staging deploy: startup probe and disable network policy #230

Merged
RafaelPo merged 2 commits into main from fix/mcp-startup-probe on Feb 25, 2026

@RafaelPo
Contributor

Summary

  • Add startup probe (60s budget) to handle slow Redis Sentinel cold-start connections
  • Increase liveness/readiness probe timeouts from 5s to 10s
  • Disable the network policy: GKE assigns ClusterIPs from a non-RFC1918 range (34.118.x.x), which breaks both the DNS and Redis egress rules

Root cause

The network policy's DNS egress rule uses a podSelector for kube-dns, but GKE evaluates egress against the kube-dns ClusterIP (34.118.224.10), not the pod IP. As a result, all DNS resolution was blocked, so Redis Sentinel discovery failed, /health hung, and the liveness probe killed the pod.
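For illustration, a podSelector-based DNS egress rule of the kind described usually has the shape below. The chart's actual policy is not shown in this PR, so the policy name, the `app: mcp` pod label, and the namespace selector are assumptions; only the kube-dns target and port 53 reflect the rule being described.

```yaml
# Illustrative only: typical shape of a podSelector-based DNS egress rule.
# Names and labels are assumptions, not copied from this chart.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-egress              # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: mcp                  # hypothetical label on the MCP pods
  policyTypes:
    - Egress
  egress:
    # Allow DNS only to kube-dns pods in kube-system (both selectors must match).
    # The MCP pods resolve via the kube-dns Service ClusterIP (34.118.224.10 here),
    # which, per the root cause above, this selector does not match, so DNS is dropped.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Redis Sentinel egress rules omitted from this sketch.
```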

Changes

  • deployment.yaml: Add a startupProbe (12 attempts × 5s = 60s), increase the liveness initialDelaySeconds to 30s and failureThreshold to 5, and raise the liveness/readiness timeouts to 10s
  • values.yaml: networkPolicy.enabled: false (was true)
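A sketch of what the probe settings in `deployment.yaml` might look like after this change, assuming all three probes hit `/health` over HTTP. The port name `http`, the `/health` path on the readiness probe, the liveness/readiness `periodSeconds`, and the startup probe's `timeoutSeconds` are assumptions; the values marked "from the PR" are the ones stated above.

```yaml
# Sketch of the container probes after this change (assumes an HTTP /health endpoint).
startupProbe:
  httpGet:
    path: /health
    port: http                # assumed named container port
  periodSeconds: 5            # from the PR: 12 attempts x 5s
  failureThreshold: 12        # ~60s budget before the kubelet restarts the container
  timeoutSeconds: 5           # assumption: not stated in the PR
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 30     # from the PR
  periodSeconds: 10           # assumption
  timeoutSeconds: 10          # from the PR (was 5)
  failureThreshold: 5         # from the PR
readinessProbe:
  httpGet:
    path: /health             # assumed path
    port: http
  periodSeconds: 10           # assumption
  timeoutSeconds: 10          # from the PR (was 5)
```

Kubernetes disables liveness and readiness probes until the startup probe succeeds, so a slow Sentinel handshake on cold start only has to fit inside the startup budget rather than the liveness timeout.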

Deployed

Already deployed to staging from this branch; the pod is 1/1 Running with health checks passing.

Test plan

  • Staging pod starts and passes health checks
  • /health returns 200
  • Network policy to be re-enabled in a follow-up with proper GKE service CIDR handling
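One possible shape for that follow-up, assuming the CIDR-based approach suggested in the second commit: allow DNS to the kube-dns ClusterIP directly with an `ipBlock`. Only the IP `34.118.224.10` comes from this PR; the policy name and pod label are hypothetical. Whether GKE honours an `ipBlock` on a Service ClusterIP depends on the cluster's network policy implementation and should be verified on staging, and Redis Sentinel egress would still need its own rule.

```yaml
# Hypothetical follow-up: match the kube-dns Service ClusterIP by CIDR
# instead of selecting kube-dns pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-allow-dns         # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: mcp                # hypothetical label on the MCP pods
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 34.118.224.10/32   # kube-dns ClusterIP observed on staging
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```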

🤖 Generated with Claude Code

RafaelPo and others added 2 commits on February 25, 2026 at 15:18
Redis Sentinel discovery can take >5s on cold start, causing the
liveness probe (5s timeout) to kill the pod before it establishes a
connection. Add a startup probe (60s budget) to protect the boot
phase, and increase the liveness timeout to 10s with a higher failure
threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

GKE assigns ClusterIPs from a non-RFC1918 range (34.118.x.x). The
network policy's DNS egress rule uses podSelector for kube-dns pods,
but GKE evaluates egress against the ClusterIP (34.118.224.10), not
the pod IP. This blocks all DNS resolution from the MCP pods.

Disable until we can properly handle GKE's service CIDR in egress
rules (may require CIDR-based DNS rules instead of podSelector).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RafaelPo merged commit a1e10eb into main on Feb 25, 2026
5 checks passed
RafaelPo deleted the fix/mcp-startup-probe branch on February 25, 2026 at 15:59