fix(egress): allowlist the e2e/staging webhook callback host (A1)#113

Merged
brownjuly2003-code merged 1 commit into main from fix/egress-guard-e2e-callback-allowlist
Jun 29, 2026

@brownjuly2003-code

Owner

What

Fixes the collision between the SSRF egress guard and the E2E/staging webhook callbacks, which is the single root cause of the red E2E Tests and Staging Deploy workflows on main (rubric item R1 of road-to-9.8).

Root cause

The egress guard (audit_28_06_26.md #2) resolves every webhook/alert target and rejects any host on a private/loopback/link-local address. The E2E and Staging suites deliver the test webhook to:

  • E2E: host.docker.internal (the container's host gateway → private IP)
  • Staging: 127.0.0.1:18080 (the in-pod loopback relay)

Both are correctly rejected by the guard, so POST /v1/webhooks returned 400 and tests/e2e/test_smoke.py::test_webhook_test_endpoint_delivers_callback failed (assert 400 == 201). Neither workflow is a required check, so main went red unnoticed.
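
To illustrate why both targets fail, here is a minimal sketch of the kind of address classification described above, using only the standard-library `ipaddress` module. It is not the guard's actual code: `is_blocked_address` is a hypothetical name, and `192.168.65.2` merely stands in for whatever private address `host.docker.internal` resolves to on a given machine.

```python
import ipaddress


def is_blocked_address(addr: str) -> bool:
    """Return True for private, loopback, or link-local IP literals.

    Hypothetical helper for illustration; the real guard may apply
    additional or different rules.
    """
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback or ip.is_link_local


if __name__ == "__main__":
    # 192.168.65.2 is an assumed example of a host-gateway address.
    for addr in ("127.0.0.1", "192.168.65.2", "169.254.169.254", "8.8.8.8"):
        print(addr, "blocked" if is_blocked_address(addr) else "allowed")
```

Under these rules the staging loopback relay and any private host-gateway address are rejected, which matches the 400 seen in the smoke test.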

Fix

Adds an opt-in allowlist, AGENTFLOW_EGRESS_ALLOWED_HOSTS (comma-separated, exact hostnames, case-insensitive, default empty). When the URL host is listed, the public-address check is waived; the http(s) scheme check still applies. Production keeps the full guard because nothing sets the variable there.
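
The behaviour described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code merged in this PR: `parse_allowed_hosts` and `check_egress` are hypothetical names, and only the environment variable name and the stated rules (exact match, case-insensitive, scheme check always applied) come from the description.

```python
import ipaddress
import os
import socket
from urllib.parse import urlsplit

ENV_VAR = "AGENTFLOW_EGRESS_ALLOWED_HOSTS"  # name taken from the PR description


def parse_allowed_hosts(raw: str) -> frozenset:
    """Split a comma-separated list into lower-cased exact hostnames."""
    return frozenset(h.strip().lower() for h in raw.split(",") if h.strip())


def check_egress(url: str, allowed_hosts=None) -> None:
    """Raise ValueError if the URL must not be called (hypothetical sketch)."""
    if allowed_hosts is None:
        # Default is empty, so the full guard applies unless the env is set.
        allowed_hosts = parse_allowed_hosts(os.environ.get(ENV_VAR, ""))

    parts = urlsplit(url)
    # The scheme check always applies, even for allowlisted hosts.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")

    host = (parts.hostname or "").lower()
    if not host:
        raise ValueError("URL has no host")

    # An exact, case-insensitive match waives the public-address check.
    if host in allowed_hosts:
        return

    # Otherwise every resolved address must be public.
    for info in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            raise ValueError(f"{host} resolves to non-public address {ip}")
```

With an empty allowlist, `check_egress("http://127.0.0.1:18080/cb")` raises; with `AGENTFLOW_EGRESS_ALLOWED_HOSTS=127.0.0.1` it returns without error, while `ftp://127.0.0.1/` is still rejected on scheme.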

Wired into exactly the deployments that need it:

  • docker-compose.e2e.yml → the configured callback host (host.docker.internal)
  • scripts/k8s_staging_up.sh → 127.0.0.1, next to the loopback-relay setup
  • tests/e2e/conftest.py → local uvicorn (127.0.0.1) + compose-override paths

Added workflow_dispatch to staging-deploy.yml (mirrors e2e.yml) so the deploy can be re-run manually.

Verification

  • 8 new regression tests in tests/unit/test_egress_guard.py: allowlist permits the listed host, is case-insensitive, still rejects bad schemes + non-listed private hosts, and an empty allowlist preserves loopback rejection.
  • Local: full unit suite 1218 passed, ruff check + format clean.
  • CI: E2E Tests dispatched on this branch; Staging Deploy verified on main after merge (its workflow_dispatch only becomes available once on the default branch).

🤖 Generated with Claude Code

The SSRF egress guard (audit_28_06_26.md #2) resolves every webhook/alert
target and rejects any host resolving to a private/loopback/link-local
address. The E2E and Staging deployments deliver the test webhook to the
host gateway (host.docker.internal) / the in-pod loopback relay (127.0.0.1),
both of which the guard correctly rejects — so POST /v1/webhooks returned 400
and tests/e2e/test_smoke.py::test_webhook_test_endpoint_delivers_callback
failed in *both* the E2E Tests and Staging Deploy workflows (not required
checks, so main went red unnoticed).

Add an opt-in allowlist, AGENTFLOW_EGRESS_ALLOWED_HOSTS (comma-separated,
exact hostnames, case-insensitive, default empty). When the URL host is on
the list the public-address check is waived; the http(s) scheme check still
applies and production keeps the full guard because nothing sets the env.

Wire the controlled callback host into each deployment that needs it:
- docker-compose.e2e.yml -> the configured callback host (host.docker.internal)
- scripts/k8s_staging_up.sh -> 127.0.0.1, alongside the loopback relay setup
- tests/e2e/conftest.py -> local uvicorn (127.0.0.1) and compose-override paths

Also add workflow_dispatch to staging-deploy.yml so the fix can be verified
on a branch before merging (mirrors e2e.yml).

Regression tests in tests/unit/test_egress_guard.py prove the allowlist
permits the listed host, stays case-insensitive, still rejects bad schemes
and non-listed private hosts, and that an empty allowlist preserves the
loopback rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions


DORA Metrics

  • Window: last 30 days
  • Branch: main
  • Deployment frequency: 154 total / 35.93 per week
  • Lead time for changes: avg 0.28h / median 0.0h
  • Change failure rate: 60.39% (93/154)
  • MTTR: 0.25h across 3 incident(s)
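
As a quick sanity check, the derived figures above can be reproduced from the raw counts. The sketch assumes the weekly rate is the total divided by the window length in weeks; the bot's actual formula is not shown here.

```python
# Raw counts reported in the comment above.
deployments = 154
failed_changes = 93
window_days = 30

# Assumed formula: total deployments divided by the window length in weeks.
per_week = deployments / (window_days / 7)

# Change failure rate as a percentage of all deployments.
failure_rate = failed_changes / deployments * 100

print(f"Deployment frequency: {per_week:.2f} per week")
print(f"Change failure rate: {failure_rate:.2f}%")
```

Both values round to the reported 35.93 per week and 60.39%.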

@brownjuly2003-code brownjuly2003-code merged commit 8775e35 into main Jun 29, 2026
25 checks passed
@brownjuly2003-code brownjuly2003-code deleted the fix/egress-guard-e2e-callback-allowlist branch June 29, 2026 20:11