Skip to content

Route by X-Saturn-Upstream header; delete the resolver#30

Open
hhuuggoo wants to merge 3 commits into
release-2026.02.01from
hugo/upstream-header-routing
Open

Route by X-Saturn-Upstream header; delete the resolver#30
hhuuggoo wants to merge 3 commits into
release-2026.02.01from
hugo/upstream-header-routing

Conversation

@hhuuggoo

Copy link
Copy Markdown
Contributor

What

phoebe no longer decides where to forward. The backend identity rides on the request as X-Saturn-Upstream, which Atlas injects on each inference deployment's per-subdomain IngressRoute. Atlas knows the real Service for that exact deployment when it builds the route — so the address is authoritative, not a guess.

Why (security)

phoebe sits behind auth-server, which authorizes a request for a specific deployment. If phoebe re-derives the upstream and gets it wrong, an authorized-for-X request is served by engine Y — a confused deputy that silently breaks the resource-scoping auth-server enforced. The old resolver's static (ignored the resource) and convention (guessed the Service DNS — could misroute authenticated traffic to a live wrong model) strategies were exactly this footgun. Atlas already solved routing at route-build time (one IngressRoute per subdomain → the deployment's own Service); the resolver reimplemented it, worse.

Forward to the address on the envelope; fail closed if it's missing.

Changes

  • proxy: read X-Saturn-Upstream, parse (host:port → http, or full URL), forward. Absent/unparseable → 502, refuse. There is no other source of a target, so misrouting authenticated traffic is structurally impossible. TestProxyNoUpstreamFailsClosed pins it: no header → 502, zero billing events.
  • Delete internal/registry wholesale (static/convention/cached/chain, the LookupFunc, LRU, k8s client) + its config (defaultUpstream, the registry: block). Drops the k8s.io/client-go dependency. −1541 lines.
  • identity: add HeaderUpstream with the anti-spoof contract note.

build + vet + all tests green; go mod tidy clean.

Contracts

  • New trusted header X-Saturn-Upstream (host:port or scheme://host:port), the routing authority. Must be in Traefik's ForwardAuth authResponseHeaders allowlist (strip-then-inject) so a client cannot spoof its own upstream — same discipline as the identity headers. (saturn-k8s allowlist + Atlas injection land in separate PRs.)
  • Removed config: defaultUpstream + the entire registry: block.
  • phoebe is useless until Atlas injects the header — these land together (depends on the saturn-k8s + Atlas PRs).

Supersedes the resolver work (#28) and its RBAC (saturn-k8s #985) — both should be reverted; this PR reverts the phoebe half.

hhuuggoo and others added 2 commits June 18, 2026 17:56
phoebe no longer decides where to forward. The backend identity now rides on the
request as the X-Saturn-Upstream header, which Atlas injects on each inference
deployment's per-subdomain IngressRoute — Atlas knows the real Service for that
exact deployment when it builds the route, so the address is authoritative, not
a guess.

WHY (security): phoebe sits behind auth-server, which authorizes a request for a
SPECIFIC deployment. If phoebe re-derives the upstream and gets it wrong, an
authorized-for-X request is served by engine Y — a confused deputy that silently
breaks the resource-scoping auth-server enforced. The old resolver had three
strategies for this: static (ignored the resource entirely), convention (GUESSED
the Service DNS from a name template — could misroute authenticated traffic to a
live wrong model), and a k8s-label lookup. All of them made phoebe a routing
authority it should never have been. Atlas already solved routing at route-build
time (one IngressRoute per subdomain → the deployment's own Service); the
resolver reimplemented that, worse.

So: forward to the address on the envelope, fail closed if it's missing.

- proxy: read X-Saturn-Upstream, parse (bare host:port → http, or full URL),
  forward there. Absent/unparseable/hostless → 502, REFUSE. There is no other
  source of a target, so misrouting authenticated traffic is structurally
  impossible, not merely discouraged. Test TestProxyNoUpstreamFailsClosed pins
  it: no header → 502, ZERO billing events.
- Delete internal/registry wholesale (static/convention/cached/chain, LookupFunc,
  the LRU, the k8s client) and its config surface (defaultUpstream, the registry
  block). Drops the k8s.io/client-go dependency.
- identity: add HeaderUpstream with the anti-spoof contract note.

Contracts:
- NEW trusted header X-Saturn-Upstream (value: host:port or scheme://host:port),
  the routing authority. MUST be in Traefik's ForwardAuth authResponseHeaders
  allowlist (strip-then-inject) so a client cannot spoof its own upstream — same
  discipline as the identity headers. (saturn-k8s + Atlas changes, separate PRs.)
- REMOVED config: defaultUpstream and the entire registry: block. A deployed
  config still setting them is ignored (unknown yaml keys), not rejected.
- phoebe is now USELESS until Atlas injects the header — these land together.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ttery fix)

Two CONFIRMED battery findings on the routing-authority parse boundary, both
ledger-endorsed SSRF-surface items:
- A non-http scheme (https/ftp/gopher://...) passed through and was forwarded
  with that transport — e.g. https:// makes phoebe attempt a TLS handshake to a
  plain-HTTP in-cluster engine (502), and gopher:// is forwarded verbatim.
- A path/query/fragment (host:8000/foo) passed through, and
  NewSingleHostReverseProxy PREPENDS the upstream path onto the request path —
  silently rewriting where every request lands on the engine (path smuggling).

parseUpstream now rejects a non-http scheme and any path/query/fragment, in
addition to the existing empty/unparseable/hostless checks — fail closed (→ 502,
zero billing events), the same contract as a missing header. Defense-in-depth at
the documented chokepoint: Atlas injects a clean bare host:port and the header is
anti-spoofed, but the parse boundary enforces its own contract loudly.

Adds TestParseUpstream (by-attack matrix): accepts host:port / http://host:port
(trimmed); rejects https/gopher/ftp schemes, path/query/fragment smuggling, and
empty/hostless input.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hhuuggoo

Copy link
Copy Markdown
Contributor Author

Machine-review battery — round 1 (opus), Tier 2

Status: fixed. 2 CONFIRMED findings, both on the parseUpstream routing-authority boundary (ledger-endorsed SSRF surface):

  • No scheme validation (medium) — https:///gopher:///ftp:// passed through and were forwarded with that transport. Fixed: reject non-http scheme.
  • No path/query rejection (low) — host:8000/foo smuggled a path prefix (NewSingleHostReverseProxy prepends the upstream path). Fixed: reject path/query/fragment.

Both fail closed (→ 502, zero billing events). Added TestParseUpstream (by-attack matrix): accepts clean host:port/http URLs, rejects exotic schemes + path/query/fragment + hostless/empty. Build + vet + full suite green.

Note: gate agent rate-limited mid-run (reported STUCK for that reason, not a real failure); findings verified and fixed by hand.

The resolver deletion left an 'upstream *url.URL' param on the Server-building
test helpers (newIOLogServer, newIOLogServerWithLog, newTestServerWithSettings,
newTestServerE) — the Server no longer takes an upstream (routing is via the
X-Saturn-Upstream header), so the param is unused and revive failed lint.
Rename each to _ (the request helpers that actually USE upstream are unchanged).

Verified clean with golangci-lint v1.64.8 (CI's version): exit 0 repo-wide.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant