Route by X-Saturn-Upstream header; delete the resolver#30
Open
hhuuggoo wants to merge 3 commits into
Open
Conversation
phoebe no longer decides where to forward. The backend identity now rides on the request as the X-Saturn-Upstream header, which Atlas injects on each inference deployment's per-subdomain IngressRoute — Atlas knows the real Service for that exact deployment when it builds the route, so the address is authoritative, not a guess. WHY (security): phoebe sits behind auth-server, which authorizes a request for a SPECIFIC deployment. If phoebe re-derives the upstream and gets it wrong, an authorized-for-X request is served by engine Y — a confused deputy that silently breaks the resource-scoping auth-server enforced. The old resolver had three strategies for this: static (ignored the resource entirely), convention (GUESSED the Service DNS from a name template — could misroute authenticated traffic to a live wrong model), and a k8s-label lookup. All of them made phoebe a routing authority it should never have been. Atlas already solved routing at route-build time (one IngressRoute per subdomain → the deployment's own Service); the resolver reimplemented that, worse. So: forward to the address on the envelope, fail closed if it's missing. - proxy: read X-Saturn-Upstream, parse (bare host:port → http, or full URL), forward there. Absent/unparseable/hostless → 502, REFUSE. There is no other source of a target, so misrouting authenticated traffic is structurally impossible, not merely discouraged. Test TestProxyNoUpstreamFailsClosed pins it: no header → 502, ZERO billing events. - Delete internal/registry wholesale (static/convention/cached/chain, LookupFunc, the LRU, the k8s client) and its config surface (defaultUpstream, the registry block). Drops the k8s.io/client-go dependency. - identity: add HeaderUpstream with the anti-spoof contract note. Contracts: - NEW trusted header X-Saturn-Upstream (value: host:port or scheme://host:port), the routing authority. MUST be in Traefik's ForwardAuth authResponseHeaders allowlist (strip-then-inject) so a client cannot spoof its own upstream — same discipline as the identity headers. (saturn-k8s + Atlas changes, separate PRs.) - REMOVED config: defaultUpstream and the entire registry: block. A deployed config still setting them is ignored (unknown yaml keys), not rejected. - phoebe is now USELESS until Atlas injects the header — these land together. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ttery fix) Two CONFIRMED battery findings on the routing-authority parse boundary, both ledger-endorsed SSRF-surface items: - A non-http scheme (https/ftp/gopher://...) passed through and was forwarded with that transport — e.g. https:// makes phoebe attempt a TLS handshake to a plain-HTTP in-cluster engine (502), and gopher:// is forwarded verbatim. - A path/query/fragment (host:8000/foo) passed through, and NewSingleHostReverseProxy PREPENDS the upstream path onto the request path — silently rewriting where every request lands on the engine (path smuggling). parseUpstream now rejects a non-http scheme and any path/query/fragment, in addition to the existing empty/unparseable/hostless checks — fail closed (→ 502, zero billing events), the same contract as a missing header. Defense-in-depth at the documented chokepoint: Atlas injects a clean bare host:port and the header is anti-spoofed, but the parse boundary enforces its own contract loudly. Adds TestParseUpstream (by-attack matrix): accepts host:port / http://host:port (trimmed); rejects https/gopher/ftp schemes, path/query/fragment smuggling, and empty/hostless input. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
Author
Machine-review battery — round 1 (opus), Tier 2Status: fixed. 2 CONFIRMED findings, both on the
Both fail closed (→ 502, zero billing events). Added Note: gate agent rate-limited mid-run (reported STUCK for that reason, not a real failure); findings verified and fixed by hand. |
The resolver deletion left an 'upstream *url.URL' param on the Server-building test helpers (newIOLogServer, newIOLogServerWithLog, newTestServerWithSettings, newTestServerE) — the Server no longer takes an upstream (routing is via the X-Saturn-Upstream header), so the param is unused and revive failed lint. Rename each to _ (the request helpers that actually USE upstream are unchanged). Verified clean with golangci-lint v1.64.8 (CI's version): exit 0 repo-wide. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
phoebe no longer decides where to forward. The backend identity rides on the request as
X-Saturn-Upstream, which Atlas injects on each inference deployment's per-subdomain IngressRoute. Atlas knows the real Service for that exact deployment when it builds the route — so the address is authoritative, not a guess.Why (security)
phoebe sits behind auth-server, which authorizes a request for a specific deployment. If phoebe re-derives the upstream and gets it wrong, an authorized-for-X request is served by engine Y — a confused deputy that silently breaks the resource-scoping auth-server enforced. The old resolver's
static(ignored the resource) andconvention(guessed the Service DNS — could misroute authenticated traffic to a live wrong model) strategies were exactly this footgun. Atlas already solved routing at route-build time (one IngressRoute per subdomain → the deployment's own Service); the resolver reimplemented it, worse.Forward to the address on the envelope; fail closed if it's missing.
Changes
X-Saturn-Upstream, parse (host:port→ http, or full URL), forward. Absent/unparseable → 502, refuse. There is no other source of a target, so misrouting authenticated traffic is structurally impossible.TestProxyNoUpstreamFailsClosedpins it: no header → 502, zero billing events.internal/registrywholesale (static/convention/cached/chain, the LookupFunc, LRU, k8s client) + its config (defaultUpstream, theregistry:block). Drops thek8s.io/client-godependency. −1541 lines.HeaderUpstreamwith the anti-spoof contract note.build + vet + all tests green;
go mod tidyclean.Contracts
X-Saturn-Upstream(host:portorscheme://host:port), the routing authority. Must be in Traefik's ForwardAuthauthResponseHeadersallowlist (strip-then-inject) so a client cannot spoof its own upstream — same discipline as the identity headers. (saturn-k8s allowlist + Atlas injection land in separate PRs.)defaultUpstream+ the entireregistry:block.Supersedes the resolver work (#28) and its RBAC (saturn-k8s #985) — both should be reverted; this PR reverts the phoebe half.