Refuse expired files with 410 instead of redirecting to the CDN#17
Merged
Conversation
Before this change, GET /files/{fn} for a file whose expires_at had
passed would still redirect to the CDN (or stream directly) until the
next GC run dropped the object — billing the project for the egress
and, worse, potentially priming the CDN cache with bytes the user is
no longer entitled to.
Extend the per-fn DB lookup to read expires_at alongside project_id
(both columns are immutable once written, so the existing 60s cache
TTL is still safe) and refuse expired files with 410 Gone in both
fileHandler and cdnFileHandler. The refusal runs *before* the
download/egress metric increments and before the 307, so expired
requests don't bill the project and the CDN never sees a redirect
that could refresh its cache.
cdnFileHandler gets the same check so an attacker who knows the
crypto-random fn can't bypass via the unauthenticated /_cdn/{fn}
origin endpoint.
Files in the bucket without a DB row (insert-failed uploads) and
files with a NULL expires_at are treated as non-expired — we have no
basis to refuse them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous ordering — Bucket.Attributes then lookupFile — meant that a flood of requests against a single expired fn cached the DB row fine (60s TTL on an immutable row protects Postgres) but still hit GCS on every request, because the bucket call ran first. That's one billed Class B operation per request and direct exposure to GCS read quotas. Move the lookupFile/Expired check ahead of Bucket.Attributes in both fileHandler and cdnFileHandler. After the first request hydrates the cache, subsequent requests for the same expired fn return 410 entirely from memory and never touch GCS. For non-expired files this just reorders two calls we'd make anyway — no change in latency. For files with no DB row (insert-failed uploads) lookupFile returns Found=false and we fall through to the bucket exactly as before. Added two tests that prove the bucket is bypassed by inserting an expired DB row against an *empty* bucket: a regression would surface as 404 (bucket-NotFound) instead of 410. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e cache Two attack vectors the previous code didn't cover: 1) Random-garbage fns. Each request was a unique cache key, so the per-fn DB cache never helped. Every request burned one Postgres query + one GCS Class B op. An attacker can generate millions of unique garbage fns per second per box. 2) Same-valid-but-dead fn (e.g. a URL the attacker held after expiry and GC). The negative DB cache amortized Postgres after request 1, but Bucket.Attributes still ran on every subsequent request. Two-layer fix in both fileHandler and cdnFileHandler: - isValidFilename(fn): generateFilename produces exactly 86 chars of URL-safe base64 (no padding). Anything else can't possibly exist in the system, so reject in CPU before any I/O. Kills vector (1) entirely — a flood of garbage fns becomes pure ServeHTTP overhead. - BucketMissing flag on fileMeta: after a confirmed Bucket.Attributes NotFound for a fn with no DB row, write the negative back into the per-fn cache. Subsequent requests within the 60s TTL return 404 from memory without re-hitting GCS. Fixes vector (2). Also extracted fileCacheKey(fn) so lookupFile and the new cache write share one source of truth. Test changes: - New TestIsValidFilename links the validator to generateFilename so the two can't drift, plus alphabet/length boundary cases. - New TestFileHandler_InvalidFilenameRejected / CDN variant: garbage fns must 404 from the validator without touching DB or GCS. - New TestFileHandler_BucketMissNegativeCached / CDN variant: request a missing fn, then write the object out-of-band into the bucket; a second request must still 404, proving Bucket.Attributes was bypassed by the cached negative. - validTestFn(suffix) helper pads descriptive test names out to 86 chars so existing happy-path tests still pass the validator while remaining readable on failure. - TestFileHandler_CDNRedirect body-size threshold replaced with a direct "body must not contain the object bytes" check, since the default http.Redirect body now embeds an 86-char URL. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The fileMeta cache has a 60s TTL and is loaded lazily on miss. Under
load, the moment an entry expires (or at cold start), every concurrent
request for the same fn sees a cache miss before any of them finishes
writing — each then issues its own Postgres SELECT. With CDN cache TTL
and an in-cluster origin call pattern, that "herd at the edge" is a
very real shape against /files/{fn} and /_cdn/{fn}.
Wrap the DB query inside lookupFile with github.com/moonrhythm/sf.Do
(a generic, context-aware singleflight). Concurrent callers for the
same fn join one in-flight execution and share its result; only the
first caller's SELECT actually hits Postgres. The closure re-checks
the cache so a sibling caller that won the race a tick earlier also
benefits.
sf preserves context values via WithoutCancel, so pgctx still resolves
the DB connection inside the closure, and per-caller cancellation
returns ctx.Err() to that caller without killing the shared work the
remaining waiters depend on.
Test: TestLookupFile_SingleflightCollapsesConcurrentCalls fires 50
goroutines at a single cold-cache fn behind a release barrier and
asserts the per-fn DB-query counter stays below N. Without sf the
counter would equal N; with sf it's effectively 1. Per-fn counter
(sync.Map of *atomic.Uint64) so the test stays t.Parallel-safe.
Stable over 20 -race iterations.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three changes that share a single goal: an attacker who hits /files/
with random garbage shouldn't be able to bill the project or burn DB /
GCS quota.
1) sf around the checkAuth external call
Mirrors the lookupFile change in the previous commit. cacheTTL is
only 30s, so under upload load from one caller (parallel CI jobs
holding the same bearer) the cache-miss edge would otherwise
thunder the /me.authorized endpoint. sf.Do collapses concurrent
identical (auth, project, projectId) triples to one round-trip,
with the same re-check-cache-inside-the-closure pattern.
2) New filename scheme: short, alphanumeric, signed
generateFilename was 64 random bytes → 86-char URL-safe base64
(alphabet [A-Za-z0-9_-]). The base64 special chars never bought
anything and the length was generous to a fault. Replaced with:
fn = 24 chars from [0-9A-Za-z], rejection-sampled to be
unbiased (~143 bits of entropy, well over any guessing
or birthday-collision threshold).
sig = 20 chars of lowercase hex (HMAC-SHA256 truncated to
80 bits) keyed by App.SignKey.
The URL token is `fn + sig` = 44 chars total — down from 86, and
strictly alphanumeric. fn is what we store in the bucket and the
files.fn column; the full token only ever appears in URLs.
3) HMAC verification before any backend hit
fileHandler / cdnFileHandler now run parseToken first. It checks
length + constant-time HMAC; on any mismatch we 404 in CPU. This
is the *primary* DDoS shield — random /files/{garbage} floods by
anyone who doesn't have sign_key never reach lookupFile, never
reach Bucket.Attributes. The negative bucket cache and per-fn
metadata cache still matter for the residual case of a replayed
real URL after expiry+GC.
isValidFilename is gone; parseToken subsumes its job (length +
alphabet implicitly enforced by HMAC matching) with stronger
guarantees.
Config: new required env var sign_key (HMAC key). Rotating it
invalidates every outstanding URL — the correct behavior if a key
ever leaks. Tests use a fixed testSignKey threaded through
newTestApp; updated existing tests to put signed tokens (not raw
fns) in URLs and SetPathValue("token"). Added direct tests for
generateFilename alphabet+uniformity, signFilename
determinism+key-sensitivity, parseToken round-trip and forgery
rejection, plus an sf-collapses-thundering-herd test for checkAuth
that mirrors the lookupFile one.
CDN redirect now preserves the full signed token so the CDN's
origin-fetch hits /_cdn/{token} with a URL the origin can
re-verify. Updated TestFileHandler_CDNRedirect accordingly.
README and CLAUDE.md updated to describe the new token format,
sign_key requirement, and the four-tier DDoS protection ladder
(parseToken → cache → sf.Do → GCS).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… URLs
The previous token layout — fn (fnLen=24 chars) concatenated directly
with sig (sigLen=20 chars) — relied on a fixed cut point at index
fnLen for parseToken. That means any future bump or trim of fnLen
would 404 every outstanding URL: parseToken would slice the old token
at the wrong position and compute HMAC over the wrong fn.
Insert "-" between them. fn is [0-9A-Za-z] and sig is hex, so neither
side can ever contain "-" — the separator is unambiguous. parseToken
now splits on strings.IndexByte('-') instead of a positional slice, so
the verifier no longer cares what fnLen was when a given token was
issued: as long as sig is what signFilename(key, fn) produces today,
the token validates.
tokenLen is gone as a single constant (token length is now
fn-length-dependent). Tests that asserted positional layout were
rewritten to assert structural properties:
- TestParseToken_RoundTripsGeneratedToken now checks for the
separator's presence and uses fnLen+1+sigLen as the *current*
expected length, not a hard invariant.
- New TestParseToken_AcceptsDifferentFnLengths feeds fns of 1, 3, 24,
and 64 chars through makeToken/parseToken to lock in the property
that motivated this change.
- TestParseToken_RejectsForgeries and TestFileHandler_InvalidTokenRejected
swapped their length-based bad cases for structural ones: missing
separator, empty fn, empty sig — a 44-char string with no "-" now
fails for the right reason.
CLAUDE.md and README updated to mention the "-" and the
backward-compatibility property it buys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Before this change /_cdn relied on streamFile to copy attrs.CacheControl
("public, max-age=86400") onto the response. That has two problems for
the edge:
1) The bucket's 24h max-age caps CDN cache at one day even for files
whose TTL is 7 days — every cached object re-fetches origin six
extra times before expiry.
2) Error responses (410, 404) carried no Cache-Control at all, so a
shared-then-expired URL kept hitting origin on every probe.
cdnFileHandler now sets Cache-Control explicitly per outcome:
- success (200): public, max-age={remaining TTL}, immutable. fn is
unique per upload so the body genuinely never changes; capping
max-age at expires_at means the edge stops serving past
end-of-life instead of holding the bytes for its own cache TTL.
Falls back to attrs.CacheControl if expires_at is unknown (the
rare insert-failed-upload case).
- expired (410) and bucket-missing (404): public, max-age=3600. The
answer is permanent; let the edge absorb repeat probes.
- invalid token (404): no Cache-Control. Each garbage token is a
unique URL, so caching wouldn't reduce origin load and would just
fill edge slots with attacker traffic.
streamFile now only falls back to attrs.CacheControl when the caller
hasn't already chosen one — that lets cdnFileHandler pre-set the
TTL-aligned policy without streamFile clobbering it. fileHandler is
unchanged: it never sets Cache-Control beforehand, so streamFile still
uses the bucket's value (correct for the direct-stream / browser
case).
Tests:
- TestCDNFileHandler_SuccessCacheControlUsesRemainingTTL inserts a
7-day-TTL row alongside the bucket's 24h CacheControl and asserts
the response's max-age is ~7 days, not 86400 — locks in the
override.
- TestCDNFileHandler_ExpiredCacheControl asserts 410 carries
public, max-age=3600.
- TestCDNFileHandler_NotFound extended to assert the same on the
bucket-NotFound 404 path.
- TestCDNFileHandler_InvalidTokenNoCacheControl asserts garbage-token
404s leave Cache-Control empty so attacker traffic can't pollute
edge cache.
- TestCDNFileHandler_Streams (existing) still passes — it has no DB
row, so meta.ExpiresAt is zero and we correctly fall through to
attrs.CacheControl.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 task
acoshift
added a commit
that referenced
this pull request
May 24, 2026
## Summary
- Store the full signed download token (`fn-{hmac}`) in a new
`files.token` column at upload time, so the api's `dropbox.List` can
rebuild working download URLs without holding `sign_key`.
## Why
PR #17 made the HMAC signature mandatory: `GET /files/{token}` runs
`parseToken` first and 404s anything without a valid tag. But
`dropbox.List` (in apiserver) builds its `downloadUrl` from the bare
`fn` — it has no `sign_key`, so every link it returns is a dead
`/files/{fn}` that fails the HMAC check. Persisting the token dropbox
already computes is the simplest fix; the alternative (sharing
`sign_key` with the api) couples key rotation across services.
## How
- `schema/03_token.sql` + `schema.sql`: add `token text not null default
''`. The default keeps the migration safe for existing rows — they carry
an empty token until they self-expire (≤7-day max TTL), which is no
worse than today (links are already broken).
- `uploadHandler` computes the token once, writes it to the new column,
and reuses it for the response `downloadUrl` (previously recomputed
inline).
## Schema migration
```sql
alter table files add column token text not null default '';
```
(dropbox's `schema/*.sql` migrator is not run at startup — apply
out-of-band, same as other columns.)
## Deploy ordering
1. Apply the migration above.
2. Deploy **dropbox** (this PR) — new uploads start populating `token`.
3. Deploy **apiserver** ([deploys-app/apiserver companion
PR](https://github.com/deploys-app/apiserver)) — reads `token` for
`dropbox.List`.
## Test plan
- [x] `go test ./...` passes
- `TestUpload_Success` now asserts the token in `downloadUrl` is
persisted verbatim to `files.token`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
GET /files/{fn}andGET /_cdn/{fn}now return410 Gonefor files whoseexpires_athas passed, instead of streaming or redirecting to the CDN.Why
Before this change, an expired file kept serving until the next GC run dropped the object from the bucket. With the CDN redirect path added in #16, that meant:
egress_bytesmetric is bumped pre-redirect at the full object size).Closing the read path independently of GC fixes both.
How
lookupFileProject→lookupFile, which readsproject_idandexpires_atin the same query and returns afileMeta{ProjectID, ExpiresAt, Found}. Both columns are immutable once written, so the existing 60s in-process cache TTL is still safe — no risk of caching a stale view of expiration.fileMeta.Expired()and refused expired files with410 Gonein both handlers. The check runs beforedownloadCount/egressBytesare incremented and before the 307, so expired requests don't bill the project and the CDN never sees a redirect that could refresh its cache.cdnFileHandlerso the unauthenticated/_cdn/{fn}origin can't be used to bypass.Edge cases preserved
expires_at: still serves. Same reasoning.Attributesreturns NotFound first →404(unchanged).Tests
TestLookupFile_FromDB/TestLookupFile_NotFound/TestLookupFile_Expired— replace the oldTestLookupFileProject_*and cover the newExpired()predicate.TestFileHandler_ExpiredReturnsGone— DB row in the past, object still in bucket; expects410and asserts the body doesn't leak the object content.TestFileHandler_ExpiredOverridesCDNRedirect— same withCDNBaseURLconfigured; asserts410and emptyLocationheader (the expired check must beat the CDN redirect).TestCDNFileHandler_ExpiredReturnsGone— exercises the same refusal on/_cdn/{fn}via the router.Why
410over404: the file did exist at this URL and is being intentionally refused. It also reads correctly if the CDN ever logs origin status codes.🤖 Generated with Claude Code