Skip to content

Refuse expired files with 410 instead of redirecting to the CDN#17

Merged
acoshift merged 7 commits into
mainfrom
claude/strange-mclean-0b1f86
May 24, 2026
Merged

Refuse expired files with 410 instead of redirecting to the CDN#17
acoshift merged 7 commits into
mainfrom
claude/strange-mclean-0b1f86

Conversation

@acoshift

Copy link
Copy Markdown
Member

What

GET /files/{fn} and GET /_cdn/{fn} now return 410 Gone for files whose expires_at has passed, instead of streaming or redirecting to the CDN.

Why

Before this change, an expired file kept serving until the next GC run dropped the object from the bucket. With the CDN redirect path added in #16, that meant:

  • The project got billed for the egress (the egress_bytes metric is bumped pre-redirect at the full object size).
  • The 307 to the CDN edge could prime the cache with bytes the user is no longer entitled to — and CDN cache entries can outlive the GC.

Closing the read path independently of GC fixes both.

How

  • Extended lookupFileProjectlookupFile, which reads project_id and expires_at in the same query and returns a fileMeta{ProjectID, ExpiresAt, Found}. Both columns are immutable once written, so the existing 60s in-process cache TTL is still safe — no risk of caching a stale view of expiration.
  • Added fileMeta.Expired() and refused expired files with 410 Gone in both handlers. The check runs before downloadCount/egressBytes are incremented and before the 307, so expired requests don't bill the project and the CDN never sees a redirect that could refresh its cache.
  • Applied the same check to cdnFileHandler so the unauthenticated /_cdn/{fn} origin can't be used to bypass.

Edge cases preserved

  • File in bucket but no DB row (insert-failed upload): still serves. We have no recorded expiration to enforce against; matches prior behavior.
  • DB row with NULL expires_at: still serves. Same reasoning.
  • Expired and already GC'd: bucket Attributes returns NotFound first → 404 (unchanged).

Tests

  • TestLookupFile_FromDB / TestLookupFile_NotFound / TestLookupFile_Expired — replace the old TestLookupFileProject_* and cover the new Expired() predicate.
  • TestFileHandler_ExpiredReturnsGone — DB row in the past, object still in bucket; expects 410 and asserts the body doesn't leak the object content.
  • TestFileHandler_ExpiredOverridesCDNRedirect — same with CDNBaseURL configured; asserts 410 and empty Location header (the expired check must beat the CDN redirect).
  • TestCDNFileHandler_ExpiredReturnsGone — exercises the same refusal on /_cdn/{fn} via the router.

Why 410 over 404: the file did exist at this URL and is being intentionally refused. It also reads correctly if the CDN ever logs origin status codes.

🤖 Generated with Claude Code

acoshift and others added 7 commits May 24, 2026 09:32
Before this change, GET /files/{fn} for a file whose expires_at had
passed would still redirect to the CDN (or stream directly) until the
next GC run dropped the object — billing the project for the egress
and, worse, potentially priming the CDN cache with bytes the user is
no longer entitled to.

Extend the per-fn DB lookup to read expires_at alongside project_id
(both columns are immutable once written, so the existing 60s cache
TTL is still safe) and refuse expired files with 410 Gone in both
fileHandler and cdnFileHandler. The refusal runs *before* the
download/egress metric increments and before the 307, so expired
requests don't bill the project and the CDN never sees a redirect
that could refresh its cache.

cdnFileHandler gets the same check so an attacker who knows the
crypto-random fn can't bypass via the unauthenticated /_cdn/{fn}
origin endpoint.

Files in the bucket without a DB row (insert-failed uploads) and
files with a NULL expires_at are treated as non-expired — we have no
basis to refuse them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous ordering — Bucket.Attributes then lookupFile — meant that
a flood of requests against a single expired fn cached the DB row fine
(60s TTL on an immutable row protects Postgres) but still hit GCS on
every request, because the bucket call ran first. That's one billed
Class B operation per request and direct exposure to GCS read quotas.

Move the lookupFile/Expired check ahead of Bucket.Attributes in both
fileHandler and cdnFileHandler. After the first request hydrates the
cache, subsequent requests for the same expired fn return 410 entirely
from memory and never touch GCS.

For non-expired files this just reorders two calls we'd make anyway —
no change in latency. For files with no DB row (insert-failed uploads)
lookupFile returns Found=false and we fall through to the bucket
exactly as before.

Added two tests that prove the bucket is bypassed by inserting an
expired DB row against an *empty* bucket: a regression would surface
as 404 (bucket-NotFound) instead of 410.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e cache

Two attack vectors the previous code didn't cover:

1) Random-garbage fns. Each request was a unique cache key, so the
   per-fn DB cache never helped. Every request burned one Postgres
   query + one GCS Class B op. An attacker can generate millions of
   unique garbage fns per second per box.

2) Same-valid-but-dead fn (e.g. a URL the attacker held after expiry
   and GC). The negative DB cache amortized Postgres after request 1,
   but Bucket.Attributes still ran on every subsequent request.

Two-layer fix in both fileHandler and cdnFileHandler:

- isValidFilename(fn): generateFilename produces exactly 86 chars of
  URL-safe base64 (no padding). Anything else can't possibly exist in
  the system, so reject in CPU before any I/O. Kills vector (1)
  entirely — a flood of garbage fns becomes pure ServeHTTP overhead.

- BucketMissing flag on fileMeta: after a confirmed Bucket.Attributes
  NotFound for a fn with no DB row, write the negative back into the
  per-fn cache. Subsequent requests within the 60s TTL return 404 from
  memory without re-hitting GCS. Fixes vector (2).

Also extracted fileCacheKey(fn) so lookupFile and the new cache write
share one source of truth.

Test changes:
- New TestIsValidFilename links the validator to generateFilename so
  the two can't drift, plus alphabet/length boundary cases.
- New TestFileHandler_InvalidFilenameRejected / CDN variant: garbage
  fns must 404 from the validator without touching DB or GCS.
- New TestFileHandler_BucketMissNegativeCached / CDN variant: request
  a missing fn, then write the object out-of-band into the bucket; a
  second request must still 404, proving Bucket.Attributes was
  bypassed by the cached negative.
- validTestFn(suffix) helper pads descriptive test names out to 86
  chars so existing happy-path tests still pass the validator while
  remaining readable on failure.
- TestFileHandler_CDNRedirect body-size threshold replaced with a
  direct "body must not contain the object bytes" check, since the
  default http.Redirect body now embeds an 86-char URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The fileMeta cache has a 60s TTL and is loaded lazily on miss. Under
load, the moment an entry expires (or at cold start), every concurrent
request for the same fn sees a cache miss before any of them finishes
writing — each then issues its own Postgres SELECT. With CDN cache TTL
and an in-cluster origin call pattern, that "herd at the edge" is a
very real shape against /files/{fn} and /_cdn/{fn}.

Wrap the DB query inside lookupFile with github.com/moonrhythm/sf.Do
(a generic, context-aware singleflight). Concurrent callers for the
same fn join one in-flight execution and share its result; only the
first caller's SELECT actually hits Postgres. The closure re-checks
the cache so a sibling caller that won the race a tick earlier also
benefits.

sf preserves context values via WithoutCancel, so pgctx still resolves
the DB connection inside the closure, and per-caller cancellation
returns ctx.Err() to that caller without killing the shared work the
remaining waiters depend on.

Test: TestLookupFile_SingleflightCollapsesConcurrentCalls fires 50
goroutines at a single cold-cache fn behind a release barrier and
asserts the per-fn DB-query counter stays below N. Without sf the
counter would equal N; with sf it's effectively 1. Per-fn counter
(sync.Map of *atomic.Uint64) so the test stays t.Parallel-safe.
Stable over 20 -race iterations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three changes that share a single goal: an attacker who hits /files/
with random garbage shouldn't be able to bill the project or burn DB /
GCS quota.

1) sf around the checkAuth external call
   Mirrors the lookupFile change in the previous commit. cacheTTL is
   only 30s, so under upload load from one caller (parallel CI jobs
   holding the same bearer) the cache-miss edge would otherwise
   thunder the /me.authorized endpoint. sf.Do collapses concurrent
   identical (auth, project, projectId) triples to one round-trip,
   with the same re-check-cache-inside-the-closure pattern.

2) New filename scheme: short, alphanumeric, signed
   generateFilename was 64 random bytes → 86-char URL-safe base64
   (alphabet [A-Za-z0-9_-]). The base64 special chars never bought
   anything and the length was generous to a fault. Replaced with:
     fn  = 24 chars from [0-9A-Za-z], rejection-sampled to be
           unbiased (~143 bits of entropy, well over any guessing
           or birthday-collision threshold).
     sig = 20 chars of lowercase hex (HMAC-SHA256 truncated to
           80 bits) keyed by App.SignKey.
   The URL token is `fn + sig` = 44 chars total — down from 86, and
   strictly alphanumeric. fn is what we store in the bucket and the
   files.fn column; the full token only ever appears in URLs.

3) HMAC verification before any backend hit
   fileHandler / cdnFileHandler now run parseToken first. It checks
   length + constant-time HMAC; on any mismatch we 404 in CPU. This
   is the *primary* DDoS shield — random /files/{garbage} floods by
   anyone who doesn't have sign_key never reach lookupFile, never
   reach Bucket.Attributes. The negative bucket cache and per-fn
   metadata cache still matter for the residual case of a replayed
   real URL after expiry+GC.
   isValidFilename is gone; parseToken subsumes its job (length +
   alphabet implicitly enforced by HMAC matching) with stronger
   guarantees.

Config: new required env var sign_key (HMAC key). Rotating it
invalidates every outstanding URL — the correct behavior if a key
ever leaks. Tests use a fixed testSignKey threaded through
newTestApp; updated existing tests to put signed tokens (not raw
fns) in URLs and SetPathValue("token"). Added direct tests for
generateFilename alphabet+uniformity, signFilename
determinism+key-sensitivity, parseToken round-trip and forgery
rejection, plus an sf-collapses-thundering-herd test for checkAuth
that mirrors the lookupFile one.

CDN redirect now preserves the full signed token so the CDN's
origin-fetch hits /_cdn/{token} with a URL the origin can
re-verify. Updated TestFileHandler_CDNRedirect accordingly.

README and CLAUDE.md updated to describe the new token format,
sign_key requirement, and the four-tier DDoS protection ladder
(parseToken → cache → sf.Do → GCS).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… URLs

The previous token layout — fn (fnLen=24 chars) concatenated directly
with sig (sigLen=20 chars) — relied on a fixed cut point at index
fnLen for parseToken. That means any future bump or trim of fnLen
would 404 every outstanding URL: parseToken would slice the old token
at the wrong position and compute HMAC over the wrong fn.

Insert "-" between them. fn is [0-9A-Za-z] and sig is hex, so neither
side can ever contain "-" — the separator is unambiguous. parseToken
now splits on strings.IndexByte('-') instead of a positional slice, so
the verifier no longer cares what fnLen was when a given token was
issued: as long as sig is what signFilename(key, fn) produces today,
the token validates.

tokenLen is gone as a single constant (token length is now
fn-length-dependent). Tests that asserted positional layout were
rewritten to assert structural properties:

- TestParseToken_RoundTripsGeneratedToken now checks for the
  separator's presence and uses fnLen+1+sigLen as the *current*
  expected length, not a hard invariant.
- New TestParseToken_AcceptsDifferentFnLengths feeds fns of 1, 3, 24,
  and 64 chars through makeToken/parseToken to lock in the property
  that motivated this change.
- TestParseToken_RejectsForgeries and TestFileHandler_InvalidTokenRejected
  swapped their length-based bad cases for structural ones: missing
  separator, empty fn, empty sig — a 44-char string with no "-" now
  fails for the right reason.

CLAUDE.md and README updated to mention the "-" and the
backward-compatibility property it buys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Before this change /_cdn relied on streamFile to copy attrs.CacheControl
("public, max-age=86400") onto the response. That has two problems for
the edge:

1) The bucket's 24h max-age caps CDN cache at one day even for files
   whose TTL is 7 days — every cached object re-fetches origin six
   extra times before expiry.
2) Error responses (410, 404) carried no Cache-Control at all, so a
   shared-then-expired URL kept hitting origin on every probe.

cdnFileHandler now sets Cache-Control explicitly per outcome:

- success (200): public, max-age={remaining TTL}, immutable. fn is
  unique per upload so the body genuinely never changes; capping
  max-age at expires_at means the edge stops serving past
  end-of-life instead of holding the bytes for its own cache TTL.
  Falls back to attrs.CacheControl if expires_at is unknown (the
  rare insert-failed-upload case).
- expired (410) and bucket-missing (404): public, max-age=3600. The
  answer is permanent; let the edge absorb repeat probes.
- invalid token (404): no Cache-Control. Each garbage token is a
  unique URL, so caching wouldn't reduce origin load and would just
  fill edge slots with attacker traffic.

streamFile now only falls back to attrs.CacheControl when the caller
hasn't already chosen one — that lets cdnFileHandler pre-set the
TTL-aligned policy without streamFile clobbering it. fileHandler is
unchanged: it never sets Cache-Control beforehand, so streamFile still
uses the bucket's value (correct for the direct-stream / browser
case).

Tests:
- TestCDNFileHandler_SuccessCacheControlUsesRemainingTTL inserts a
  7-day-TTL row alongside the bucket's 24h CacheControl and asserts
  the response's max-age is ~7 days, not 86400 — locks in the
  override.
- TestCDNFileHandler_ExpiredCacheControl asserts 410 carries
  public, max-age=3600.
- TestCDNFileHandler_NotFound extended to assert the same on the
  bucket-NotFound 404 path.
- TestCDNFileHandler_InvalidTokenNoCacheControl asserts garbage-token
  404s leave Cache-Control empty so attacker traffic can't pollute
  edge cache.
- TestCDNFileHandler_Streams (existing) still passes — it has no DB
  row, so meta.ExpiresAt is zero and we correctly fall through to
  attrs.CacheControl.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@acoshift acoshift merged commit e44d1a3 into main May 24, 2026
2 checks passed
@acoshift acoshift deleted the claude/strange-mclean-0b1f86 branch May 24, 2026 04:17
acoshift added a commit that referenced this pull request May 24, 2026
## Summary

- Store the full signed download token (`fn-{hmac}`) in a new
`files.token` column at upload time, so the api's `dropbox.List` can
rebuild working download URLs without holding `sign_key`.

## Why

PR #17 made the HMAC signature mandatory: `GET /files/{token}` runs
`parseToken` first and 404s anything without a valid tag. But
`dropbox.List` (in apiserver) builds its `downloadUrl` from the bare
`fn` — it has no `sign_key`, so every link it returns is a dead
`/files/{fn}` that fails the HMAC check. Persisting the token dropbox
already computes is the simplest fix; the alternative (sharing
`sign_key` with the api) couples key rotation across services.

## How

- `schema/03_token.sql` + `schema.sql`: add `token text not null default
''`. The default keeps the migration safe for existing rows — they carry
an empty token until they self-expire (≤7-day max TTL), which is no
worse than today (links are already broken).
- `uploadHandler` computes the token once, writes it to the new column,
and reuses it for the response `downloadUrl` (previously recomputed
inline).

## Schema migration

```sql
alter table files add column token text not null default '';
```

(dropbox's `schema/*.sql` migrator is not run at startup — apply
out-of-band, same as other columns.)

## Deploy ordering

1. Apply the migration above.
2. Deploy **dropbox** (this PR) — new uploads start populating `token`.
3. Deploy **apiserver** ([deploys-app/apiserver companion
PR](https://github.com/deploys-app/apiserver)) — reads `token` for
`dropbox.List`.

## Test plan

- [x] `go test ./...` passes
- `TestUpload_Success` now asserts the token in `downloadUrl` is
persisted verbatim to `files.token`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant