28 changes: 21 additions & 7 deletions CLAUDE.md
@@ -19,35 +19,49 @@ Standard Go HTTP server (not Cloudflare Workers) serving as a temporary file upl
**Environment variables (required):**
- `db_url` — PostgreSQL connection string
- `bucket_name` — GCS bucket name
+- `sign_key` — HMAC key for signing download tokens (`handler.go:signFilename`). Rotating it invalidates every outstanding URL — that's the right behavior if a key leaks.

**Environment variables (optional):**
- `base_url` — download URL prefix (default: `https://dropbox.deploys.app/files/`)
- `api_endpoint` — deploys.app API base URL (default: `https://api.deploys.app`, override with internal address in production)
-- `cdn_base_url` — full URL prefix (including scheme and trailing slash, e.g. `https://cdn.example.com/`). When set, `GET /files/{fn}` records the download metric (counted against `attrs.Size`) and 307-redirects to `{cdn_base_url}{fn}`. The CDN edge is expected to fetch its origin at `https://dropbox.deploys.app/_cdn/{fn}`, which streams the file unauthenticated and without metrics. In-cluster callers (private/loopback/link-local `X-Real-Ip`) bypass the redirect and stream directly. Unset = original streaming behavior.
+- `cdn_base_url` — full URL prefix (including scheme and trailing slash, e.g. `https://cdn.example.com/`). When set, `GET /files/{token}` records the download metric (counted against `attrs.Size`) and 307-redirects to `{cdn_base_url}{token}`. The CDN edge is expected to fetch its origin at `https://dropbox.deploys.app/_cdn/{token}`, which streams the file (no auth, no metrics) after re-verifying the same HMAC. Each `/_cdn` response sets `Cache-Control` for the edge: success uses `public, max-age={remaining TTL}, immutable` so the edge cache lines up with the file's actual lifetime; 410/404 for known-dead URLs use `public, max-age=3600` so repeat probes are absorbed at the edge; invalid tokens get no `Cache-Control` because each garbage URL is unique and caching just burns edge slots. In-cluster callers (private/loopback/link-local `X-Real-Ip`) bypass the redirect and stream directly. Unset = original streaming behavior.
- `PORT` — listen port (default: `8080`)
- `log_level` — slog level (default: info)

+**Download token scheme (`handler.go`):**
+- The URL path component is a `token` = `fn` + `"-"` + `sig`, currently 45 chars total. `fn` is 24 random chars `[0-9A-Za-z]` (~143 bits of entropy); `sig` is 20 hex chars of HMAC-SHA256 truncated to 80 bits, keyed by `sign_key`.
+- The `-` separator lets us change `fnLen` later without invalidating tokens that are already in circulation — `parseToken` splits structurally on the separator, not by fixed position. Since `fn` is alphanumeric and `sig` is hex, neither side can contain a `-`.
+- `parseToken(SignKey, token)` runs first in both `fileHandler` and `cdnFileHandler` and 404s on any mismatch — DDoS attempts that don't know `sign_key` never reach the DB or GCS.
+- `fn` is what we store in the bucket and the `files.fn` column. The full token only appears in URLs.

**Request flow (`handler.go`):**
1. Parse `Authorization` header + `project`/`projectId` from query params or `param-*` headers (query params take precedence)
2. Authorize via `checkAuth()` in `auth.go`
3. Parse TTL (1–7 days, default 1) and optional filename the same way
-4. Generate a crypto-random 86-char URL-safe base64 filename with TTL digit prepended (e.g., `1ABC…`)
-5. Stream body to GCS with cache-control and optional content-disposition
+4. Generate a 24-char alphanumeric `fn` (`generateFilename`, rejection-sampled to stay unbiased)
+5. Stream body to GCS with cache-control and optional content-disposition, keyed by `fn`
6. Insert metadata into PostgreSQL via `pgctx.Exec`
-7. Return JSON: `{"ok": true, "result": {"downloadUrl": "...", "expiresAt": "..."}}`
+7. Return JSON: `{"ok": true, "result": {"downloadUrl": "{base_url}{fn}-{sig}", "expiresAt": "..."}}`

**Auth (`auth.go`):**
- No `Authorization` header → alpha mode, project ID hardcoded as `"alpha"` (TODO: remove)
- With token → POST to `https://api.deploys.app/me.authorized` for `dropbox.upload` permission, checking `authorized` + `billingAccount.active`
-- Results cached in-process for 30 seconds via `cachestore`
+- Results cached in-process for 30 seconds via `cachestore`; the external call is wrapped in `sf.Do` so concurrent uploads from the same caller collapse to one round-trip at the cache-miss edge.

+**DDoS protection ladder:** see `fileHandler` / `cdnFileHandler` in `files.go`. In order from cheapest to most expensive:
+1. `parseToken` HMAC check — pure CPU, no I/O.
+2. `lookupFile` cache — 60s in-process cache of `(project_id, expires_at, bucket_missing)` per fn.
+3. `sf.Do` around the DB `SELECT` — collapses any thundering herd at the cache-miss edge.
+4. `Bucket.Attributes` — only reached for tokens that survive 1–3.

**Key libraries (same pattern as `moonrhythm/registry`):**
- `parapet` — HTTP server with middleware chain (healthz, logger, pgctx)
- `pgctx` — context-aware PostgreSQL access (`pgctx.Exec`, middleware injects DB into context)
-- `cachestore` — in-process TTL cache for auth results
+- `cachestore` — in-process TTL cache for auth results and per-fn metadata
+- `sf` — generic context-aware singleflight (`github.com/moonrhythm/sf`); used in `lookupFile` and `checkAuth` to dedupe concurrent backend calls
- `configfile` — env-var config reader (`config.MustString`, `config.StringDefault`)

## Notes

- `schema.sql` targets PostgreSQL; `project_id` is `text` (the API returns string IDs)
-- `base_url` is the public download prefix (`https://dropbox.deploys.app/files/`); it shares the service host and resolves to the `GET /files/{fn}` route, which streams directly or 307s to the CDN (see `cdn_base_url`)
+- `base_url` is the public download prefix (`https://dropbox.deploys.app/files/`); it shares the service host and resolves to the `GET /files/{token}` route, which streams directly or 307s to the CDN (see `cdn_base_url`)
5 changes: 4 additions & 1 deletion README.md
@@ -25,6 +25,7 @@ Docker image is built and pushed automatically on push to `main`. See `.github/w
|---------------|----------------------------------------------------------|
| `db_url` | PostgreSQL connection string |
| `bucket_name` | GCS bucket name |
+| `sign_key` | HMAC key for signing download tokens. Rotating invalidates every outstanding URL. |
| `base_url` | Download URL prefix (default: `https://dropbox.deploys.app/files/`) |
| `PORT` | Listen port (default: `8080`) |

@@ -74,12 +75,14 @@ File data binary
{
  "ok": true,
  "result": {
-    "downloadUrl": "https://dropbox.deploys.app/files/<filename>",
+    "downloadUrl": "https://dropbox.deploys.app/files/<token>",
    "expiresAt": "2020-01-01T01:01:01Z"
  }
}
```

+`<token>` is `{fn}-{sig}` (currently 45 chars): a 24-char random alphanumeric filename, a `-` separator, and a 20-char HMAC-SHA256 signature (keyed by `sign_key`). Tampered or made-up tokens are rejected before any DB or storage lookup. The separator means future changes to filename length stay backward-compatible.

##### Unauthorized

```json
108 changes: 62 additions & 46 deletions auth.go
@@ -8,6 +8,7 @@ import (
	"time"

	"github.com/moonrhythm/cachestore"
+	"github.com/moonrhythm/sf"
)

var apiEndpoint = "https://api.deploys.app"
@@ -44,57 +45,72 @@ func checkAuth(ctx context.Context, auth, project, projectID string) AuthResult
		return v
	}

-	body, _ := json.Marshal(struct {
-		Project     string   `json:"project,omitempty"`
-		ProjectID   string   `json:"projectId,omitempty"`
-		Permissions []string `json:"permissions"`
-	}{
-		Project:     project,
-		ProjectID:   projectID,
-		Permissions: []string{permission},
-	})
+	// Same singleflight pattern as lookupFile: collapse a thundering herd
+	// of concurrent uploads from the same caller into a single
+	// /me.authorized round-trip. The result is cached for 30s
+	// (cacheTTL); sf.Do dedupe matters at the cold-cache edge and right
+	// after that 30s entry expires under load.
+	result, _, _ := sf.Do(ctx, cacheKey, func(ctx context.Context) (AuthResult, error) {
+		// Re-check the cache: a sibling caller may have populated it
+		// while we were queued behind sf's mutex.
+		if v, ok := cachestore.Get[AuthResult](cacheKey); ok {
+			return v, nil
+		}

-	req, _ := http.NewRequest(http.MethodPost, apiEndpoint+"/me.authorized", bytes.NewReader(body))
-	req.Header.Set("Authorization", auth)
-	req.Header.Set("Content-Type", "application/json")
+		body, _ := json.Marshal(struct {
+			Project     string   `json:"project,omitempty"`
+			ProjectID   string   `json:"projectId,omitempty"`
+			Permissions []string `json:"permissions"`
+		}{
+			Project:     project,
+			ProjectID:   projectID,
+			Permissions: []string{permission},
+		})

-	resp, err := http.DefaultClient.Do(req)
-	if err != nil {
-		return AuthResult{}
-	}
-	defer resp.Body.Close()
-	if resp.StatusCode != http.StatusOK {
-		return AuthResult{}
-	}
+		req, _ := http.NewRequestWithContext(ctx, http.MethodPost, apiEndpoint+"/me.authorized", bytes.NewReader(body))
+		req.Header.Set("Authorization", auth)
+		req.Header.Set("Content-Type", "application/json")

-	var res struct {
-		OK     bool `json:"ok"`
-		Result struct {
-			Authorized bool `json:"authorized"`
-			Project    struct {
-				ID             string `json:"id"`
-				Project        string `json:"project"`
-				BillingAccount struct {
-					Active bool `json:"active"`
-				} `json:"billingAccount"`
-			} `json:"project"`
-		} `json:"result"`
-	}
-	if err := json.NewDecoder(resp.Body).Decode(&res); err != nil {
-		return AuthResult{}
-	}
+		resp, err := http.DefaultClient.Do(req)
+		if err != nil {
+			// Don't cache transport failures — let the next caller retry.
+			return AuthResult{}, nil
+		}
+		defer resp.Body.Close()
+		if resp.StatusCode != http.StatusOK {
+			return AuthResult{}, nil
+		}

-	var result AuthResult
-	if res.OK && res.Result.Authorized && res.Result.Project.BillingAccount.Active {
-		result = AuthResult{
-			Authorized: true,
-			Project: Project{
-				ID:      res.Result.Project.ID,
-				Project: res.Result.Project.Project,
-			},
+		var res struct {
+			OK     bool `json:"ok"`
+			Result struct {
+				Authorized bool `json:"authorized"`
+				Project    struct {
+					ID             string `json:"id"`
+					Project        string `json:"project"`
+					BillingAccount struct {
+						Active bool `json:"active"`
+					} `json:"billingAccount"`
+				} `json:"project"`
+			} `json:"result"`
+		}
+		if err := json.NewDecoder(resp.Body).Decode(&res); err != nil {
+			return AuthResult{}, nil
+		}

+		var result AuthResult
+		if res.OK && res.Result.Authorized && res.Result.Project.BillingAccount.Active {
+			result = AuthResult{
+				Authorized: true,
+				Project: Project{
+					ID:      res.Result.Project.ID,
+					Project: res.Result.Project.Project,
+				},
+			}
		}
-	}

-	cachestore.Set(cacheKey, result, &cachestore.SetOptions{TTL: cacheTTL})
+		cachestore.Set(cacheKey, result, &cachestore.SetOptions{TTL: cacheTTL})
+		return result, nil
+	})
	return result
}
44 changes: 44 additions & 0 deletions auth_test.go
@@ -8,6 +8,7 @@ import (
	"sync"
	"sync/atomic"
	"testing"
+	"time"
)

// A single mock server is started lazily and serves every auth test. Each test
@@ -215,6 +216,49 @@ func TestCheckAuth_CachesResult(t *testing.T) {
	}
}

+func TestCheckAuth_SingleflightCollapsesConcurrentCalls(t *testing.T) {
+	// Same shape as TestLookupFile_SingleflightCollapsesConcurrentCalls:
+	// 50 goroutines race on a cold-cache (auth, project, projectId)
+	// triple. sf.Do must collapse them into a single /me.authorized
+	// round-trip — otherwise a thundering herd of uploads from one
+	// caller (e.g. parallel CI jobs holding the same bearer) hammers
+	// the deploys.app API on every cache-miss edge.
+	t.Parallel()
+	var calls atomic.Int64
+	token := "Bearer " + t.Name()
+	registerAuthMock(t, token, func(w http.ResponseWriter, r *http.Request) {
+		calls.Add(1)
+		// A short sleep widens the singleflight window so the test
+		// reliably catches a regression where dedupe is broken.
+		time.Sleep(50 * time.Millisecond)
+		jsonAuthMock(true, true)(w, r)
+	})

+	const N = 50
+	results := make([]AuthResult, N)
+	var wg sync.WaitGroup
+	start := make(chan struct{})
+	for i := 0; i < N; i++ {
+		wg.Add(1)
+		go func(idx int) {
+			defer wg.Done()
+			<-start
+			results[idx] = checkAuth(context.Background(), token, "sfproject", "")
+		}(i)
+	}
+	close(start)
+	wg.Wait()

+	for i, r := range results {
+		if !r.Authorized {
+			t.Errorf("results[%d] not authorized", i)
+		}
+	}
+	if got := calls.Load(); got >= N {
+		t.Errorf("auth API called %d times, want <%d (sf should have collapsed the herd)", got, N)
+	}
+}

func TestCheckAuth_CacheKeyDistinguishesTokens(t *testing.T) {
	t.Parallel()
	var calls atomic.Int64