Phoebe: add resource_id to rated_usage rollup grain (billing customer attribution, E2) #22
hhuuggoo wants to merge 1 commit into
…ibution)

The rater dropped resource_id, but billing needs it to identify the customer org (E2: bill the deployment's org via resource_id→org_id). resource_id is already on billing_event (nullable); plumb it through the rater into the rated_usage grain.

New grain / unique key: (auth_id, resource_id, model_id, window_start). model_id stays: price resolution is per-model and one deployment can serve multiple models/adapters at different rates, so collapsing models would sum traffic priced differently. Two deployments of the same model by the same auth in one hour now correctly produce two rows (they may bill to different orgs).

FAIL-CLOSED ATTRIBUTION: resource_id is NULLABLE on billing_event but the new key column is NON-NULL. A row that can't name its deployment/org CANNOT be billed. The grouped/priced filter requires resource_id IS NOT NULL, and the unattributable partition counts resource_id IS NULL, so a NULL-resource_id row is surfaced (exits nonzero), never silently $0-billed or billed to a NULL org. The partition invariant holds: events_rated + unpriced + unattributable + ambiguous == total in-window events.

Touch-points: ev/resolved/grouped/priced CTEs, the md5 surrogate-key natural key (length-prefixed, fixed order), ON CONFLICT target + ORDER BY (lock order), the reconcile `deleted` CTE anti-join, both migrations (0002_rating.sql + alembic, edited in place; unapplied, no prod data), a new (resource_id, window_start) index for E2 per-deployment reads, doc comments, and the oracle + SQL-shape + integration + e2e tests.

New negative tests: DistinctDeploymentsBillSeparately and NullResourceIdIsUnattributable (Go oracle + live-PG), plus a resource_id assertion in the e2e rollup read.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
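To illustrate why the surrogate key's natural key is length-prefixed, here is a minimal Go sketch. The `<len>:<value>` framing, the placement of the `|` separator, and the names `naturalKey` and `surrogateKey` are assumptions made for illustration; the commit only states that the key is length-prefixed with a fixed field order, and the encoding actually used by the rater may differ.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// naturalKey serializes the grain columns in the fixed order
// auth_id|resource_id|model_id|epoch, prefixing each field with its byte
// length. The "<len>:<value>" framing is a hypothetical choice.
func naturalKey(authID, resourceID, modelID string, epoch int64) string {
	fields := []string{authID, resourceID, modelID, strconv.FormatInt(epoch, 10)}
	var b strings.Builder
	for i, f := range fields {
		if i > 0 {
			b.WriteByte('|')
		}
		b.WriteString(strconv.Itoa(len(f)))
		b.WriteByte(':')
		b.WriteString(f)
	}
	return b.String()
}

// surrogateKey hashes the natural key with md5 and returns lowercase hex.
func surrogateKey(authID, resourceID, modelID string, epoch int64) string {
	sum := md5.Sum([]byte(naturalKey(authID, resourceID, modelID, epoch)))
	return hex.EncodeToString(sum[:])
}

func main() {
	const epoch = 1704110400 // 2024-01-01T12:00:00Z as Unix seconds

	// Without length prefixes, ("a|b", "c") and ("a", "b|c") would both
	// serialize to "a|b|c|..."; with prefixes the two encodings differ.
	fmt.Println(naturalKey("a|b", "c", "m", epoch))
	fmt.Println(naturalKey("a", "b|c", "m", epoch))
	fmt.Println(surrogateKey("a|b", "c", "m", epoch))
	fmt.Println(surrogateKey("a", "b|c", "m", epoch))
}
```

Because the field order is fixed and every field carries its own length, the same four values always produce the same key, and no choice of field contents can make two different grains serialize identically.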
🔋 Battery: Round 1 (status: ESCALATE, but clean)

9 raw → 6 refuted → 3 confirmed (all low, all test files) + 1 persona. Zero production findings: the resource_id grain + fail-closed attribution partition are solid; the verifiers refuted the scary candidates.

The two escalations are minor judgment calls, both with clear answers I'm applying (neither is a contract door):

1. Drop the speculative
2. Document the empty-string-vs-NULL invariant at the oracle (comment only). The oracle uses Go

Plus two low-sev test cleanups (store_test.go assertion tidies). The core change is correct: the partition invariant
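The partition invariant this review round refers to can be sketched in Go as follows. This is an illustration under stated assumptions, not the rater's implementation: the `Event` type, the `classify` function, and the use of a `PriceMatches` count to stand in for price resolution (including what makes a row "ambiguous") are all hypothetical. Only the four partition names and the rule that a NULL `resource_id` row counts solely as unattributable come from the PR.

```go
package main

import "fmt"

// Event is a hypothetical, simplified billing_event row. Pointer fields
// model nullable columns. PriceMatches is an assumed stand-in for price
// resolution: 0 = no price, 1 = exactly one price, >1 = conflicting prices.
type Event struct {
	AuthID, ResourceID, ModelID *string
	PriceMatches                int
}

// Counts holds the four disjoint partitions of in-window events.
type Counts struct {
	Rated, Unpriced, Unattributable, Ambiguous int
}

// Total sums the partitions; it must equal the number of input events.
func (c Counts) Total() int {
	return c.Rated + c.Unpriced + c.Unattributable + c.Ambiguous
}

// classify assigns every event to exactly one partition. Attribution is
// checked first, so an event with a NULL resource_id is counted only as
// unattributable even when it also has no price.
func classify(events []Event) Counts {
	var c Counts
	for _, e := range events {
		switch {
		case e.AuthID == nil || e.ResourceID == nil || e.ModelID == nil:
			c.Unattributable++
		case e.PriceMatches <= 0:
			c.Unpriced++
		case e.PriceMatches > 1:
			c.Ambiguous++
		default:
			c.Rated++
		}
	}
	return c
}

// str returns a pointer to s, modelling a non-NULL column value.
func str(s string) *string { return &s }

func main() {
	events := []Event{
		{str("a1"), str("dep-1"), str("m1"), 1}, // fully attributed, one price
		{str("a1"), nil, str("m1"), 0},          // NULL resource_id and no price
		{str("a1"), str("dep-2"), str("m2"), 0}, // attributed but no price
		{str("a1"), str("dep-2"), str("m3"), 2}, // attributed, conflicting prices
	}
	c := classify(events)
	fmt.Printf("%+v\n", c)
	fmt.Println("invariant holds:", c.Total() == len(events))
}
```

Because the `switch` cases are evaluated in order and exactly one fires per event, the four counters always sum to the number of events, which is the property the invariant asserts.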
Merged to main via the squashed resource_id grain change (e854bba).
The rater currently DROPS `resource_id`, but billing needs it to identify the customer org (E2: bill the deployment's org via `resource_id`→`org_id`). `resource_id` is already on `billing_event` (plumbed from the proxy through the drainer); this change carries it through the rater into the `rated_usage` grain. No prod data exists (rater has no CronJob, nothing deployed), so the existing `0002` migration revision is edited in place, not stacked.

Contracts
`rated_usage` grain / unique key: changes from `(auth_id, model_id, window_start)` to `(auth_id, resource_id, model_id, window_start)`. The constraint is renamed `rated_usage_auth_resource_model_window_uq`. `model_id` is KEPT (not redundant with `resource_id`): price resolution is per-model, one deployment can serve multiple models/adapters at different rates, and collapsing models would sum traffic priced at different rates. Two deployments of the same model by the same auth in one hour → two rows (correct, since they may bill to different orgs). The `ON CONFLICT` target, the `ORDER BY` (deadlock-free lock order), the deterministic md5 surrogate-key natural key (length-prefixed, fixed field order `auth_id|resource_id|model_id|epoch`), and the reconcile `deleted` CTE anti-join all move to the new key in lockstep.

New index: `rated_usage_resource_id_window_start_ix` on `(resource_id, window_start)`. E2 reads `rated_usage` by deployment over a time window (resolve the org, then sum that deployment's cost); a `resource_id`-leading index makes that a tight slice rather than a scan over the `auth`-leading index where `resource_id` only trails.

NULL-`resource_id` → unattributable (fail closed): `resource_id` is NULLABLE on `billing_event` but the new key column is NON-NULL. A row that can't name its deployment/org CANNOT be billed. The `grouped`/`priced` filter requires `resource_id IS NOT NULL`; the unattributable partition counts `resource_id IS NULL` (alongside `auth_id`/`model_id`); the unpriced count requires full attribution, so a NULL-`resource_id` unpriced row is counted ONLY as unattributable. Net: a NULL-`resource_id` row is COUNTED as unattributable (surfaced, exits nonzero), never silently $0-billed or billed to a NULL org. The partition invariant still holds exactly: events_rated + unpriced + unattributable + ambiguous == total in-window events.

Tests
`oracleStore`) grain + md5 mirror the new key; SQL-shape fragments updated.

`TestRater_DistinctDeploymentsBillSeparately`: two deployments, same auth+model+hour → two rows (Go oracle); live-PG twin `TestIntegration_ResourceIDGrainAndFailClosed`.

`TestRater_NullResourceIdIsUnattributable` + extended `TestRater_UnattributableCountedNotSilent`: NULL `resource_id` counted, never billed; partition holds.

`resource_id`; e2e asserts the rollup carries `X-Saturn-Resource-Id` end-to-end.

Gate (all green)
`go build ./...`, `go vet ./...` + `go vet -tags=integration ./...`, `go test -race ./...`, golangci-lint v1.64.8 (plain + `--build-tags=integration`), `gofmt -l .` empty, and the full live-Postgres integration + e2e suite (`PHOEBE_TEST_DATABASE_URL=…` on `postgres:16`).

Note for Hugo
The grain decision (KEEP `model_id`, ADD `resource_id`) is flagged for your awareness: Ben endorsed it; there is no prod data; it is the correct grain for E2. Two same-model-same-auth deployments in one hour now bill as two rows by design.
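The two-rows-by-design behaviour can be illustrated with a small in-memory rollup in Go. This is a sketch only: `GrainKey`, `Usage`, and `rollup` are hypothetical names, `WindowStart` is assumed to be Unix seconds truncated to the hour, and the real rollup is performed in SQL rather than in a Go map.

```go
package main

import (
	"fmt"
	"sort"
)

// GrainKey mirrors the rated_usage unique key
// (auth_id, resource_id, model_id, window_start).
type GrainKey struct {
	AuthID, ResourceID, ModelID string
	WindowStart                 int64 // assumed: Unix seconds, hour-aligned
}

// Usage is a hypothetical attributed event or rollup row: a key plus units.
type Usage struct {
	Key   GrainKey
	Units int64
}

// rollup sums units per grain key and returns rows sorted in key order,
// loosely mirroring the deterministic ORDER BY the PR mentions.
func rollup(events []Usage) []Usage {
	sums := make(map[GrainKey]int64)
	for _, e := range events {
		sums[e.Key] += e.Units
	}
	rows := make([]Usage, 0, len(sums))
	for k, u := range sums {
		rows = append(rows, Usage{Key: k, Units: u})
	}
	sort.Slice(rows, func(i, j int) bool {
		a, b := rows[i].Key, rows[j].Key
		switch {
		case a.AuthID != b.AuthID:
			return a.AuthID < b.AuthID
		case a.ResourceID != b.ResourceID:
			return a.ResourceID < b.ResourceID
		case a.ModelID != b.ModelID:
			return a.ModelID < b.ModelID
		default:
			return a.WindowStart < b.WindowStart
		}
	})
	return rows
}

func main() {
	const hour = 1704110400 // 2024-01-01T12:00:00Z
	events := []Usage{
		{GrainKey{"auth-1", "dep-1", "model-a", hour}, 10},
		{GrainKey{"auth-1", "dep-2", "model-a", hour}, 5}, // same auth+model+hour, other deployment
		{GrainKey{"auth-1", "dep-1", "model-a", hour}, 1}, // same grain as the first event
	}
	for _, r := range rollup(events) {
		fmt.Printf("%s %s %s %d units=%d\n",
			r.Key.AuthID, r.Key.ResourceID, r.Key.ModelID, r.Key.WindowStart, r.Units)
	}
}
```

Under the old three-column grain these three events would collapse into a single row; with `resource_id` in the key, the two deployments stay separate and can be billed to different orgs.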