feat(usage): track JSONL tokens + cost, gated by api_key auth, with subagent rollup#127
Open
manzil-infinity180 wants to merge 8 commits into
Open
feat(usage): track JSONL tokens + cost, gated by api_key auth, with subagent rollup#127manzil-infinity180 wants to merge 8 commits into
manzil-infinity180 wants to merge 8 commits into
Conversation
c367e03 to
3c07fe0
Compare
Paper §5 invariant #2 said sub-agent spend counts toward parent totals, but rollup only fired at SubagentStop. A child running concurrently with its parent would let the parent's fail-fast pass at $9 while the child spent another $5 — parent ceiling $10 was breached the moment SubagentStop finally merged. Parent's PostToolUse now enumerates child sessions (state.Manager.ListChildSessions), sums any whose IDs aren't already in parent.ChildSessionIDs, and feeds the effective metrics into CheckLimits. Race window narrows to one child-tool-call delta, which the issue explicitly accepts.
There was a problem hiding this comment.
Pull request overview
This PR adds transcript-driven token/cache/cost accounting from Claude Code JSONL logs, gates cost-limit enforcement based on detected Claude authentication mode, and implements parent-side aggregation of in-flight subagent spend to narrow the fail-fast overspend window (issue #135).
Changes:
- Introduce
internal/usageJSONL scanning + model-family pricing with env overrides, and plumb the resulting metrics into hook limit checks. - Add
internal/auth/claudeauthauth-mode detection and downgrade cost-based limits (e.g.,maxSpendUSD) to advisory under non-api_keysessions across hooks/replay/verify. - Add parent
PostToolUsesubagent rollup viaManager.ListChildSessions+rollupUnmergedChildrenfor near-real-time accumulation.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/aflock/types.go | Extends SessionState and SessionMetrics to include auth mode + cache token breakdown + cost provenance flags. |
| internal/verify/verifier.go | Treats cost-based limit exceedance as advisory under non-api_key auth during verification. |
| internal/usage/testdata/sample.jsonl | Sample transcript fixture for JSONL parser tests. |
| internal/usage/pricing.go | Model-family pricing table + env overrides + cost computation. |
| internal/usage/pricing_test.go | Unit tests for pricing lookup, overrides, and unknown-model tracking. |
| internal/usage/jsonl.go | JSONL transcript scanner with (message.id, requestId) dedup and per-model rollups. |
| internal/usage/jsonl_test.go | Unit tests for JSONL dedup, filtering, and missing-file behavior. |
| internal/state/session.go | Adds Manager.ListChildSessions for parent-side rollup enumeration. |
| internal/state/session_test.go | Tests child-session enumeration correctness. |
| internal/replay/simulate.go | Aligns replay behavior with advisory cost limits under non-api_key auth. |
| internal/policy/evaluator.go | Adds Evaluator.IsAdvisoryLimit(limitName, authMode) helper. |
| internal/policy/evaluator_authmode_test.go | Tests advisory/enforced matrix for limits vs auth modes. |
| internal/mcp/server.go | Exposes new metrics fields + authMode in session API output. |
| internal/hooks/usage_refresh.go | Adds transcript refresh to populate token/cache/cost metrics in hook state. |
| internal/hooks/subagent_rollup_test.go | Tests rollup correctness, no double-count, and no parent mutation. |
| internal/hooks/handler.go | Wires auth-mode capture, transcript refresh, advisory gating, and in-flight subagent rollup into hooks. |
| internal/auth/claudeauth/detect.go | Implements auth-mode detection (env/api-key/credentials/keychain). |
| internal/auth/claudeauth/detect_test.go | Tests auth-mode detection precedence and file/env probes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Contributor
Author
|
test it with the anthropic api key method :) |
4 tasks
Signed-off-by: Rahul Vishwakarma <rahulvs2809@gmail.com>
Signed-off-by: Rahul Vishwakarma <rahulvs2809@gmail.com>
Signed-off-by: Rahul Vishwakarma <rahulvs2809@gmail.com>
3 tasks
…i-key # Conflicts: # go.mod # internal/hooks/handler.go
Signed-off-by: Rahul Vishwakarma <rahulvs2809@gmail.com>
Signed-off-by: Rahul Vishwakarma <rahulvs2809@gmail.com>
8654ff8 to
335b510
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two related fixes shipped together so the api_key-gated cost tracking and the parent/subagent spend rollup can be tested in one POC.
What's new
internal/usage— JSONL parser with(message.id, requestId)dedup and pricing table for opus/sonnet/haiku families (AFLOCK_PRICE_*overrides).internal/auth/claudeauth— auth-mode probe (env var →ANTHROPIC_API_KEY→~/.claude/.credentials.json→ macOS keychain).SessionState.AuthMode+ richerSessionMetrics(cache 5m/1h,usageSource,costMeasured).CheckLimits.maxSpendUSD) activates only underapi_key; advisory under subscription/unknown viaEvaluator.IsAdvisoryLimit. Token/turn limits enforce under both modes (transcript counts match Anthropic's per-turn accounting either way).state.Manager.ListChildSessions(parentID), sums any whose IDs aren't already inparent.ChildSessionIDs, and feeds effective metrics into fail-fast — closes the window where concurrent children could blow past the parent ceiling between SubagentStop merges. Race narrows to one child-tool-call delta (per paper §5 invariant #2 acceptance).Test plan
go test ./...cleanmaxSpendUSD→ blockmaxSpendUSD→ stderr advisory onlymaxTokensInenforces under both modesmaxSpendUSD=\$0.50→ parent's next PostToolUse blocks, stderr showsSubagent rollup: including N in-flight child(ren)