fix: ccusage parity for Claude loader, tiered pricing & store #13
urbanlama wants to merge 6 commits
…rity)

ccusage never adds reasoning into its total: for Codex the reasoning tokens are already counted inside output_tokens (OpenAI semantics), and ccusage's Codex loader uses the reported total_tokens (=== input+output). totalTokenCount() summed reasoning on top, double-counting every Codex event in the dashboard total, heatmap, topModel and every chart.

Drop reasoning from totalTokenCount and every client-side token sum in dashboard.ts. Claude is unaffected (reasoning is always 0 there). The reasoning field stays on TokenCounts and is shown as a non-additive informational row.

Adds types.test.ts and a Codex parity invariant in codex.test.ts (totalTokenCount == reported total_tokens for well-formed logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
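The non-additive treatment of reasoning described in this commit can be sketched roughly as follows. The field names on `TokenCounts` and the inclusion of the cache fields in the total are assumptions for illustration; the real definitions in types.ts may differ.

```typescript
// Hypothetical shape of TokenCounts; the actual type in types.ts may carry other fields.
interface TokenCounts {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
  reasoning: number; // informational only, never added to the total
}

// Total without reasoning: for Codex, reasoning is already part of `output`.
function totalTokenCount(t: TokenCounts): number {
  return t.input + t.output + t.cacheCreation + t.cacheRead;
}

// A Codex-style event: 200 of the 500 output tokens were reasoning tokens.
const codexEvent: TokenCounts = {
  input: 1000,
  output: 500,
  cacheCreation: 0,
  cacheRead: 0,
  reasoning: 200,
};

console.log(totalTokenCount(codexEvent)); // 1500, matching a reported total_tokens of input + output
```

Summing `reasoning` on top would have produced 1700 for this event, which is the double count the commit removes.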
calculateCost used flat per-token rates only. ccusage applies LiteLLM's *_above_200k_tokens long-context rates per token type, per event, at the 200k threshold (packages/internal/src/pricing.ts calculateTieredCost). Claude Code turns routinely carry cache_read > 200k, so we materially undercharged long sessions versus ccusage.

Port calculateTieredCost faithfully and route all four token types through it. Models without the above_200k fields stay flat, identical to ccusage (it does not implement Gemini's 128k tier either). Extend ModelPricing + the Sonnet 4 FALLBACK entry. Adds boundary tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
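A minimal sketch of threshold-based pricing for a single token type, under stated assumptions: only the tokens above the threshold are charged at the long-context rate (a marginal split), the signature is hypothetical, and the rates are illustrative numbers rather than real price data. The ported function may differ in detail.

```typescript
const THRESHOLD = 200_000;

// Hypothetical signature: cost of one token type within one event.
function calculateTieredCost(
  tokens: number,
  baseRate: number,
  above200kRate?: number,
  threshold: number = THRESHOLD,
): number {
  if (tokens <= 0) return 0;
  // No long-context rate, or at/below the threshold: flat pricing.
  if (above200kRate === undefined || tokens <= threshold) {
    return tokens * baseRate;
  }
  // Assumed marginal split: base rate up to the threshold, higher rate beyond it.
  const above = tokens - threshold;
  return threshold * baseRate + above * above200kRate;
}

// Illustrative rates in USD per token (assumptions, not real prices).
const cacheReadRate = 3e-7;
const cacheReadAbove200k = 6e-7;

const flat = calculateTieredCost(250_000, cacheReadRate);
const tiered = calculateTieredCost(250_000, cacheReadRate, cacheReadAbove200k);
console.log(flat.toFixed(3), tiered.toFixed(3));
```

With these example rates, a 250k cache-read event costs more under the tiered path than under flat pricing, which is the direction of the undercharge the commit describes.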
…guard

parseLine coerced via Number(x ?? 0): a non-numeric input_tokens became NaN (?? only catches null/undefined), the 0/0 guard let NaN through, and addTokens propagated it, so one bad line turned the whole dashboard total into NaN.

ccusage's valibot usageDataSchema instead requires input_tokens/output_tokens to be real numbers (entry dropped otherwise) and treats the cache fields as v.optional(v.number()) (absent -> 0, present -> must be a number).

Add requiredTokenNumber/optionalTokenNumber mirroring that contract (no string coercion; present null/non-number rejects the entry; non-finite rejected as intentional hardening). Harden store.ts isTokenCounts to require finite numbers and use it in loadFile so a poisoned historical line (NaN -> null on disk) is dropped instead of re-poisoning the store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
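The validation contract described above might look roughly like this. The helper names come from the commit message, but their signatures and the use of `null` as the "reject this entry" signal are assumptions made for this sketch.

```typescript
// Required field: must be a real, finite number. No string coercion.
// Returns null to signal that the whole entry should be dropped (assumed convention).
function requiredTokenNumber(value: unknown): number | null {
  return typeof value === "number" && isFinite(value) ? value : null;
}

// Optional field: absent -> 0; present -> must pass the same check as a required field.
function optionalTokenNumber(value: unknown): number | null {
  if (value === undefined) return 0;
  return requiredTokenNumber(value);
}

// The old coercion turned a non-numeric string into NaN, which then poisoned every sum.
const raw: unknown = "abc";
const coerced = Number(raw ?? 0);
console.log(Number.isNaN(coerced));      // true
console.log(requiredTokenNumber(raw));   // null (entry rejected)
console.log(optionalTokenNumber(undefined)); // 0
console.log(optionalTokenNumber(null));  // null (present but not a number)
```

Rejecting non-finite values goes slightly beyond a plain "is a number" check, in line with the hardening the commit mentions.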
…rity)

parseLine dropped any entry with input_tokens === 0 && output_tokens === 0, before reading the cache fields. ccusage's schema accepts 0/0 and calculateTotals still sums cache_creation/cache_read, so a cache-only turn (input=0, output=0, cache_read>0) is real usage. Dropping it undercounted tokens and cost versus ccusage. Remove the guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
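To make the cache-only case concrete, here is a small sketch. The `ClaudeUsage` interface and both helper functions are hypothetical stand-ins for the loader's logic, not the actual parseLine code.

```typescript
// Hypothetical view of the usage block on a Claude log line.
interface ClaudeUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// The removed guard: it looked only at input/output and ignored the cache fields.
const droppedByOldGuard = (u: ClaudeUsage): boolean =>
  u.input_tokens === 0 && u.output_tokens === 0;

// All four token types contribute to usage.
function usageTokens(u: ClaudeUsage): number {
  return (
    u.input_tokens +
    u.output_tokens +
    (u.cache_creation_input_tokens ?? 0) +
    (u.cache_read_input_tokens ?? 0)
  );
}

const cacheOnlyTurn: ClaudeUsage = {
  input_tokens: 0,
  output_tokens: 0,
  cache_read_input_tokens: 180_000,
};

console.log(droppedByOldGuard(cacheOnlyTurn), usageTokens(cacheOnlyTurn)); // true 180000
```

The old guard would have discarded 180,000 tokens of real usage for this turn.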
The dedupe fallback key `timestamp:model:input:output` (used when messageId or requestId was absent) could collapse genuinely distinct ID-less turns, and it ignored cache tokens entirely, making totals lower than ccusage. ccusage's createUniqueHash returns null when either id is missing and isDuplicateEntry(null) === false: ID-less entries are never deduped.

Make dedupeKey `string | null`, drop the synthetic fallback, and skip both the seen-check and seen-insert for null keys (== ccusage markAsProcessed(null) noop). Bump the loader CACHE_VERSION to 2 so v1 records carrying stale synthetic keys are reparsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
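The nullable-key behavior can be illustrated with a short sketch. The `Entry` shape and the `dedupe` function are hypothetical; only the `messageId:requestId` key format and the rule that ID-less entries are never deduplicated come from the commit messages.

```typescript
// Hypothetical parsed entry; the real loader type carries more fields.
interface Entry {
  messageId?: string;
  requestId?: string;
  tokens: number;
}

// Key is null when either id is missing, so such entries are never treated as duplicates.
function dedupeKey(e: Entry): string | null {
  if (e.messageId == null || e.requestId == null) return null;
  return `${e.messageId}:${e.requestId}`;
}

function dedupe(entries: Entry[]): Entry[] {
  const seen = new Set<string>();
  const kept: Entry[] = [];
  for (const e of entries) {
    const key = dedupeKey(e);
    if (key !== null) {
      if (seen.has(key)) continue; // true duplicate: same message and request ids
      seen.add(key);
    }
    // Null keys skip both the seen-check and the seen-insert.
    kept.push(e);
  }
  return kept;
}

const sample: Entry[] = [
  { messageId: "m1", requestId: "r1", tokens: 10 },
  { messageId: "m1", requestId: "r1", tokens: 10 }, // duplicate, dropped
  { tokens: 10 },                                   // ID-less, kept
  { tokens: 10 },                                   // ID-less with identical counts, also kept
];

console.log(dedupe(sample).length); // 3
```

Under the old synthetic fallback key, the two ID-less entries could have collapsed into one if they shared timestamp, model, and input/output counts.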
ccusage is stateless; TokenBBQ persists an append-only store with a content-hash dedup that differs from ccusage's messageId:requestId.

Per product decision (harden + document, no risky store-hash migration, no new mode): document the parity invariant on hashEvent: the loader is the dedup authority and matches ccusage; the content hash is only for multi-process safety and is injective for realistically-distinct Claude turns. Add a regression test proving two distinct turns with identical token counts both survive, and record the accepted residuals (post-prune TokenBBQ >= ccusage is intended) in CCUSAGE_PARITY_REVIEW.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review finding: [P1] Gemini and OpenCode currently store reasoning as a separate token bucket:
With this PR, those sources lose reasoning tokens from dashboard totals, heatmap cells, top model/source ranking, and popups. That fixes Codex double-counting, but creates a cross-source undercount for Gemini/OpenCode.

Suggested direction: make the non-additive reasoning behavior source-specific, or normalize Codex so its …

I verified the branch with:
Goal

Bring token and cost figures into agreement with ccusage v18.0.8. All 6 findings from the code review (CCUSAGE_PARITY_REVIEW.md) are fixed, one commit each, and verified against the vendored ccusage source.

Changes

| Change | Files | ccusage reference |
|---|---|---|
| `reasoning` removed from the token total (Codex already counts it in `output`) | `types.ts`, `dashboard.ts` | `apps/codex/src/data-loader.ts` |
| … `calculateTieredCost`, per token type, per event) | `pricing.ts` | `packages/internal/src/pricing.ts:284-334` |
| … | `claude.ts`, `store.ts` | `v.number()` |
| … | `claude.ts` | `data-loader.ts` `usageDataSchema` |
| `dedupeKey` = `null` when the msg/req ID is missing → never deduplicated; synthetic fallback removed; loader `CACHE_VERSION` 1→2 | `claude.ts`, `cache.ts` | `data-loader.ts:521-531,492-506` |
| … | `store.ts` | … |

Verification

- `npm run lint` green, `npm run test` green (84/84).
- Checked against `./ccusage/` (ground truth): `null` makes ccusage discard the whole entry via `v.safeParse`, which is mirrored exactly.

Deliberately accepted residual divergences (documented in CCUSAGE_PARITY_REVIEW.md)

- ccusage's `v.number()` accepts `Infinity`. TokenBBQ's behavior is the more correct one (the total is not poisoned), and the case cannot occur in well-formed JSONL anyway.
- `auto` mode with `costUSD: 0`: for logs with an explicit `costUSD: 0`, TokenBBQ recalculates the cost, whereas ccusage takes 0 (rare).

🤖 Generated with Claude Code