fix: ccusage parity for Claude loader, tiered pricing & store by urbanlama · Pull Request #13 · offbyone1/tokenbbq

urbanlama · 2026-05-17T18:37:36Z

Goal

Bring token and cost figures in line with ccusage v18.0.8. All 6 findings from the code review (CCUSAGE_PARITY_REVIEW.md) are fixed, one commit each, and verified against the vendored ccusage source.

Changes

| # | Fix | File(s) | ccusage reference |
|---|-----|---------|-------------------|
| 1 | Removed `reasoning` from the token total (Codex already counts it in `output`) | `types.ts`, `dashboard.ts` | `apps/codex/src/data-loader.ts` |
| 2 | Tiered >200k pricing (`calculateTieredCost`, per token type, per event) | `pricing.ts` | `packages/internal/src/pricing.ts:284-334` |
| 3 | Finite-number guard in loader + store (NaN/Infinity no longer poison a total) | `claude.ts`, `store.ts` | valibot `v.number()` |
| 4 | Removed the zero-token drop: cache-only and 0/0 events are kept (as in ccusage) | `claude.ts` | `data-loader.ts` usageDataSchema |
| 5 | `dedupeKey=null` when the msg/req ID is missing → never deduplicated; synthetic fallback removed; loader `CACHE_VERSION` 1→2 | `claude.ts`, `cache.ts` | `data-loader.ts:521-531,492-506` |
| 6 | Documented the store-vs-ccusage divergence + hardening regression test | `store.ts` | n/a |

Verification

npm run lint passes, npm run test passes (84/84).
Both critical parity claims were checked independently against ./ccusage/ (ground truth):
- ccusage applies tiered pricing to each token type independently (200k = flat, 200001 = split); the port matches it line for line.
- When an optional field is null, ccusage discards the whole entry via v.safeParse; this is reproduced exactly.

Deliberately accepted residual divergences (documented in `CCUSAGE_PARITY_REVIEW.md`)

Infinity hardening: the loader and store reject non-finite numbers, whereas a bare valibot v.number() accepts Infinity. TokenBBQ's behavior is the more correct one (the total is not poisoned), and the case cannot occur in well-formed JSONL anyway.
Explicit costUSD: 0 in auto mode: for logs with an explicit costUSD: 0, TokenBBQ recalculates the cost, whereas ccusage uses 0 (rare).
Store vs. stateless: after manual log pruning, TokenBBQ keeps its history (TokenBBQ ≥ ccusage); this is intentional.
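The `costUSD: 0` divergence comes down to how a reported zero is treated. A minimal sketch, assuming a hypothetical `CostEntry` shape and that ccusage's auto mode uses any reported value while TokenBBQ treats an explicit 0 as missing; neither function is the actual code of either project:

```typescript
// Hypothetical entry shape for illustration; not TokenBBQ's or ccusage's real type.
interface CostEntry {
  costUSD?: number; // cost reported in the log line, if present
  calculatedCost: number; // cost computed from token counts and model pricing
}

// ccusage-style auto mode as described: any reported costUSD is used, even 0.
function autoCostCcusage(e: CostEntry): number {
  return e.costUSD !== undefined ? e.costUSD : e.calculatedCost;
}

// TokenBBQ-style behavior as described: an explicit 0 is recomputed from tokens.
function autoCostTokenBBQ(e: CostEntry): number {
  return e.costUSD !== undefined && e.costUSD !== 0 ? e.costUSD : e.calculatedCost;
}

const zeroCostLine: CostEntry = { costUSD: 0, calculatedCost: 0.42 };
console.log(autoCostCcusage(zeroCostLine)); // 0
console.log(autoCostTokenBBQ(zeroCostLine)); // 0.42
```

Both variants agree whenever the log reports a non-zero cost or none at all; they differ only on the rare explicit zero.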

🤖 Generated with Claude Code

…rity) ccusage never adds reasoning into its total: for Codex the reasoning tokens are already counted inside output_tokens (OpenAI semantics), and ccusage's Codex loader uses the reported total_tokens (=== input+output). totalTokenCount() summed reasoning on top, double-counting every Codex event in the dashboard total, heatmap, topModel and every chart. Drop reasoning from totalTokenCount and every client-side token sum in dashboard.ts. Claude is unaffected (reasoning is always 0 there). The reasoning field stays on TokenCounts and is shown as a non-additive informational row. Adds types.test.ts and a Codex parity invariant in codex.test.ts (totalTokenCount == reported total_tokens for well-formed logs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
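A minimal sketch of the corrected total, assuming a hypothetical `TokenCounts` shape (the real type in `src/types.ts` may use other field names). The event is Codex-shaped: `output` already contains the reasoning tokens, so the reported total_tokens is input + output = 1300.

```typescript
// Hypothetical TokenCounts shape; the real type in src/types.ts may differ.
interface TokenCounts {
  input: number;
  output: number; // for Codex this already includes the reasoning tokens
  cacheCreation: number;
  cacheRead: number;
  reasoning: number; // informational only after this change
}

// New total: reasoning is not added on top.
function totalTokenCount(t: TokenCounts): number {
  return t.input + t.output + t.cacheCreation + t.cacheRead;
}

// Previous total, kept here only for comparison: reasoning summed on top.
function totalTokenCountBefore(t: TokenCounts): number {
  return totalTokenCount(t) + t.reasoning;
}

// Codex-shaped event: reported total_tokens = input + output = 1300.
const codexEvent: TokenCounts = { input: 1000, output: 300, cacheCreation: 0, cacheRead: 0, reasoning: 120 };
console.log(totalTokenCount(codexEvent)); // 1300
console.log(totalTokenCountBefore(codexEvent)); // 1420
```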

calculateCost used flat per-token rates only. ccusage applies LiteLLM's *_above_200k_tokens long-context rates per token type, per event, at the 200k threshold (packages/internal/src/pricing.ts calculateTieredCost). Claude Code turns routinely carry cache_read > 200k, so we materially undercharged long sessions versus ccusage. Port calculateTieredCost faithfully and route all four token types through it. Models without the above_200k fields stay flat — identical to ccusage (it does not implement Gemini's 128k tier either). Extend ModelPricing + the Sonnet 4 FALLBACK entry. Adds boundary tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
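The per-type, per-event tiering can be sketched as follows. This illustrates the behavior described here and in the Verification section (200k billed flat, 200001 split); it is not a copy of ccusage's `calculateTieredCost`. The `ModelPricing` field names follow LiteLLM's naming and are assumed, as is the `EventTokens` shape.

```typescript
// Illustrative subset of LiteLLM-style per-token prices (field names assumed).
interface ModelPricing {
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  cache_creation_input_token_cost?: number;
  cache_read_input_token_cost?: number;
  input_cost_per_token_above_200k_tokens?: number;
  output_cost_per_token_above_200k_tokens?: number;
  cache_creation_input_token_cost_above_200k_tokens?: number;
  cache_read_input_token_cost_above_200k_tokens?: number;
}

// Hypothetical per-event token counts.
interface EventTokens { input: number; output: number; cacheCreation: number; cacheRead: number; }

const TIER_THRESHOLD = 200_000;

// Cost of one token type in one event. At or below the threshold everything is billed
// at basePrice; above it, the first `threshold` tokens use basePrice and the rest use
// tieredPrice. Without a tiered price the whole count is billed flat.
function calculateTieredCost(
  tokens: number,
  basePrice: number | undefined,
  tieredPrice: number | undefined,
  threshold: number = TIER_THRESHOLD,
): number {
  if (tokens <= 0) return 0;
  if (tokens > threshold && tieredPrice !== undefined) {
    return threshold * (basePrice ?? 0) + (tokens - threshold) * tieredPrice;
  }
  return tokens * (basePrice ?? 0);
}

// Each token type is tiered independently; the threshold is never applied to the sum.
function calculateCost(t: EventTokens, p: ModelPricing): number {
  return (
    calculateTieredCost(t.input, p.input_cost_per_token, p.input_cost_per_token_above_200k_tokens) +
    calculateTieredCost(t.output, p.output_cost_per_token, p.output_cost_per_token_above_200k_tokens) +
    calculateTieredCost(t.cacheCreation, p.cache_creation_input_token_cost, p.cache_creation_input_token_cost_above_200k_tokens) +
    calculateTieredCost(t.cacheRead, p.cache_read_input_token_cost, p.cache_read_input_token_cost_above_200k_tokens)
  );
}
```

With unit prices of 1 and 2, 200_000 tokens cost 200000 and 200_001 tokens cost 200002. Because each type is tiered on its own, an event with 150k input and 150k cache-read tokens stays entirely flat even though it carries 300k tokens in total.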

…guard parseLine coerced via Number(x ?? 0): a non-numeric input_tokens became NaN (?? only catches null/undefined), the 0/0 guard let NaN through, and addTokens propagated it, so one bad line turned the whole dashboard total into NaN. ccusage's valibot usageDataSchema instead requires input_tokens/output_tokens to be real numbers (entry dropped otherwise) and treats the cache fields as v.optional(v.number()) (absent -> 0, present -> must be a number). Add requiredTokenNumber/optionalTokenNumber mirroring that contract (no string coercion; present null/non-number rejects the entry; non-finite rejected as intentional hardening). Harden store.ts isTokenCounts to require finite numbers and use it in loadFile so a poisoned historical line (NaN -> null on disk) is dropped instead of re-poisoning the store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
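A sketch of the validation contract described above, assuming the raw Claude usage object carries the Anthropic field names `input_tokens`, `output_tokens`, `cache_creation_input_tokens` and `cache_read_input_tokens`. The helper bodies and the `parseUsage`/`isTokenCounts` shapes are illustrative, not TokenBBQ's actual code; a `null` result means "drop the entry".

```typescript
// Hypothetical normalized token counts.
interface UsageTokens {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
}

// Required field: must be a finite number. Strings, null, NaN and Infinity return null,
// which the caller treats as "drop the whole entry". No string coercion.
function requiredTokenNumber(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

// Optional field: absent (undefined) counts as 0; if present it must pass the required check.
function optionalTokenNumber(value: unknown): number | null {
  return value === undefined ? 0 : requiredTokenNumber(value);
}

// Parse one raw usage object; null means the entry is rejected.
function parseUsage(usage: Record<string, unknown>): UsageTokens | null {
  const input = requiredTokenNumber(usage["input_tokens"]);
  const output = requiredTokenNumber(usage["output_tokens"]);
  const cacheCreation = optionalTokenNumber(usage["cache_creation_input_tokens"]);
  const cacheRead = optionalTokenNumber(usage["cache_read_input_tokens"]);
  if (input === null || output === null || cacheCreation === null || cacheRead === null) {
    return null;
  }
  return { input, output, cacheCreation, cacheRead };
}

// Store-side guard: every count must be a finite number, so a line whose NaN was
// serialized as null on disk is rejected on load.
function isTokenCounts(x: unknown): x is UsageTokens {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (["input", "output", "cacheCreation", "cacheRead"] as const).every((k) => {
    const v = r[k];
    return typeof v === "number" && Number.isFinite(v);
  });
}
```

Since `JSON.stringify(NaN)` writes `null`, a poisoned event read back from the store has a `null` count and fails `isTokenCounts`, so it is dropped instead of re-entering the totals.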

…rity) parseLine dropped any entry with input_tokens === 0 && output_tokens === 0, before reading the cache fields. ccusage's schema accepts 0/0 and calculateTotals still sums cache_creation/cache_read, so a cache-only turn (input=0, output=0, cache_read>0) is real usage. Dropping it undercounted tokens and cost versus ccusage. Remove the guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
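The undercount from the removed guard shows up with just two events. A minimal sketch with a hypothetical `Usage` shape: the old guard discards the cache-only turn even though its 50,000 cache-read tokens are real usage.

```typescript
// Hypothetical per-event usage shape.
interface Usage {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
}

// The removed guard: it rejected any 0/0 event before the cache fields were considered.
const keptByOldGuard = (u: Usage): boolean => !(u.input === 0 && u.output === 0);

// Sum all four token types across events.
function sumTokens(events: readonly Usage[]): number {
  return events.reduce((acc, u) => acc + u.input + u.output + u.cacheCreation + u.cacheRead, 0);
}

const events: Usage[] = [
  { input: 100, output: 20, cacheCreation: 0, cacheRead: 0 },
  { input: 0, output: 0, cacheCreation: 0, cacheRead: 50_000 }, // cache-only turn
];

console.log(sumTokens(events)); // 50120
console.log(sumTokens(events.filter(keptByOldGuard))); // 120
```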

The dedupe fallback key `timestamp:model:input:output` (used when messageId or requestId was absent) could collapse genuinely distinct ID-less turns — and it ignored cache tokens entirely — making totals lower than ccusage. ccusage's createUniqueHash returns null when either id is missing and isDuplicateEntry(null) === false: ID-less entries are never deduped. Make dedupeKey `string | null`, drop the synthetic fallback, and skip both the seen-check and seen-insert for null keys (== ccusage markAsProcessed(null) noop). Bump the loader CACHE_VERSION to 2 so v1 records carrying stale synthetic keys are reparsed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
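The null-key rule can be sketched as below. `ClaudeEntry` is a hypothetical minimal shape, and the `messageId:requestId` key format follows the description above; this mirrors the described ccusage behavior rather than reproducing its source.

```typescript
// Hypothetical minimal entry shape; only the two ids matter for deduplication.
interface ClaudeEntry {
  messageId?: string | null;
  requestId?: string | null;
  outputTokens: number;
}

// No key unless both ids are present (the synthetic fallback is gone).
function dedupeKey(e: ClaudeEntry): string | null {
  if (e.messageId == null || e.requestId == null) return null;
  return `${e.messageId}:${e.requestId}`;
}

function dedupe<T extends ClaudeEntry>(entries: readonly T[]): T[] {
  const seen = new Set<string>();
  const kept: T[] = [];
  for (const e of entries) {
    const key = dedupeKey(e);
    if (key !== null) {
      if (seen.has(key)) continue; // duplicate of an entry already kept
      seen.add(key);
    }
    // null keys skip both the seen-check and the seen-insert, so they are always kept
    kept.push(e);
  }
  return kept;
}
```

Two ID-less turns with identical token counts now both survive, which is exactly the case the old `timestamp:model:input:output` fallback could collapse.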

ccusage is stateless; TokenBBQ persists an append-only store with a content-hash dedup that differs from ccusage's messageId:requestId. Per product decision (harden + document, no risky store-hash migration, no new mode): document the parity invariant on hashEvent — the loader is the dedup authority and matches ccusage; the content hash is only for multi-process safety and is injective for realistically-distinct Claude turns. Add a regression test proving two distinct turns with identical token counts both survive, and record the accepted residuals (post-prune TokenBBQ >= ccusage is intended) in CCUSAGE_PARITY_REVIEW.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
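To show why two distinct turns with identical token counts both survive, here is a sketch of content-keyed, append-only merging. The field list is an assumption (the real `hashEvent` may hash different fields), and a canonical JSON string stands in for the hash.

```typescript
// Hypothetical stored-event shape; the real hashEvent input may contain other fields.
interface StoredEvent {
  source: string;
  timestamp: string;
  model: string;
  messageId: string | null;
  requestId: string | null;
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
}

// Stand-in for a content hash: a canonical JSON string of the identifying fields.
// A real store would hash this string (for example with SHA-256) to get a fixed-size key.
function contentKey(e: StoredEvent): string {
  return JSON.stringify([
    e.source, e.timestamp, e.model, e.messageId, e.requestId,
    e.input, e.output, e.cacheCreation, e.cacheRead,
  ]);
}

// Append-only merge: an event already present (e.g. written by another process) is skipped.
function appendEvents(store: Map<string, StoredEvent>, incoming: readonly StoredEvent[]): number {
  let added = 0;
  for (const e of incoming) {
    const key = contentKey(e);
    if (store.has(key)) continue;
    store.set(key, e);
    added++;
  }
  return added;
}
```

Because the key covers timestamp and ids as well as counts, equal token counts alone never merge two turns, while re-appending the same event is a no-op.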

urbanlama · 2026-05-17T22:43:21Z

Review finding:

[P1] reasoning is now excluded globally from token totals via totalTokenCount() in src/types.ts, but that assumption only holds for Codex/OpenAI-shaped usage where reasoning is already included in output.

Gemini and OpenCode currently store reasoning as a separate token bucket:

src/loaders/gemini.ts: reads tokens.thoughts into reasoning and treats it as part of the known total before persisting the event.
src/loaders/opencode.ts: reads tokens.reasoning separately and persists it as reasoning.

With this PR, those sources lose reasoning tokens from dashboard totals, heatmap cells, top model/source ranking, and popups. That fixes Codex double-counting, but creates a cross-source undercount for Gemini/OpenCode.

Suggested direction: make the non-additive reasoning behavior source-specific, or normalize Codex so its reasoning is informational without changing the global TokenCounts total semantics for sources where reasoning is additive.

I verified the branch with:

npm test → 84/84 passing
npm run lint → passing
npm run build → passing
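One way to follow the review's first suggested direction (making the non-additive reasoning behavior source-specific) is sketched below. The `Source` union, the `TokenCounts` shape and the `REASONING_INCLUDED_IN_OUTPUT` set are hypothetical; only Codex is listed because the review identifies it as the source whose reasoning is already inside `output`.

```typescript
type Source = "claude" | "codex" | "gemini" | "opencode";

// Hypothetical TokenCounts shape.
interface TokenCounts {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
  reasoning: number;
}

// Sources whose reasoning tokens are already counted inside `output`.
const REASONING_INCLUDED_IN_OUTPUT: ReadonlySet<Source> = new Set<Source>(["codex"]);

// Reasoning is added only for sources that report it as a separate bucket.
function totalTokenCount(t: TokenCounts, source: Source): number {
  const base = t.input + t.output + t.cacheCreation + t.cacheRead;
  return REASONING_INCLUDED_IN_OUTPUT.has(source) ? base : base + t.reasoning;
}

const counts: TokenCounts = { input: 1000, output: 300, cacheCreation: 0, cacheRead: 0, reasoning: 120 };
console.log(totalTokenCount(counts, "codex")); // 1300
console.log(totalTokenCount(counts, "gemini")); // 1420
```

Claude is unaffected either way, since its reasoning count is always 0.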

offbyone1 and others added 6 commits May 17, 2026 06:48

urbanlama wants to merge 6 commits into master from fix/ccusage-parity
