fix: ccusage parity for Claude loader, tiered pricing & store #13

Open
urbanlama wants to merge 6 commits into master from fix/ccusage-parity

@urbanlama
Collaborator

Goal

Bring token and cost figures into agreement with ccusage v18.0.8. All 6 findings from the code review (CCUSAGE_PARITY_REVIEW.md) are fixed, one commit each, and verified against the vendored ccusage source.

Changes

| # | Fix | File | ccusage reference |
|---|-----|------|-------------------|
| 1 | reasoning removed from the token total (Codex already counts it in output) | types.ts, dashboard.ts | apps/codex/src/data-loader.ts |
| 2 | Tiered >200k pricing (calculateTieredCost, per token type, per event) | pricing.ts | packages/internal/src/pricing.ts:284-334 |
| 3 | Finite-number guard in loader + store (NaN/Infinity no longer poison a total) | claude.ts, store.ts | valibot v.number() |
| 4 | Zero-token drop removed: cache-only and 0/0 events are kept (as in ccusage) | claude.ts | data-loader.ts usageDataSchema |
| 5 | dedupeKey=null when the msg/req ID is missing → never deduplicated; synthetic fallback removed; loader CACHE_VERSION 1→2 | claude.ts, cache.ts | data-loader.ts:521-531,492-506 |
| 6 | Store-vs-ccusage divergence documented + hardening regression test | store.ts | |

Verification

  • npm run lint green, npm run test green (84/84).
  • Both critical parity claims were checked independently against ./ccusage/ (ground truth):
    • ccusage applies tiered pricing independently per token type (200k = flat, 200001 = split); the port is identical line for line.
    • When an optional field is null, ccusage drops the whole entry via v.safeParse; this is mirrored exactly.

Deliberately accepted residual divergences (documented in CCUSAGE_PARITY_REVIEW.md)

  • Infinity hardening: the loader and store reject non-finite numbers; bare valibot v.number() accepts Infinity. TokenBBQ's behavior is the more correct one (the total is not poisoned), and the case cannot occur in well-formed JSONL anyway.
  • auto mode with costUSD: 0: for logs with an explicit costUSD: 0, TokenBBQ recalculates the cost, while ccusage takes 0 (rare).
  • Store vs. stateless: after manual log pruning, TokenBBQ retains history (TokenBBQ ≥ ccusage); this is intended.

🤖 Generated with Claude Code

offbyone1 and others added 6 commits May 17, 2026 06:48
…rity)

ccusage never adds reasoning into its total: for Codex the reasoning
tokens are already counted inside output_tokens (OpenAI semantics), and
ccusage's Codex loader uses the reported total_tokens (=== input+output).
totalTokenCount() summed reasoning on top, double-counting every Codex
event in the dashboard total, heatmap, topModel and every chart.

Drop reasoning from totalTokenCount and every client-side token sum in
dashboard.ts. Claude is unaffected (reasoning is always 0 there). The
reasoning field stays on TokenCounts and is shown as a non-additive
informational row. Adds types.test.ts and a Codex parity invariant in
codex.test.ts (totalTokenCount == reported total_tokens for well-formed
logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
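
To illustrate the invariant described in this commit, here is a minimal sketch of a total that leaves reasoning out. The `TokenCounts` field names and the sample numbers are assumptions for illustration; the real type in types.ts may be shaped differently.

```typescript
// Hypothetical shape; the actual TokenCounts in types.ts may differ.
interface TokenCounts {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
  reasoning: number; // informational only, never added to the total
}

// Sum the additive buckets. reasoning is deliberately excluded because
// Codex already reports it inside output (OpenAI semantics).
function totalTokenCount(t: TokenCounts): number {
  return t.input + t.output + t.cacheCreation + t.cacheRead;
}

// Illustrative Codex event: 300 of the 800 output tokens were reasoning.
const codexEvent: TokenCounts = {
  input: 1200,
  output: 800,
  cacheCreation: 0,
  cacheRead: 0,
  reasoning: 300,
};
const reportedTotal = 2000; // input + output, as a well-formed log would report

console.log(totalTokenCount(codexEvent) === reportedTotal); // true
```

Adding `reasoning` on top would have produced 2300 here, which is the double count the commit removes.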
calculateCost used flat per-token rates only. ccusage applies LiteLLM's
*_above_200k_tokens long-context rates per token type, per event, at the
200k threshold (packages/internal/src/pricing.ts calculateTieredCost).
Claude Code turns routinely carry cache_read > 200k, so we materially
undercharged long sessions versus ccusage.

Port calculateTieredCost faithfully and route all four token types
through it. Models without the above_200k fields stay flat — identical
to ccusage (it does not implement Gemini's 128k tier either). Extend
ModelPricing + the Sonnet 4 FALLBACK entry. Adds boundary tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
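
The tiering rule can be sketched as follows. This is an approximation written from the description above, not a copy of ccusage's code; the `Rate` type, `eventCost` helper and the per-token rates are hypothetical and chosen only to make the boundary visible.

```typescript
const TIER_THRESHOLD = 200_000;

// Cost for ONE token type of ONE event. Tokens up to the threshold are billed
// at the base rate; only the excess is billed at the long-context rate.
function calculateTieredCost(
  tokens: number | undefined,
  baseRate: number | undefined,
  above200kRate: number | undefined,
  threshold: number = TIER_THRESHOLD,
): number {
  if (tokens == null || tokens <= 0) return 0;
  if (tokens > threshold && above200kRate != null) {
    const above = tokens - threshold;
    const belowCost = baseRate != null ? threshold * baseRate : 0;
    return belowCost + above * above200kRate;
  }
  // At or below the threshold, or no long-context rate known: stay flat.
  return baseRate != null ? tokens * baseRate : 0;
}

// Hypothetical rate pair (USD per token); not real model pricing.
interface Rate { base: number; above200k?: number }

// Each token type is tiered independently, then the results are summed.
function eventCost(counts: number[], rates: Rate[]): number {
  return counts.reduce(
    (sum, n, i) => sum + calculateTieredCost(n, rates[i].base, rates[i].above200k),
    0,
  );
}

const rate: Rate = { base: 3e-6, above200k: 6e-6 };
const flat = calculateTieredCost(200_000, rate.base, rate.above200k);
const split = calculateTieredCost(200_001, rate.base, rate.above200k);
console.log(flat.toFixed(6));  // "0.600000"
console.log(split.toFixed(6)); // "0.600006"
```

Exactly 200,000 tokens stays on the flat rate; the 200,001st token is the first one billed at the higher rate.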
…guard

parseLine coerced via Number(x ?? 0): a non-numeric input_tokens became
NaN (?? only catches null/undefined), the 0/0 guard let NaN through, and
addTokens propagated it — one bad line turned the whole dashboard total
into NaN. ccusage's valibot usageDataSchema instead requires input_tokens/
output_tokens to be real numbers (entry dropped otherwise) and treats the
cache fields as v.optional(v.number()) (absent -> 0, present -> must be a
number).

Add requiredTokenNumber/optionalTokenNumber mirroring that contract (no
string coercion; present null/non-number rejects the entry; non-finite
rejected as intentional hardening). Harden store.ts isTokenCounts to
require finite numbers and use it in loadFile so a poisoned historical
line (NaN -> null on disk) is dropped instead of re-poisoning the store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
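
A rough sketch of the required/optional contract described above. The helper names come from the commit message, but their signatures, the `null`-means-reject convention and the `parseUsage` wrapper are assumptions made for this example.

```typescript
// Returns the number, or null to signal "reject the whole entry".
function requiredTokenNumber(x: unknown): number | null {
  // No string coercion; NaN and Infinity are rejected as hardening.
  return typeof x === "number" && Number.isFinite(x) ? x : null;
}

function optionalTokenNumber(x: unknown): number | null {
  if (x === undefined) return 0;  // absent -> 0
  return requiredTokenNumber(x);  // present (including null) -> must be a finite number
}

interface Usage { input: number; output: number; cacheCreation: number; cacheRead: number }

// Hypothetical wrapper: one bad field drops the entry instead of poisoning totals.
function parseUsage(raw: Record<string, unknown>): Usage | null {
  const input = requiredTokenNumber(raw.input_tokens);
  const output = requiredTokenNumber(raw.output_tokens);
  const cacheCreation = optionalTokenNumber(raw.cache_creation_input_tokens);
  const cacheRead = optionalTokenNumber(raw.cache_read_input_tokens);
  if (input === null || output === null || cacheCreation === null || cacheRead === null) {
    return null;
  }
  return { input, output, cacheCreation, cacheRead };
}

// The old coercion: ?? only catches null/undefined, so a string slips through.
const bad: unknown = "abc";
console.log(Number.isNaN(Number(bad ?? 0)));                            // true
console.log(parseUsage({ input_tokens: "abc", output_tokens: 5 }));     // null
console.log(parseUsage({ input_tokens: 10, output_tokens: 5 }));
```

The last call keeps the entry and fills both absent cache fields with 0.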
…rity)

parseLine dropped any entry with input_tokens === 0 && output_tokens ===
0, before reading the cache fields. ccusage's schema accepts 0/0 and
calculateTotals still sums cache_creation/cache_read, so a cache-only
turn (input=0, output=0, cache_read>0) is real usage. Dropping it
undercounted tokens and cost versus ccusage. Remove the guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dedupe fallback key `timestamp:model:input:output` (used when
messageId or requestId was absent) could collapse genuinely distinct
ID-less turns — and it ignored cache tokens entirely — making totals
lower than ccusage. ccusage's createUniqueHash returns null when either
id is missing and isDuplicateEntry(null) === false: ID-less entries are
never deduped.

Make dedupeKey `string | null`, drop the synthetic fallback, and skip
both the seen-check and seen-insert for null keys (== ccusage
markAsProcessed(null) noop). Bump the loader CACHE_VERSION to 2 so v1
records carrying stale synthetic keys are reparsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
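
The null-key behavior can be sketched like this. The `Entry` shape and the `dedupe` helper are hypothetical; only the rule (both IDs present gives a `messageId:requestId` key, otherwise `null` and no dedup) is taken from the commit message.

```typescript
// Hypothetical entry shape for illustration.
interface Entry {
  messageId?: string;
  requestId?: string;
  inputTokens: number;
}

// A key exists only when BOTH ids are present; otherwise the entry has no identity.
function dedupeKey(e: Entry): string | null {
  if (e.messageId == null || e.requestId == null) return null;
  return `${e.messageId}:${e.requestId}`;
}

function dedupe(entries: Entry[]): Entry[] {
  const seen = new Set<string>();
  const kept: Entry[] = [];
  for (const e of entries) {
    const key = dedupeKey(e);
    if (key !== null) {
      if (seen.has(key)) continue; // true duplicate: same message and request
      seen.add(key);
    }
    // null key: skip both the seen-check and the seen-insert, always keep.
    kept.push(e);
  }
  return kept;
}

const entries: Entry[] = [
  { messageId: "m1", requestId: "r1", inputTokens: 100 },
  { messageId: "m1", requestId: "r1", inputTokens: 100 }, // duplicate, dropped
  { inputTokens: 100 },                                   // ID-less, kept
  { inputTokens: 100 },                                   // ID-less, kept
];
console.log(dedupe(entries).length); // 3
```

With the old synthetic key, the two ID-less entries would have collapsed into one whenever their timestamp, model and token counts matched.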
ccusage is stateless; TokenBBQ persists an append-only store with a
content-hash dedup that differs from ccusage's messageId:requestId. Per
product decision (harden + document, no risky store-hash migration, no
new mode): document the parity invariant on hashEvent — the loader is the
dedup authority and matches ccusage; the content hash is only for
multi-process safety and is injective for realistically-distinct Claude
turns. Add a regression test proving two distinct turns with identical
token counts both survive, and record the accepted residuals (post-prune
TokenBBQ >= ccusage is intended) in CCUSAGE_PARITY_REVIEW.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@urbanlama
Collaborator Author

Review finding:

[P1] reasoning is now excluded globally from token totals via totalTokenCount() in src/types.ts, but that assumption only holds for Codex/OpenAI-shaped usage where reasoning is already included in output.

Gemini and OpenCode currently store reasoning as a separate token bucket:

  • src/loaders/gemini.ts: reads tokens.thoughts into reasoning and treats it as part of the known total before persisting the event.
  • src/loaders/opencode.ts: reads tokens.reasoning separately and persists it as reasoning.

With this PR, those sources lose reasoning tokens from dashboard totals, heatmap cells, top model/source ranking, and popups. That fixes Codex double-counting, but creates a cross-source undercount for Gemini/OpenCode.

Suggested direction: make the non-additive reasoning behavior source-specific, or normalize Codex so its reasoning is informational without changing the global TokenCounts total semantics for sources where reasoning is additive.
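
One possible shape for the source-specific variant, as a sketch only. The `Source` union, the `REASONING_IS_ADDITIVE` table and the function name are hypothetical and not taken from the branch; the per-source flags simply restate the observations above.

```typescript
type Source = "claude" | "codex" | "gemini" | "opencode";

interface TokenCounts {
  input: number;
  output: number;
  cacheCreation: number;
  cacheRead: number;
  reasoning: number;
}

// Hypothetical table: is reasoning a separate bucket for this source?
const REASONING_IS_ADDITIVE: Record<Source, boolean> = {
  claude: false,  // reasoning is always 0 there, so the flag has no effect
  codex: false,   // already counted inside output
  gemini: true,   // tokens.thoughts is stored separately
  opencode: true, // tokens.reasoning is stored separately
};

function totalTokenCountFor(source: Source, t: TokenCounts): number {
  const base = t.input + t.output + t.cacheCreation + t.cacheRead;
  return REASONING_IS_ADDITIVE[source] ? base + t.reasoning : base;
}

const sample: TokenCounts = { input: 1000, output: 500, cacheCreation: 0, cacheRead: 0, reasoning: 300 };
console.log(totalTokenCountFor("codex", sample));  // 1500
console.log(totalTokenCountFor("gemini", sample)); // 1800
```

The alternative mentioned above, normalizing Codex at load time so that its stored output excludes reasoning, would keep a single global total function instead.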

I verified the branch with:

  • npm test → 84/84 passing
  • npm run lint → passing
  • npm run build → passing
