fix: skip synthetic zero-usage assistant entries in billing #23

Merged
jverre merged 1 commit into main from jacques/OPIK-6873-skip-synthetic-calls on Jun 12, 2026

@jverre jverre commented Jun 12, 2026

Collaborator

Details

This is the second phantom-token mechanism, found after the compaction fix (#22). It was exposed by the discrepancy between Σ lanes and usage that #22 deliberately left visible.

Claude Code writes locally fabricated assistant entries into the transcript when an API call errors or is interrupted. Each carries model: "<synthetic>", isApiErrorMessage: true, and a complete usage object in which every field is zero. These entries were never billed by the API. llmCallsInTurn only checked Usage != nil, so each one was treated as a real LLM call whose "request" (the entire prior conversation) was reconciled against a zero-token measured prompt: the estimated pieces scaled to zero, and the usage-derived pieces overflowed the positional cut into the fresh-input tier.
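To make the failure concrete, here is a minimal sketch of why a nil check alone lets such an entry through. The `Entry` and `Usage` types, the flattened JSON layout, and the usage field names are assumptions for illustration, not the project's actual structs; only `model`, `isApiErrorMessage`, and the all-zero usage object come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Usage holds the four token counters discussed in this PR.
// The JSON field names are assumed for illustration.
type Usage struct {
	InputTokens         int `json:"input_tokens"`
	CacheReadTokens     int `json:"cache_read_input_tokens"`
	CacheCreationTokens int `json:"cache_creation_input_tokens"`
	OutputTokens        int `json:"output_tokens"`
}

// Entry is a hypothetical, flattened transcript entry.
type Entry struct {
	Model             string `json:"model"`
	IsAPIErrorMessage bool   `json:"isApiErrorMessage"`
	Usage             *Usage `json:"usage"`
}

// A synthetic entry as described above: usage is present, every field is zero.
const syntheticLine = `{"model":"<synthetic>","isApiErrorMessage":true,` +
	`"usage":{"input_tokens":0,"cache_read_input_tokens":0,` +
	`"cache_creation_input_tokens":0,"output_tokens":0}}`

// parseEntry decodes one transcript line into an Entry.
func parseEntry(line string) (Entry, error) {
	var e Entry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

// countedByNilCheck mirrors the pre-fix condition: any entry that
// carries a usage object is treated as an LLM call.
func countedByNilCheck(e Entry) bool {
	return e.Usage != nil
}

func main() {
	e, err := parseEntry(syntheticLine)
	if err != nil {
		panic(err)
	}
	// The usage pointer is non-nil even though nothing was billed.
	fmt.Println(countedByNilCheck(e), *e.Usage == (Usage{})) // prints: true true
}
```

Because the usage object is complete rather than missing, decoding yields a non-nil pointer to a zero-valued struct, so a presence check cannot tell it apart from a real call.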

Two such entries in a real session produced 665,727 phantom input tokens (lane input summed to 781,745 vs 115,337 actually billed, roughly $6.7 of phantom spend at fable-5 input rates).

Fix: skip calls whose usage sums to zero. They bill nothing, so nothing is lost - and this covers any future unbilled-entry variant, not just <synthetic>.
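The filter could look roughly like the sketch below. `Call`, `Usage`, `Total`, and `billableCalls` are hypothetical names for illustration, not the real `llmCallsInTurn` code, and the token counts in `main` are made up.

```go
package main

import "fmt"

// Usage holds the four billed token counters (names assumed).
type Usage struct {
	Input, CacheRead, CacheCreation, Output int
}

// Total sums all four counters. Counts are assumed non-negative,
// so a zero total means every field is zero.
func (u Usage) Total() int {
	return u.Input + u.CacheRead + u.CacheCreation + u.Output
}

// Call is a hypothetical stand-in for one assistant entry in a turn.
type Call struct {
	Model string
	Usage *Usage
}

// billableCalls keeps only entries that have a usage object AND a
// non-zero total. The second condition is the one this fix adds.
func billableCalls(calls []Call) []Call {
	var out []Call
	for _, c := range calls {
		if c.Usage == nil || c.Usage.Total() == 0 {
			continue // unbilled: missing usage or all-zero usage
		}
		out = append(out, c)
	}
	return out
}

func main() {
	calls := []Call{
		{Model: "real-a", Usage: &Usage{Input: 120, Output: 40}},
		{Model: "<synthetic>", Usage: &Usage{}}, // all-zero usage
		{Model: "real-b", Usage: &Usage{CacheRead: 900, Output: 15}},
		{Model: "no-usage"}, // nil usage, already skipped before the fix
	}
	for _, c := range billableCalls(calls) {
		fmt.Println(c.Model)
	}
}
```

Keying the check on the usage total rather than on the model string is what lets it cover unbilled entries that are not labeled `<synthetic>`.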

Testing

  • TestBillingSkipsSyntheticZeroUsageCalls: a zero-usage synthetic entry between two real calls is excluded from llm_calls and Σ lanes still equals usage
  • Dry-run on the real session that exposed the bug: all four lane columns now reconcile to API usage token-for-token (input 130,000 / cache_read 328,138,781 / cache_creation 3,004,255 / output 733,462) - including output, whose previous small drift also traced to the synthetic blocks
  • Full suite, gofmt, go vet clean; binaries rebuilt
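The reconciliation invariant these checks rely on (Σ lanes equals usage, column by column) can be sketched as follows. The `Lanes` type and the `sumLanes` and `reconcile` helpers are hypothetical, and the numbers are illustrative rather than taken from the real session.

```go
package main

import "fmt"

// Lanes holds token totals for the four columns named in this PR.
type Lanes struct {
	Input, CacheRead, CacheCreation, Output int
}

// sumLanes adds up the per-call lane attributions column by column.
func sumLanes(perCall []Lanes) Lanes {
	var total Lanes
	for _, l := range perCall {
		total.Input += l.Input
		total.CacheRead += l.CacheRead
		total.CacheCreation += l.CacheCreation
		total.Output += l.Output
	}
	return total
}

// reconcile returns, per column, the summed lanes minus the API usage.
// A zero-valued result means a token-for-token match.
func reconcile(perCall []Lanes, usage Lanes) Lanes {
	s := sumLanes(perCall)
	return Lanes{
		Input:         s.Input - usage.Input,
		CacheRead:     s.CacheRead - usage.CacheRead,
		CacheCreation: s.CacheCreation - usage.CacheCreation,
		Output:        s.Output - usage.Output,
	}
}

func main() {
	usage := Lanes{Input: 130, CacheRead: 500, CacheCreation: 200, Output: 30}
	perCall := []Lanes{
		{Input: 100, CacheRead: 500, Output: 20},
		{Input: 30, CacheCreation: 200, Output: 10},
	}
	fmt.Println(reconcile(perCall, usage) == (Lanes{})) // prints: true

	// A phantom call that attributes input never billed shows up as drift.
	withPhantom := append(perCall, Lanes{Input: 50})
	fmt.Println(reconcile(withPhantom, usage).Input) // prints: 50
}
```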

🤖 Generated with Claude Code

Claude Code writes locally fabricated assistant entries (model
"<synthetic>", isApiErrorMessage: true) with an all-zero usage object
when an API call errors or is interrupted. They were never billed, but
llmCallsInTurn only checked Usage != nil, so each one became a "call"
whose layout reconciled against a zero-token prompt: estimates scaled
to zero and the usage-derived pieces dumped into the fresh-input tier.
Two such entries in a real session produced 665,727 phantom input
tokens (lane input 781,745 vs 115,337 actually billed).

Skip calls whose usage sums to zero — they bill nothing, so nothing is
lost. On the same session all four lane columns now reconcile to API
usage token-for-token, including output (the synthetic blocks were
also the source of a small output drift).
@jverre jverre merged commit 1fe842e into main Jun 12, 2026
1 check passed
@jverre jverre deleted the jacques/OPIK-6873-skip-synthetic-calls branch June 12, 2026 21:15