fix: skip synthetic zero-usage assistant entries in billing #23
Merged
Claude Code writes locally fabricated assistant entries (model "<synthetic>", isApiErrorMessage: true) with an all-zero usage object when an API call errors or is interrupted. They were never billed, but llmCallsInTurn only checked Usage != nil, so each one became a "call" whose layout reconciled against a zero-token prompt: estimates scaled to zero and the usage-derived pieces dumped into the fresh-input tier. Two such entries in a real session produced 665,727 phantom input tokens (lane input 781,745 vs 115,337 actually billed). Skip calls whose usage sums to zero — they bill nothing, so nothing is lost. On the same session all four lane columns now reconcile to API usage token-for-token, including output (the synthetic blocks were also the source of a small output drift).
Details
Second phantom-token mechanism found after the compaction fix (#22), exposed by the Σ lanes vs usage discrepancy that #22 deliberately left visible.
Claude Code writes locally fabricated assistant entries into the transcript when an API call errors or is interrupted: `model: "<synthetic>"`, `isApiErrorMessage: true`, and a complete usage object where every field is zero. These were never billed by the API.
`llmCallsInTurn` only checked `Usage != nil`, so each one was treated as a real LLM call whose "request" (the entire prior conversation) got reconciled against a zero-token measured prompt: estimated pieces scaled to zero and the usage-derived pieces overflowed the positional cut into the fresh-input tier.
Two such entries in a real session produced 665,727 phantom input tokens (lane `input` summed to 781,745 vs 115,337 actually billed - ~$6.7 of phantom spend at fable-5 input rates).
Fix: skip calls whose usage sums to zero. They bill nothing, so nothing is lost - and this covers any future unbilled-entry variant, not just `<synthetic>`.
Testing
- `TestBillingSkipsSyntheticZeroUsageCalls`: a zero-usage synthetic entry between two real calls is excluded from `llm_calls` and Σ lanes still equals usage
- On the same real session, all four lane columns reconcile to API usage token-for-token, including `output`, whose previous small drift also traced to the synthetic blocks
- `gofmt`, `go vet` clean; binaries rebuilt
🤖 Generated with Claude Code