Summary
messageStats() applies a single aggregate primeTokens count against both output and generated token numerators, but does not track whether prime tokens are reasoning or output tokens. When the first streamed chunk is reasoning (typical for reasoning models like Claude Extended Thinking or Gemini Flash Thinking), primeTokens is deducted from the output numerator using only Math.min(primeTokens, output) as a cap — yielding decodeOutput = 0 even when all actual output tokens were decoded during the active window.
Root Cause
File: plugins/tps/tps.js:120-122
const onActive = decodeSource === "active";
const decodeGenerated = onActive ? Math.max(0, generated - primeTokens) : generated;
const decodeOutput = onActive ? Math.max(0, output - Math.min(primeTokens, output)) : output;
primeTokens is a single number — the sum of all "prime" chunk tokens regardless of type (reasoning or text). decodeGenerated correctly subtracts ALL prime tokens from the combined generated total. But decodeOutput applies the same subtraction against output only, capped at output's value, without knowing how many of the prime tokens were actually output vs. reasoning.
Concrete Reproduction
Consider a reasoning model streaming this sequence:
| Chunk | Type | Chars | Tokens (ratio 4) | Prime? |
|---|---|---|---|---|
| 1 | reasoning | 80 | 20 | yes (first chunk) |
| 2 | text/output | 40 | 10 | no |
| 3 | text/output | 40 | 10 | no |
At completion: output = 20, reasoning = 20, generated = 40, primeTokens = 20
decodeOutput = Math.max(0, 20 - Math.min(20, 20)) = 0 // WRONG — all 20 output tokens were actively decoded
The correct decodeOutput should be 20 (all output tokens arrived during the active-decode window, not during prefill).
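The arithmetic above can be checked with a small standalone sketch. It wraps the three expressions quoted from tps.js in a hypothetical helper, `currentDecodeCounts`, and feeds it totals derived from the chunk table. The chunk records and the `isPrime` flag are assumptions made for illustration, not the plugin's actual data structures.

```javascript
// Hypothetical chunk records mirroring the table above (tokens = chars / 4).
const chunks = [
  { type: "reasoning", chars: 80, isPrime: true },
  { type: "text", chars: 40, isPrime: false },
  { type: "text", chars: 40, isPrime: false },
];

const CHARS_PER_TOKEN = 4;
const tokensOf = (chunk) => chunk.chars / CHARS_PER_TOKEN;

// Sum tokens over the chunks that satisfy a predicate.
const sumTokens = (pred) =>
  chunks.filter(pred).reduce((total, chunk) => total + tokensOf(chunk), 0);

const output = sumTokens((c) => c.type === "text");
const reasoning = sumTokens((c) => c.type === "reasoning");
const generated = output + reasoning;
const primeTokens = sumTokens((c) => c.isPrime);

// Same expressions as the lines quoted from tps.js, wrapped in a helper.
function currentDecodeCounts({ output, generated, primeTokens, decodeSource }) {
  const onActive = decodeSource === "active";
  const decodeGenerated = onActive ? Math.max(0, generated - primeTokens) : generated;
  const decodeOutput = onActive ? Math.max(0, output - Math.min(primeTokens, output)) : output;
  return { decodeGenerated, decodeOutput };
}

const result = currentDecodeCounts({ output, generated, primeTokens, decodeSource: "active" });
console.log(result); // { decodeGenerated: 20, decodeOutput: 0 }
```

The generated count comes out as expected (20), while the output count collapses to 0 even though neither output chunk was prime.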
Why This Is Wrong
The first chunk (reasoning, 20 tokens) was decoded during prefill — it is the "prime" chunk. Those 20 reasoning tokens should be excluded from the generated numerator (correct: 40 - 20 = 20 active-generated tokens). But they should NOT be excluded from the output numerator because they were reasoning, not output.
The current code has no way to distinguish the two. It treats all prime tokens as if they are of the same type as whatever metric is being computed.
Impact
- Output TPS metric (metric: "output") is systematically undercounted for reasoning models where the first chunk is reasoning.
- Any stream where reasoning and output tokens are interleaved with tool calls can produce incorrect outputTps values after a resume gap (the resume chunk is prime, and if it happens to be reasoning, its tokens are wrongly deducted from the output numerator).
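The resume-gap case can be illustrated the same way. The sketch below is a hypothetical stream, not captured data: it assumes the first chunk and the first chunk after a tool-call pause are both marked prime, and it uses made-up token counts. Only the `Math.max` / `Math.min` expression is taken from the code quoted under Root Cause.

```javascript
// Hypothetical stream: text, a tool-call gap, then a reasoning chunk that
// resumes the stream. Which chunks count as prime is an assumption here.
const stream = [
  { type: "text", tokens: 10, isPrime: true }, // first chunk
  { type: "text", tokens: 10, isPrime: false },
  // ... tool call runs, stream pauses ...
  { type: "reasoning", tokens: 20, isPrime: true }, // first chunk after resume
  { type: "text", tokens: 10, isPrime: false },
];

const total = (pred) =>
  stream.filter(pred).reduce((sum, chunk) => sum + chunk.tokens, 0);

const output = total((c) => c.type === "text"); // 30
const primeTokens = total((c) => c.isPrime); // 30 (10 text + 20 reasoning)
const primeOutput = total((c) => c.isPrime && c.type === "text"); // 10

// Formula quoted under Root Cause, for the active decode source.
const currentDecodeOutput = Math.max(0, output - Math.min(primeTokens, output));

// Output tokens that arrived in non-prime chunks.
const expectedDecodeOutput = output - primeOutput;

console.log({ currentDecodeOutput, expectedDecodeOutput });
// { currentDecodeOutput: 0, expectedDecodeOutput: 20 }
```

Under these assumptions the 20 reasoning tokens from the resume chunk are charged against the output numerator, wiping out the 20 output tokens that were decoded in non-prime chunks.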
Possible Fix
The GenerationTimer needs to track prime tokens per type. Minimally, timingFor in tps-meter.tsx could pass separate primeOutputTokens and primeReasoningTokens, and messageStats would subtract only the matching type:
const decodeOutput = onActive ? Math.max(0, output - (timing.primeOutputTokens ?? 0)) : output;
const decodeGenerated = onActive ? Math.max(0, generated - (timing.primeGeneratedTokens ?? primeTokens)) : generated;
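A minimal end-to-end sketch of this idea follows. Everything in it is hypothetical: `createPrimeTracker`, `recordPrime`, and `decodeCounts` are illustrative names rather than existing plugin functions, and the sketch assumes `generated` is simply `output + reasoning`, as in the reproduction numbers. In place of `primeGeneratedTokens` it sums the two per-type prime counters.

```javascript
// Hypothetical accumulator: splits prime tokens by chunk type as they arrive.
function createPrimeTracker() {
  const timing = { primeOutputTokens: 0, primeReasoningTokens: 0 };
  return {
    timing,
    // Call once per prime chunk (first chunk, or first chunk after a resume gap).
    recordPrime(type, tokens) {
      if (type === "reasoning") timing.primeReasoningTokens += tokens;
      else timing.primeOutputTokens += tokens;
    },
  };
}

// Decode counts that subtract only the prime tokens of the matching type.
function decodeCounts({ output, reasoning, decodeSource }, timing) {
  const onActive = decodeSource === "active";
  const primeOutput = timing.primeOutputTokens ?? 0;
  const primeReasoning = timing.primeReasoningTokens ?? 0;
  const generated = output + reasoning; // assumption: no other token kinds
  return {
    decodeOutput: onActive ? Math.max(0, output - primeOutput) : output,
    decodeGenerated: onActive
      ? Math.max(0, generated - (primeOutput + primeReasoning))
      : generated,
  };
}

// Reproduction from above: one prime reasoning chunk of 20 tokens.
const tracker = createPrimeTracker();
tracker.recordPrime("reasoning", 20);
const fixed = decodeCounts(
  { output: 20, reasoning: 20, decodeSource: "active" },
  tracker.timing
);
console.log(fixed); // { decodeOutput: 20, decodeGenerated: 20 }
```

With per-type tracking, the reasoning prime chunk still reduces the generated numerator but no longer touches the output numerator.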
Steps to Reproduce
- Configure a reasoning model (e.g., extended thinking mode) in OpenCode.
- Send a prompt that generates both reasoning and output tokens.
- Observe that outputTps is 0 (or "–") in the sidebar, even though output tokens were generated.
Related