
BUG: messageStats incorrectly applies prime-token correction when output and reasoning tokens mix in prime chunks #14

Description

@devinoldenburg

Summary

messageStats() applies a single aggregate primeTokens count against both the output and generated token numerators, but does not track whether the prime tokens were reasoning or output tokens. When the first streamed chunk is reasoning (typical for reasoning models like Claude Extended Thinking or Gemini Flash Thinking), primeTokens is still deducted from the output numerator, capped only by Math.min(primeTokens, output). This undercounts decodeOutput and yields decodeOutput = 0 whenever primeTokens >= output, even when all actual output tokens were decoded during the active window.

Root Cause

File: plugins/tps/tps.js:120-122

const onActive = decodeSource === "active";
const decodeGenerated = onActive ? Math.max(0, generated - primeTokens) : generated;
const decodeOutput = onActive ? Math.max(0, output - Math.min(primeTokens, output)) : output;

primeTokens is a single number — the sum of all "prime" chunk tokens regardless of type (reasoning or text). decodeGenerated correctly subtracts ALL prime tokens from the combined generated total. But decodeOutput applies the same subtraction against output only, capped at output's value, without knowing how many of the prime tokens were actually output vs. reasoning.

Concrete Reproduction

Consider a reasoning model streaming this sequence:

Chunk  Type         Chars  Tokens (4 chars/token)  Prime?
1      reasoning    80     20                      yes (first chunk)
2      text/output  40     10                      no
3      text/output  40     10                      no

At completion: output = 20, reasoning = 20, generated = 40, primeTokens = 20

decodeOutput = Math.max(0, 20 - Math.min(20, 20)) = 0  // WRONG — all 20 output tokens were actively decoded

The correct decodeOutput should be 20 (all output tokens arrived during the active-decode window, not during prefill).
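
The arithmetic above can be checked in isolation. The following is a minimal sketch that wraps the three quoted lines in a standalone function; the wrapper name currentDecodeTokens and its parameter object are hypothetical and are not part of the plugin.

```javascript
// Mirrors the three lines quoted from plugins/tps/tps.js.
// The wrapper function and its parameter object are hypothetical.
function currentDecodeTokens({ output, generated, primeTokens, decodeSource }) {
  const onActive = decodeSource === "active";
  const decodeGenerated = onActive ? Math.max(0, generated - primeTokens) : generated;
  const decodeOutput = onActive ? Math.max(0, output - Math.min(primeTokens, output)) : output;
  return { decodeGenerated, decodeOutput };
}

// Totals from the reproduction: one 20-token reasoning prime chunk,
// followed by 20 output tokens.
const repro = currentDecodeTokens({
  output: 20,
  generated: 40,
  primeTokens: 20,
  decodeSource: "active",
});

console.log(repro); // { decodeGenerated: 20, decodeOutput: 0 }
```

decodeGenerated comes out as expected, while decodeOutput collapses to 0 because the reasoning prime tokens are charged against the output count.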

Why This Is Wrong

The first chunk (reasoning, 20 tokens) is the "prime" chunk: it is attributed to prefill rather than to active decode. Those 20 reasoning tokens should therefore be excluded from the generated numerator (correct: 40 - 20 = 20 active-generated tokens). But they should NOT be excluded from the output numerator, because they were reasoning, not output.

The current code has no way to distinguish the two. It treats all prime tokens as if they are of the same type as whatever metric is being computed.

Impact

  • Output TPS metric (metric: "output") is systematically undercounted for reasoning models where the first chunk is reasoning.
  • Any stream where reasoning and output tokens are interleaved with tool calls can produce incorrect outputTps values after a resume gap: the resume chunk is prime, and if it happens to be reasoning, its tokens are subtracted from the output numerator.
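
The resume-gap case can be illustrated with a small tally. This is a sketch under stated assumptions: the chunk shape ({ type, tokens, prime }), the tally helper, and the token counts are all hypothetical and chosen only to show how a reasoning resume chunk changes the output numerator.

```javascript
// Hypothetical chunk shape: { type: "reasoning" | "text", tokens, prime }.
// Accumulates totals the way a single aggregate primeTokens counter would,
// plus a per-type primeOutputTokens counter for comparison.
function tally(chunks) {
  const totals = { output: 0, reasoning: 0, primeTokens: 0, primeOutputTokens: 0 };
  for (const { type, tokens, prime } of chunks) {
    if (type === "text") totals.output += tokens;
    else totals.reasoning += tokens;
    if (prime) {
      totals.primeTokens += tokens;
      if (type === "text") totals.primeOutputTokens += tokens;
    }
  }
  return totals;
}

const stream = [
  { type: "text", tokens: 10, prime: true },      // first chunk
  { type: "text", tokens: 30, prime: false },
  // ... tool call, then the stream resumes ...
  { type: "reasoning", tokens: 20, prime: true }, // resume chunk
  { type: "text", tokens: 10, prime: false },
];

const t = tally(stream);
// Current formula: every prime token is charged against output.
const currentDecodeOutput = Math.max(0, t.output - Math.min(t.primeTokens, t.output));
// Per-type formula: only prime tokens that were output are charged.
const perTypeDecodeOutput = Math.max(0, t.output - t.primeOutputTokens);

console.log(currentDecodeOutput, perTypeDecodeOutput); // 20 40
```

Here the 20 reasoning tokens in the resume chunk halve the output numerator under the current formula, even though only 10 output tokens were in prime chunks.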

Possible Fix

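The GenerationTimer needs to track prime tokens per type. Minimally, timingFor in tps-meter.tsx could pass separate primeOutputTokens and primeReasoningTokens, and messageStats would subtract only the matching type. In the lines below, primeGeneratedTokens is an additional field (presumably the sum of the two per-type counts) that falls back to the existing primeTokens: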

const decodeOutput = onActive ? Math.max(0, output - (timing.primeOutputTokens ?? 0)) : output;
const decodeGenerated = onActive ? Math.max(0, generated - (timing.primeGeneratedTokens ?? primeTokens)) : generated;
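
A slightly fuller sketch of the same idea follows. It assumes a timing object carrying the proposed primeOutputTokens and primeReasoningTokens fields; the function name decodeTokens and its parameter object are hypothetical. Unlike the two lines above, it falls back to the existing capped behavior when the per-type fields are absent, which is one possible design choice rather than a requirement.

```javascript
// Hypothetical helper: timing.primeOutputTokens and timing.primeReasoningTokens
// are the proposed per-type fields; timing.primeTokens is the existing aggregate.
function decodeTokens({ output, generated, timing, decodeSource }) {
  if (decodeSource !== "active") {
    return { decodeGenerated: generated, decodeOutput: output };
  }
  const primeTokens = timing.primeTokens ?? 0;
  const hasPerType = timing.primeOutputTokens !== undefined;
  // With per-type data, charge only output prime tokens against output;
  // without it, keep the existing capped subtraction.
  const primeOutput = hasPerType
    ? timing.primeOutputTokens
    : Math.min(primeTokens, output);
  const primeGenerated = hasPerType
    ? timing.primeOutputTokens + (timing.primeReasoningTokens ?? 0)
    : primeTokens;
  return {
    decodeGenerated: Math.max(0, generated - primeGenerated),
    decodeOutput: Math.max(0, output - primeOutput),
  };
}

// Reproduction totals with per-type tracking: the prime chunk was all reasoning.
const fixed = decodeTokens({
  output: 20,
  generated: 40,
  timing: { primeTokens: 20, primeOutputTokens: 0, primeReasoningTokens: 20 },
  decodeSource: "active",
});

console.log(fixed); // { decodeGenerated: 20, decodeOutput: 20 }
```

With per-type counts, the reproduction yields decodeOutput = 20, matching the expected value stated earlier, while decodeGenerated is unchanged.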

Steps to Reproduce

  1. Configure a reasoning model (e.g., extended thinking mode) in OpenCode.
  2. Send a prompt that generates both reasoning and output tokens.
  3. Observe that outputTps in the sidebar is undercounted, and reads 0 (or "–") when the prime chunk holds at least as many tokens as the output, even though output tokens were generated.
