
BUG: aggregate() pools messages with mixed decodeSource windows, biasing session TPS #66

Description

@devinoldenburg

Summary

aggregate() in plugins/tps/tps.js pools all completed messages by summing decodeMs and decodeGenerated/decodeOutput without reading the decodeSource field. When messages have different decode windows, the pooled session average mixes incompatible denominators and becomes an uninterpretable hybrid number.

Root cause — aggregate() at tps.js:166–227

The function sums decodeMs (line 197) and decodeGenerated/decodeOutput (lines 191–192) from all completed messages indiscriminately, never examining the decodeSource field emitted by each per-message stat (tps.js:147).

The three decodeSource windows have incompatible semantics

messageStats() can compute TPS using three different denominators:

| Source | decodeMs (denominator) | decodeGenerated (numerator) |
| --- | --- | --- |
| "active" | excludes tool-wait/idle gaps | excludes primeTokens |
| "first-token" | includes tool-wait gaps | includes all generated |
| "end-to-end" | includes everything (prefill + tools) | includes all generated |

Pooling across them produces (Σ gen_i − Σ prime_i + Σ gen_j + Σ gen_k) / (Σ activeMs_i + Σ ftMs_j + Σ e2eMs_k) — a Frankenstein number that represents no single meaningful quantity.
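One way to read that expression is as a time-weighted mean of three per-window rates. The symbols below are introduced here for illustration only and do not appear in the code: N_s is the total token count and T_s the total decodeMs contributed by messages whose decodeSource is s, with N_active already net of primeTokens.

```latex
\[
\mathrm{TPS}_{\text{pooled}}
  = \frac{\sum_{s} N_s}{\sum_{s} T_s}
  = \sum_{s} \frac{T_s}{T}\, r_s,
\qquad
r_s = \frac{N_s}{T_s},
\quad
T = \sum_{s} T_s,
\quad
s \in \{\text{active},\ \text{first-token},\ \text{end-to-end}\}
\]
```

Each r_s is a well-defined rate on its own, but the weights T_s/T depend on how much wall-clock time each window happens to cover, so the pooled value shifts with the mix of sources rather than with generation speed.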

When mixing actually occurs

tps-meter.tsx:80–85: timingFor(m.id) returns undefined for any message that never had a GenerationTimer. Timers are only created on stream events (tps-meter.tsx:111), so messages that completed before the plugin was loaded have no timer and fall through to the "end-to-end" source.

Non-instrumented messages have inflated denominators (tool waits and prefill time included) and very slightly inflated numerators (primeTokens not excluded). The denominator inflation dominates, so whenever such messages are pooled the session average is biased downward vs. the true active-generation TPS.

Concrete example — ~15% underestimate

3 messages (1 pre-plugin e2e + 2 active, each 500 gen output):

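| Message | gen | primeTokens | decodeMs (ms) | true TPS (tok/s) |
| --- | --- | --- | --- | --- |
| Msg A (e2e) | 500 | 0 | 8000 (incl. prefill+tools) | 62.5 |
| Msg B (active) | 500 | 20 | 5000 (active only) | 96.0 |
| Msg C (active) | 500 | 20 | 5000 (active only) | 96.0 |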

Pooled result: (500+480+480) / (8000+5000+5000) = 81.1 tok/s — a ~15% underestimate vs. the true active-gen average of 96 tok/s.
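The arithmetic can be reproduced with a few lines of JavaScript. This is a minimal sketch: the stat objects are hypothetical and only mirror the field names used in this issue (decodeSource, decodeGenerated, decodeMs), with decodeGenerated for B and C already net of the 20 primeTokens. The pooledTps helper is illustrative and is not the actual aggregate().

```javascript
// Hypothetical per-message stats mirroring the example above.
// The real stat objects in tps.js may carry more fields; this shape is assumed.
const stats = [
  { id: "A", decodeSource: "end-to-end", decodeGenerated: 500, decodeMs: 8000 },
  { id: "B", decodeSource: "active", decodeGenerated: 480, decodeMs: 5000 },
  { id: "C", decodeSource: "active", decodeGenerated: 480, decodeMs: 5000 },
];

// Pooled tokens per second: total tokens divided by total seconds.
function pooledTps(items) {
  const tokens = items.reduce((sum, s) => sum + s.decodeGenerated, 0);
  const ms = items.reduce((sum, s) => sum + s.decodeMs, 0);
  return ms > 0 ? (tokens * 1000) / ms : 0;
}

// Pooling everything, regardless of decodeSource (the behavior described here).
const mixed = pooledTps(stats);

// Pooling only messages measured over the "active" window.
const activeOnly = pooledTps(stats.filter((s) => s.decodeSource === "active"));

// Relative shortfall of the mixed figure against the active-only figure.
const underestimate = 1 - mixed / activeOnly;

console.log(mixed.toFixed(1)); // 81.1
console.log(activeOnly.toFixed(1)); // 96.0
console.log((underestimate * 100).toFixed(1) + "%"); // 15.5%
```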

When it matters

  • Zero impact in the typical case: plugin loaded at OpenCode startup, captures every message stream → all have "active" source → no mixing.
  • Real impact in "hot-load" scenarios: plugin enabled mid-session → pre-existing messages join the pool with "end-to-end" denominators (2–5× larger than active time when tool calls exist), dragging the session average down.

The decodeSource field is emitted but dead

tps.js:147 emits decodeSource on every stat object. aggregate() never reads it. The view layer never surfaces it. It is used only internally in messageStats to decide prime-correction at tps.js:120–122. This field exists specifically to enable the kind of window-separation that aggregate() does not perform.

Suggested fix

At minimum, one of:

  1. Separate aggregates by window type: return separate pooled values per decodeSource, or
  2. Only pool "active" messages when any exist, falling back to the current behavior only when no active timing is available, or
  3. Warn/log when mixing is detected.
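The options above could be combined roughly as follows. This is a sketch under stated assumptions, not a patch against tps.js: aggregateBySource, its return shape, and the "unknown" and "mixed" labels are hypothetical, and it reads only decodeGenerated, since how aggregate() chooses between decodeGenerated and decodeOutput is not shown in this issue.

```javascript
// Hypothetical replacement for the pooling step; names and shape are assumptions.
function aggregateBySource(stats) {
  const windows = {};
  let allTokens = 0;
  let allMs = 0;

  for (const s of stats) {
    // Skip entries without usable timing.
    if (!s || !(s.decodeMs > 0)) continue;
    const key = s.decodeSource || "unknown";
    if (!windows[key]) windows[key] = { tokens: 0, ms: 0, count: 0, tps: 0 };
    const w = windows[key];
    const tokens = s.decodeGenerated || 0;
    w.tokens += tokens;
    w.ms += s.decodeMs;
    w.count += 1;
    allTokens += tokens;
    allMs += s.decodeMs;
  }

  // Option 1: one pooled value per decodeSource window.
  for (const w of Object.values(windows)) {
    w.tps = (w.tokens * 1000) / w.ms;
  }

  const sources = Object.keys(windows);
  const mixed = sources.length > 1;

  // Option 2: prefer "active" when present; otherwise pool everything as before.
  let tps = 0;
  let source = null;
  if (windows.active) {
    tps = windows.active.tps;
    source = "active";
  } else if (allMs > 0) {
    tps = (allTokens * 1000) / allMs;
    source = mixed ? "mixed" : sources[0];
  }

  return { tps, source, mixed, windows };
}

const result = aggregateBySource([
  { decodeSource: "end-to-end", decodeGenerated: 500, decodeMs: 8000 },
  { decodeSource: "active", decodeGenerated: 480, decodeMs: 5000 },
  { decodeSource: "active", decodeGenerated: 480, decodeMs: 5000 },
]);

// Option 3: surface the mixing instead of hiding it.
if (result.mixed) {
  console.warn("session contains mixed decodeSource windows:", Object.keys(result.windows));
}
```

With the three example messages this reports the active-only rate as the headline figure while still exposing the end-to-end window separately, so a view layer could show both if desired.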

Related

  • The GenerationTimer code (gen.js) deliberately excludes tool waits from TPS — this is the core value proposition. Having the session average silently mix them back in undermines that precision promise.
  • No existing issue covers this.
