Summary
aggregate() in plugins/tps/tps.js pools all completed messages by summing decodeMs and decodeGenerated/decodeOutput without reading the decodeSource field. When messages have different decode windows, the pooled session average mixes incompatible denominators and becomes an uninterpretable hybrid number.
Root cause — aggregate() at tps.js:166–227
The function sums decodeMs (line 197) and decodeGenerated/decodeOutput (lines 191–192) from all completed messages indiscriminately, never examining the decodeSource field emitted by each per-message stat (tps.js:147).
The three decodeSource windows have incompatible semantics
messageStats() can compute TPS using three different denominators:
| Source | decodeMs denominator | decodeGenerated numerator |
| --- | --- | --- |
| "active" | excludes tool-wait/idle gaps | excludes primeTokens |
| "first-token" | includes tool-wait gaps | includes all generated |
| "end-to-end" | includes everything (prefill + tools) | includes all generated |
Pooling across them produces (Σ gen_i − Σ prime_i + Σ gen_j + Σ gen_k) / (Σ activeMs_i + Σ ftMs_j + Σ e2eMs_k) — a Frankenstein number that represents no single meaningful quantity.
When mixing actually occurs
tps-meter.tsx:80–85 — timingFor(m.id) returns undefined for any message that never had a GenerationTimer. Timers are only created on stream events (tps-meter.tsx:111). Messages that completed before the plugin was loaded have no timer → they fall through to "end-to-end" source.
Non-instrumented messages have inflated denominators (tool waits and prefill time are included) and slightly inflated numerators (primeTokens are not excluded). Because the denominator inflation is much larger than the numerator inflation, the pooled session average is biased downward vs. the true active-generation TPS.
Concrete example — ~15% underestimate
3 messages (1 pre-plugin end-to-end + 2 active, each with 500 generated output tokens):
| Message | gen | primeTokens | decodeMs | true TPS |
| --- | --- | --- | --- | --- |
| Msg A (e2e) | 500 | 0 | 8000 (incl. prefill+tools) | 62.5 |
| Msg B (active) | 500 | 20 | 5000 (active only) | 96.0 |
| Msg C (active) | 500 | 20 | 5000 (active only) | 96.0 |
Pooled result: (500+480+480) / (8000+5000+5000) = 81.1 tok/s — a ~15% underestimate vs. the true active-gen average of 96 tok/s.
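The arithmetic above can be reproduced with a short script. This is only a sketch: the object shape and the helper names `toTps` and `pooledTps` are hypothetical, and `decodeGenerated` is assumed to already have primeTokens subtracted for the two active messages (500 − 20 = 480).

```javascript
// Hypothetical per-message stats mirroring the example; field names follow
// the issue text, but the exact object shape in tps.js is an assumption.
const exampleStats = [
  { id: "A", decodeSource: "end-to-end", decodeGenerated: 500, decodeMs: 8000 },
  { id: "B", decodeSource: "active", decodeGenerated: 480, decodeMs: 5000 },
  { id: "C", decodeSource: "active", decodeGenerated: 480, decodeMs: 5000 },
];

// Tokens per second from a token count and a duration in milliseconds.
function toTps(tokens, ms) {
  return ms > 0 ? (tokens * 1000) / ms : 0;
}

// Pool stats as described in this issue: sum numerators, sum denominators,
// then divide once, ignoring decodeSource.
function pooledTps(stats) {
  const tokens = stats.reduce((sum, s) => sum + s.decodeGenerated, 0);
  const ms = stats.reduce((sum, s) => sum + s.decodeMs, 0);
  return toTps(tokens, ms);
}

const mixedTps = pooledTps(exampleStats);
const activeTps = pooledTps(
  exampleStats.filter((s) => s.decodeSource === "active")
);
const shortfall = 1 - mixedTps / activeTps;

console.log(mixedTps.toFixed(1)); // 81.1
console.log(activeTps.toFixed(1)); // 96.0
console.log((shortfall * 100).toFixed(1) + "%"); // 15.5%
```

Filtering to the "active" messages before pooling recovers the 96 tok/s figure, which shows that the gap comes entirely from Msg A's end-to-end window.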
When it matters
- Zero impact in the typical case: plugin loaded at OpenCode startup, captures every message stream → all have "active" source → no mixing.
- Real impact in "hot-load" scenarios: plugin enabled mid-session → pre-existing messages join the pool with "end-to-end" denominators (2–5× larger than active time when tool calls exist), dragging the session average down.
The decodeSource field is emitted but dead
tps.js:147 emits decodeSource on every stat object. aggregate() never reads it. The view layer never surfaces it. It is used only internally in messageStats to decide prime-correction at tps.js:120–122. This field exists specifically to enable the kind of window-separation that aggregate() does not perform.
Suggested fix
At minimum:
- Separate aggregates by window type: return separate pooled values per decodeSource, or
- Only pool "active" messages when any exist, falling back to the current behavior only when no active timing is available, or
- Warn/log when mixing is detected.
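A rough sketch of how the three options could fit together is shown below. It is not the actual aggregate() implementation: the function names `aggregateBySource` and `sessionAverage`, the return shape, and the injectable `warn` callback are all hypothetical, and the input is assumed to be an array of per-message stats carrying `decodeSource`, `decodeGenerated`, and `decodeMs`.

```javascript
// Tokens per second from a token count and a duration in milliseconds.
function tokensPerSecond(tokens, ms) {
  return ms > 0 ? (tokens * 1000) / ms : 0;
}

// Option 1: pool each decodeSource separately so windows are never mixed.
function aggregateBySource(stats) {
  const buckets = {};
  for (const s of stats) {
    if (!buckets[s.decodeSource]) {
      buckets[s.decodeSource] = { tokens: 0, ms: 0, count: 0 };
    }
    const bucket = buckets[s.decodeSource];
    bucket.tokens += s.decodeGenerated;
    bucket.ms += s.decodeMs;
    bucket.count += 1;
  }
  for (const bucket of Object.values(buckets)) {
    bucket.tps = tokensPerSecond(bucket.tokens, bucket.ms);
  }
  return buckets;
}

// Options 2 and 3: prefer the "active" pool when it exists, fall back to
// pooling everything (the current behavior) otherwise, and warn on mixing.
function sessionAverage(stats, warn = console.warn) {
  const bySource = aggregateBySource(stats);
  const sources = Object.keys(bySource);
  const mixed = sources.length > 1;
  if (mixed) {
    warn("TPS stats span multiple decode windows: " + sources.join(", "));
  }
  if (bySource.active) {
    return { tps: bySource.active.tps, source: "active", mixed, bySource };
  }
  const all = Object.values(bySource);
  const tokens = all.reduce((sum, b) => sum + b.tokens, 0);
  const ms = all.reduce((sum, b) => sum + b.ms, 0);
  return {
    tps: tokensPerSecond(tokens, ms),
    source: mixed ? "mixed" : sources[0] || "none",
    mixed,
    bySource,
  };
}
```

With the three messages from the concrete example, `sessionAverage` would report 96 tok/s with `source: "active"` and `mixed: true`, while the end-to-end pool stays available under `bySource["end-to-end"]` for a view layer that wants to display it separately.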
Related
- The GenerationTimer code (gen.js) deliberately excludes tool waits from TPS; this is the core value proposition. Having the session average silently mix them back in undermines that precision promise.
- No existing issue covers this.