fix(gateway): record partial-stream usage on client disconnect #66

Merged
Menci merged 2 commits into main from precursor-streaming-improvements
Jun 21, 2026

Conversation

@Menci Menci commented Jun 19, 2026

Owner

Summary

tokenUsageFromMessagesFrame previously returned the running snapshot only on message_stop, so the per-stream usage state did not advance while the stream was still in flight. When the client disconnected mid-stream, the recorded telemetry under-counted output tokens that Anthropic had already metered. Fix: return the snapshot on every frame that revises usage, so disconnect-time recording reflects actual partial consumption.
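The per-frame snapshot idea can be sketched roughly as below. This is an illustration under stated assumptions, not the gateway's code: `TokenUsage`, `MessagesFrame`, `snapshotUsage` and `usageAtDisconnect` are hypothetical names, and the frame shapes keep only the usage fields that Anthropic's `message_start` and `message_delta` stream events carry.

```typescript
// Hypothetical, trimmed-down types: only the fields this sketch needs.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

type MessagesFrame =
  | { type: "message_start"; message: { usage: { input_tokens: number; output_tokens: number } } }
  | { type: "message_delta"; usage: { output_tokens: number } }
  | { type: "content_block_delta" }
  | { type: "message_stop" };

// Returns the running snapshot after a frame that revises usage,
// or null for frames that carry no usage information.
function snapshotUsage(running: TokenUsage | null, frame: MessagesFrame): TokenUsage | null {
  switch (frame.type) {
    case "message_start":
      // Input tokens and the initial output count arrive up front.
      return {
        inputTokens: frame.message.usage.input_tokens,
        outputTokens: frame.message.usage.output_tokens,
      };
    case "message_delta":
      // output_tokens on message_delta is cumulative, so it replaces the old value.
      return {
        inputTokens: running?.inputTokens ?? 0,
        outputTokens: frame.usage.output_tokens,
      };
    case "message_stop":
      // No new numbers; the last snapshot is final.
      return running;
    default:
      return null;
  }
}

// Folds the frames seen so far into the usage that would be recorded
// if the client disconnected right after the last one.
function usageAtDisconnect(frames: MessagesFrame[]): TokenUsage | null {
  let running: TokenUsage | null = null;
  for (const frame of frames) {
    const next = snapshotUsage(running, frame);
    if (next) running = next;
  }
  return running;
}
```

With this shape, a stream that is cut off before `message_stop` still yields the most recent `message_delta` figure instead of nothing.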

Also bumps the gateway vitest default testTimeout from 10s to 30s in the same PR. Control-plane route tests using setupAppTest (full Hono app + memory D1 + admin session per test) flake under workspace-parallel load. The real test work takes hundreds of milliseconds, so 30s gives headroom without masking actual hangs.

Test plan

  • typecheck + test + lint clean

Menci added 2 commits June 20, 2026 00:44
…flake

The control-plane route tests use `setupAppTest` to spin up the full
Hono app + memory D1 mock + admin session per test. Real work is
hundreds of milliseconds, but under workspace-parallel load (workers
contending for CPU, GC pauses) several of them push past vitest's 10s
default and flake intermittently. 30s gives headroom without masking
actual hangs.
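
For reference, a default timeout change of this kind usually looks like the fragment below. The file name and surrounding options are assumptions; only `testTimeout` (a Vitest option, in milliseconds) corresponds to what this commit describes.

```typescript
// vitest.config.ts (hypothetical location; the gateway's real config may contain more)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Default per-test timeout in milliseconds (30s).
    testTimeout: 30_000,
  },
});
```

An individual test can still set a tighter limit by passing a timeout as the third argument to `test(...)`, so a raised default need not hide a hang in a test that is expected to be fast.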
…-stream

`tokenUsageFromMessagesFrame` only returned the running usage figure on
`message_stop`, leaving the per-frame observer's `state.rememberUsage`
calls a no-op for the in-flight `message_start` and `message_delta`
frames. When the downstream client aborted mid-stream — every Ctrl-C in
the Claude Code CLI — the streaming finally block then recorded a null
or stale usage, so billing telemetry under-counted every aborted session
even though Anthropic already metered the output tokens against the
operator's plan window.

Return the running snapshot on `message_start` and `message_delta` too
so `state.usage` checkpoints as the stream progresses; the finally block
already records whatever was last observed when the client disconnects.
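
A minimal sketch of how checkpointing and the `finally` block fit together, assuming a simplified relay loop. `StreamState`, `relay`, `record` and the `client` flag are hypothetical stand-ins; `rememberUsage` borrows its name from this commit message, but its body here is assumed.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical per-stream state: keeps the last non-null snapshot.
class StreamState {
  usage: Usage | null = null;

  rememberUsage(snapshot: Usage | null): void {
    if (snapshot) this.usage = snapshot;
  }
}

// Relays upstream frames until the client goes away, then records
// whatever usage was last observed. Each upstream item stands for the
// snapshot a frame produced (null when the frame carried no usage).
async function relay(
  upstream: AsyncIterable<Usage | null>,
  client: { readonly aborted: boolean }, // an AbortSignal fits this shape
  record: (usage: Usage | null) => void,
): Promise<void> {
  const state = new StreamState();
  try {
    for await (const snapshot of upstream) {
      // Checkpoint first: the frame was already produced upstream.
      state.rememberUsage(snapshot);
      if (client.aborted) break; // client disconnected mid-stream
    }
  } finally {
    // Runs on normal completion, on disconnect, and on upstream errors.
    record(state.usage);
  }
}
```

Because every usage-bearing frame updates `state.usage`, the value handed to `record` on an early exit is the latest partial figure, not null.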

Audited against sub2api (drains upstream to capture final usage even on
client disconnect) — different remedy, same intent: keep recorded usage
aligned with what the upstream actually metered.
@Menci Menci force-pushed the precursor-streaming-improvements branch from b2a3fe1 to cce32e2 Compare June 19, 2026 18:14
@Menci Menci changed the title feat(gateway): 60s SSE idle timeout + partial-stream billing fix fix(gateway): record partial-stream usage on client disconnect Jun 19, 2026
@Menci Menci force-pushed the precursor-streaming-improvements branch from cce32e2 to 8ff3e0b Compare June 19, 2026 19:05
@Menci Menci force-pushed the precursor-response-surface branch from 5f83545 to 1f19c40 Compare June 19, 2026 19:11
@Menci Menci marked this pull request as draft June 19, 2026 19:12
@Menci Menci force-pushed the precursor-response-surface branch 4 times, most recently from c629fa7 to dc06eca Compare June 20, 2026 18:13
@Menci Menci force-pushed the precursor-streaming-improvements branch from 4a12894 to c459978 Compare June 20, 2026 19:51
@Menci Menci changed the base branch from precursor-response-surface to main June 21, 2026 18:57
@Menci Menci force-pushed the precursor-streaming-improvements branch from c459978 to 8eb6db6 Compare June 21, 2026 18:57
@Menci Menci marked this pull request as ready for review June 21, 2026 18:59
@Menci Menci merged commit cc1941c into main Jun 21, 2026