
fix(rest): re-attach to drain backlog before emitting terminal result #627

Draft
G4614 wants to merge 1 commit into boxlite-ai:main from G4614:fix/rest-attach-drain-backlog



G4614 commented May 29, 2026

Re-attach once to drain server-buffered stdout when the attach WebSocket is cut before the exit frame, fixing silent stdout loss.

Test plan

Two-sided: the same test was run with re-attach disabled and with it applied (toggled on this branch, since the test ships with the fix):

- `ws_terminal_probe_after_cut_must_not_drop_buffered_stdout`: an in-process mock-WebSocket test. The first attach is cut before the exit frame, `GET /executions/{id}` reports completed/exit 0, and the backlog (`hello\n`) is served only on re-attach.

| observed | pre-fix (re-attach disabled) | post-fix |
|---|---|---|
| consumer stdout (fast cmd, late attach) | `""`: clean exit 0, output dropped | `"hello\n"`: backlog drained |
| 50×4 fast-command repro (Go SDK, cloud) | 18–26/50 loss | 0/50 |
| test result | FAIL (`left: ""`, `right: "hello\n"`) | PASS |

Orthogonal to #563 (an sdks/c FFI drain fix at the SDK layer): this PR changes the REST client attach path, and the loss reproduced on both main and the #563 branch.

Tradeoff: the re-attach replays the runner's full backlog without an offset, so a partial prefix delivered before the cut can be duplicated. This matches the existing StillRunning/Unavailable reconnect semantics and is strictly preferable to silent loss (offset-based replay is out of scope).
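Below is a toy model of the duplication tradeoff. It assumes, as stated above, that a re-attach replays the full backlog from the beginning with no offset. `consumer_view` is a hypothetical helper and is not part of the codebase; `delivered_before_cut` counts characters. Any prefix that reached the consumer before the cut is seen twice. The PR's test case is unaffected, because nothing arrives before the cut.

```rust
// Hypothetical toy model (not boxlite code): a re-attach replays the
// runner's entire stdout backlog from the beginning, with no offset.
fn consumer_view(backlog: &str, delivered_before_cut: usize) -> String {
    // First attach: only a prefix reaches the consumer before the cut.
    let mut seen: String = backlog.chars().take(delivered_before_cut).collect();
    // Re-attach: the whole backlog is replayed, so that prefix appears twice.
    seen.push_str(backlog);
    seen
}
```

For example, `consumer_view("hello\n", 3)` yields `"helhello\n"`, while `consumer_view("hello\n", 0)` yields `"hello\n"`.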

A fast command over the cloud (Go runner) could return exit 0 with empty
stdout. When the WS attach dropped before the runner flushed output (e.g. a
proxy or high-latency cut), attach_ws_pump probed GET /executions/{id}, saw
"completed", and emitted the exit code WITHOUT re-attaching, dropping the
stdout still sitting in the runner's replay backlog. Local `boxlite serve`
masked this: with low latency, the exit frame always arrives in-band.

When a disconnect is followed by a Terminal probe result, re-attach once
(immediately, no backoff) so the runner replays its backlog and we leave via
its authoritative exit frame. This is bounded to a single attempt with
fallback to the probed exit code, so a runner that never sends a closing exit
frame can't hang the pump.
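Below is a minimal sketch of the control flow described above, under stated assumptions. Each attach session is modeled as a finite list of frames, and a session that ends without an `Exit` frame stands in for a cut. `Frame`, `Probe`, `drain`, `pump` and `run` are hypothetical names; this is not the actual `attach_ws_pump` code.

```rust
// Hypothetical model of the attach pump; the names are illustrative,
// not boxlite's actual types or functions.

/// A frame received on one attach session.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Stdout(String),
    Exit(i32),
}

/// Outcome of probing GET /executions/{id} after a disconnect.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Probe {
    Running,
    Terminal(i32),
}

/// Consume one session: Some(code) if an exit frame arrived, None if the
/// session ended first (modeling a cut before the exit frame).
fn drain(session: Vec<Frame>, stdout: &mut String) -> Option<i32> {
    for frame in session {
        match frame {
            Frame::Stdout(chunk) => stdout.push_str(&chunk),
            Frame::Exit(code) => return Some(code),
        }
    }
    None
}

/// Pump with the fix: after a cut, a terminal probe triggers exactly one
/// immediate re-attach; if that session still yields no exit frame, the
/// probed code is returned instead of attaching again.
fn pump(
    mut attach: impl FnMut() -> Vec<Frame>,
    probe: impl Fn() -> Probe,
    stdout: &mut String,
) -> i32 {
    loop {
        if let Some(code) = drain(attach(), stdout) {
            return code; // authoritative in-band exit frame
        }
        match probe() {
            // Still running: reconnect as before (backoff elided in this sketch).
            Probe::Running => continue,
            // Terminal: one re-attach so the runner replays its backlog.
            Probe::Terminal(probed) => return drain(attach(), stdout).unwrap_or(probed),
        }
    }
}

/// Test helper: serve pre-scripted sessions in order and count attaches.
fn run(sessions: Vec<Vec<Frame>>, probe: Probe) -> (i32, String, usize) {
    let mut queue = sessions.into_iter();
    let mut attaches = 0;
    let mut out = String::new();
    let code = pump(
        || {
            attaches += 1;
            queue.next().unwrap_or_default()
        },
        || probe,
        &mut out,
    );
    (code, out, attaches)
}
```

The bound comes from the `Terminal` arm returning unconditionally after its single `attach()`. If the second session also ends without an exit frame, the probed code is used and no third attach is made.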

Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path,
so it reproduced on both main and boxlite-ai#563.

Reproducer: ws_terminal_probe_after_cut_must_not_drop_buffered_stdout drives
the WS pump against a mock that cuts the first attach, reports completed/0,
and serves the backlog only on re-attach. Without the fix it observes empty
stdout with a clean exit; with the fix it drains "hello\n".
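The two-sided reproducer can be sketched as a stateful mock with a toggle. `MockRunner`, `consume` and the `reattach` flag are hypothetical stand-ins for the in-process mock WebSocket used by the real test. Only the scenario comes from the PR: the first attach is cut, the probe reports completed/0, and the backlog is served only on re-attach.

```rust
// Hypothetical two-sided reproducer; MockRunner, consume and the
// `reattach` toggle are illustrative, not the PR's actual test code.

/// Mock runner: the first attach is cut before the exit frame; later
/// attaches replay the buffered stdout followed by exit 0.
struct MockRunner {
    attaches: u32,
    backlog: &'static str,
}

impl MockRunner {
    /// One attach session: (stdout received, exit frame if it arrived).
    fn attach(&mut self) -> (String, Option<i32>) {
        self.attaches += 1;
        if self.attaches == 1 {
            (String::new(), None) // cut before any output or exit frame
        } else {
            (self.backlog.to_string(), Some(0)) // backlog replay + exit frame
        }
    }

    /// GET /executions/{id}: the execution has already completed with exit 0.
    fn probe(&self) -> i32 {
        0
    }
}

/// Consumer side of the pump; `reattach` switches the fix off or on.
fn consume(runner: &mut MockRunner, reattach: bool) -> (String, i32) {
    let (mut out, exit) = runner.attach();
    if let Some(code) = exit {
        return (out, code);
    }
    let probed = runner.probe();
    if !reattach {
        return (out, probed); // pre-fix: emit probed exit, backlog never read
    }
    let (more, exit) = runner.attach(); // single re-attach drains the backlog
    out.push_str(&more);
    (out, exit.unwrap_or(probed))
}
```

With `reattach` off, the mock reproduces the pre-fix row of the test-plan table: empty stdout with a clean exit 0. With it on, `hello\n` is drained after exactly two attaches.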

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>