
fix(rest): re-attach to drain backlog before emitting terminal result #3

Closed: G4614 wants to merge 1 commit into main from fix/rest-attach-drain-backlog


G4614 (Owner) commented May 29, 2026

Summary

  • boxlite run <img> <fast-cmd> against the cloud (Go runner) intermittently returned exit 0 with empty stdout for commands that exit almost instantly (e.g. uptime). The command ran inside the box, but its output was silently dropped before reaching the client (in roughly 80% of runs over the cloud path; a local boxlite serve masked the bug because its attach latency is low).
  • Root cause is in the REST client WS attach pump (src/boxlite/src/rest/litebox.rs): when the attach WebSocket drops before an exit frame arrives (a proxy or high-latency cut), attach_ws_pump probes GET /executions/{id}. On a Terminal ("completed") result it emitted the exit code and returned without re-attaching, dropping the stdout still buffered in the runner's replay backlog.
  • Fix: on a disconnect-then-Terminal probe, re-attach once, immediately and without backoff, so that the runner replays its backlog and the pump exits via the runner's authoritative exit frame. The retry is bounded by a terminal_drain_attempted flag, with a fallback to the probed exit code so that a runner that never sends a closing exit frame cannot hang the pump.
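A minimal std-only sketch of the disconnect handling described above, modeled as a loop over scripted attaches. Frame, Probe, Runner, pump and the two mock runners are hypothetical names for illustration, not boxlite's actual types in src/boxlite/src/rest/litebox.rs; only the terminal_drain_attempted flag comes from the PR. Each attach returns the frames delivered before the socket closed, so a missing Exit frame models a proxy or high-latency cut.

```rust
// Hypothetical model of the attach pump; not boxlite's real API.

#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Stdout(String),
    Exit(i32),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Probe {
    StillRunning,
    // "completed", carrying the exit code reported by GET /executions/{id}
    Terminal(i32),
}

// Each attach yields the frames received before the socket closed.
trait Runner {
    fn attach(&mut self) -> Vec<Frame>;
    fn probe(&mut self) -> Probe;
}

// Returns (collected stdout, exit code).
fn pump<R: Runner>(runner: &mut R) -> (String, i32) {
    let mut stdout = String::new();
    let mut terminal_drain_attempted = false;
    loop {
        for frame in runner.attach() {
            match frame {
                Frame::Stdout(s) => stdout.push_str(&s),
                // Authoritative exit frame: the normal way out.
                Frame::Exit(code) => return (stdout, code),
            }
        }
        // Socket closed without an exit frame: ask the server what happened.
        match runner.probe() {
            // Already re-attached once and still no exit frame:
            // fall back to the probed code instead of looping forever.
            Probe::Terminal(code) if terminal_drain_attempted => return (stdout, code),
            // First Terminal probe: re-attach immediately to drain the backlog.
            Probe::Terminal(_) => terminal_drain_attempted = true,
            // Still running: reconnect (backoff omitted in this model).
            Probe::StillRunning => {}
        }
    }
}

// First attach is cut before any output; the re-attach replays the backlog.
struct CutThenReplay {
    attaches: u32,
}

impl Runner for CutThenReplay {
    fn attach(&mut self) -> Vec<Frame> {
        self.attaches += 1;
        if self.attaches == 1 {
            Vec::new()
        } else {
            vec![Frame::Stdout("hello\n".to_string()), Frame::Exit(0)]
        }
    }
    fn probe(&mut self) -> Probe {
        Probe::Terminal(0)
    }
}

// A runner that never sends an exit frame on any attach.
struct NeverExits;

impl Runner for NeverExits {
    fn attach(&mut self) -> Vec<Frame> {
        Vec::new()
    }
    fn probe(&mut self) -> Probe {
        Probe::Terminal(7)
    }
}

fn main() {
    let (out, code) = pump(&mut CutThenReplay { attaches: 0 });
    assert_eq!(out, "hello\n");
    assert_eq!(code, 0);

    let (out, code) = pump(&mut NeverExits);
    assert_eq!(out, "");
    assert_eq!(code, 7);
}
```

The two mock runners exercise both halves of the bound: a cut followed by a replaying re-attach leaves via the runner's exit frame, while a runner that never sends one falls back to the probed code after exactly one extra attach.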

Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path, so the bug reproduced on both main and the boxlite-ai#563 branch. They are orthogonal fixes for the same user-visible symptom.

Tradeoff

If the first connection delivered partial stdout before being cut, the re-attach replays the runner's full backlog, so the already-delivered prefix can be duplicated. This matches the existing StillRunning/Unavailable reconnect semantics (the runner's streamBus replays without an offset) and is strictly preferable to silent loss. Eliminating the duplication entirely would require offset-based replay in the attach protocol, which is out of scope here.
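To make the tradeoff concrete, here is a hypothetical model of a backlog that replays from the start (the streamBus behavior described above) next to an offset-based variant of the kind left out of scope. Backlog, replay_all and replay_from are illustrative names, and counting the offset in whole chunks rather than bytes is a simplifying assumption.

```rust
// Hypothetical runner-side backlog: every stdout chunk produced so far.
struct Backlog {
    chunks: Vec<String>,
}

impl Backlog {
    // Replay without an offset: every re-attach gets the whole backlog.
    fn replay_all(&self) -> Vec<String> {
        self.chunks.clone()
    }

    // Hypothetical offset-based replay: skip chunks the client already has.
    fn replay_from(&self, offset: usize) -> Vec<String> {
        self.chunks.iter().skip(offset).cloned().collect()
    }
}

fn main() {
    let backlog = Backlog {
        chunks: vec!["a\n".to_string(), "b\n".to_string(), "c\n".to_string()],
    };

    // The first connection delivered one chunk before it was cut.
    let delivered_chunks = 1;
    let received = String::from("a\n");

    // Full replay: the delivered prefix shows up twice.
    let with_full_replay = received.clone() + &backlog.replay_all().concat();
    assert_eq!(with_full_replay, "a\na\nb\nc\n");

    // Offset replay: the client resumes exactly where it stopped.
    let with_offset_replay = received + &backlog.replay_from(delivered_chunks).concat();
    assert_eq!(with_offset_replay, "a\nb\nc\n");
}
```

For the fast-exit case this PR targets, the first connection typically delivers nothing, so the full replay yields each chunk exactly once and no duplication occurs.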

Test plan

  • New regression test ws_terminal_probe_after_cut_must_not_drop_buffered_stdout (reuses the in-process mock-WS harness): cuts the first attach, reports completed/exit 0, serves the backlog only on re-attach.
  • Verified both ways on a main base: the test fails without the production change (left: "", right: "hello\n", with a clean exit 0) and passes with it.
  • Existing WS attach tests pass: ws_clean_exit_emits_result, ws_close_without_exit_falls_back_to_status, ws_text_error_frame_logs_but_continues.
  • cargo fmt --check and cargo clippy -p boxlite --features rest --lib -- -D warnings are clean on the change.
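The regression scenario and its two-sided check can be modeled in a few lines. This is a sketch, not the actual ws_terminal_probe_after_cut_must_not_drop_buffered_stdout test: MockServer, run and the drain_on_terminal switch are hypothetical, standing in for the in-process mock-WS harness and for building with and without the production change.

```rust
// Hypothetical stand-in for the mock server: the first attach is cut before
// any frame, the status probe reports completed/exit 0, and the stdout
// backlog is served only on a re-attach.
struct MockServer {
    attaches: u32,
}

enum Frame {
    Stdout(&'static str),
    Exit(i32),
}

impl MockServer {
    fn attach(&mut self) -> Vec<Frame> {
        self.attaches += 1;
        if self.attaches == 1 {
            Vec::new() // connection cut, no frames delivered
        } else {
            vec![Frame::Stdout("hello\n"), Frame::Exit(0)]
        }
    }

    fn probe_exit_code(&self) -> i32 {
        0 // "completed", exit 0
    }
}

// Minimal pump; `drain_on_terminal` toggles the fix on or off.
// Returns (stdout, exit code, number of attaches made).
fn run(drain_on_terminal: bool) -> (String, i32, u32) {
    let mut server = MockServer { attaches: 0 };
    let mut stdout = String::new();
    let mut drained = false;
    loop {
        for frame in server.attach() {
            match frame {
                Frame::Stdout(s) => stdout.push_str(s),
                Frame::Exit(code) => return (stdout, code, server.attaches),
            }
        }
        let code = server.probe_exit_code();
        if !drain_on_terminal || drained {
            // Old behavior (or drain already tried): trust the probe.
            return (stdout, code, server.attaches);
        }
        drained = true; // re-attach once to drain the backlog
    }
}

fn main() {
    // Without the fix: clean exit 0 but empty stdout (the reported symptom).
    assert_eq!(run(false), (String::new(), 0, 1));
    // With the fix: one extra attach drains "hello\n".
    assert_eq!(run(true), ("hello\n".to_string(), 0, 2));
}
```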

Note: ws_watchdog_fires_when_idle fails in my local sandbox (the mock server's accept loop is blocked by the test's own 2s sleep, which pushes the test past its 3s budget). It also fails on pristine main and on the boxlite-ai#563 branch, so it is unrelated to this change.

🤖 Generated with Claude Code

A fast command over the cloud (Go runner) could return exit 0 with empty
stdout. When the WS attach drops before the runner flushes output (a proxy /
high-latency cut), attach_ws_pump probed GET /executions/{id}, saw
"completed", and emitted the exit code WITHOUT re-attaching — dropping the
stdout still sitting in the runner's replay backlog. Local `boxlite serve`
masked it (low latency, the exit frame always arrives in-band).

On a disconnect-then-Terminal probe, re-attach once (immediately, no backoff)
so the runner replays its backlog and we leave via its authoritative exit
frame. Bounded to a single attempt with fallback to the probed exit code, so
a runner that never sends a closing exit frame can't hang the pump.

Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path,
so it reproduced on both main and boxlite-ai#563.

Reproducer: ws_terminal_probe_after_cut_must_not_drop_buffered_stdout drives
the WS pump against a mock that cuts the first attach, reports completed/0,
and serves the backlog only on re-attach. Without the fix it observes empty
stdout with a clean exit; with the fix it drains "hello\n".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 (Owner, Author) commented May 29, 2026

Superseded by cross-repo PR boxlite-ai#627 (fork → upstream). This fork-internal PR was opened by mistake.

G4614 closed this May 29, 2026