Skip to content

fix(mcp): make interactive tool calls transport-drop-safe (supersede, reap, keep-alive)#51

Merged
juacker merged 3 commits into
mainfrom
fix/interactive-tool-transport-drop-guidance
Jun 12, 2026
Merged

fix(mcp): make interactive tool calls transport-drop-safe (supersede, reap, keep-alive)#51
juacker merged 3 commits into
mainfrom
fix/interactive-tool-transport-drop-guidance

Conversation

@juacker

@juacker juacker commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Problem

When the Claude Code CLI drops the local MCP transport mid-call (transport dropped mid-call; response for tool <name> was lost) during an interactive tool call — ask_user, an approval-gated bash_exec, or fs_request_grant — the in-flight wait was orphaned:

  • rmcp's stateful streamable-HTTP server runs call_tool in a detached session worker (spawn_session_worker) that survives the dropped connection, so the abandoned-wait RAII guards (which fire on future drop) never ran, despite their doc comments claiming they handled transport drops.
  • The pending registry entry and its approval/path-grant card lingered for up to 55 minutes (CLI_INTERACTIVE_WAIT_TIMEOUT); the run wasn't stopped.
  • A stale ask_user entry also kept session_has_pending_ask true, suppressing mid-run input delivery.
  • If the model re-asked, the user saw a duplicate card stacked next to a stale, unanswerable one.

Fix (three layers, root-cause first)

  1. Supersede on re-ask — registering a new interactive request takes any stale pending entry for the same key (bash: run + command; path grant: run + path + access; ask_user: session, matching the FE's single-panel model), emits resolved/attention so the stale card is replaced in place the moment the fresh one appears, and wakes the orphaned wait with a channel-closed error that is now treated as superseded (guard disarmed — never cancels the live run that's waiting on the new request).
  2. Run-end reapingBindingGuard now owns a run-scope CancellationToken, cancelled on drop; execute_bound_tool races it. Any tool future still in flight when the run ends (only possible for orphans — a live CLI waits for its tool results) is dropped, which finally fires the existing cleanup guards exactly as they were documented to.
  3. Keep-alive restoredsse_keep_alive is left at rmcp's default Some(15s) instead of None. The None was carried over from the pre-1.7 struct literal, not a decision; an idle unpinged response stream during a long human wait is precisely what rots into a transport drop.

Consequential fix: workspace_delete now explicitly cancels the runs whose pending approvals it purges (purge_workspace returns the awaiting run ids), since a closed approval channel no longer implies run cancellation.

Prompt change (first commit, amended by the second)

The scoped Interactive Tool Reliability section (emitted only when an interactive tool is present) still tells the model a dropped interactive call has an UNKNOWN outcome and to re-issue the same call once — but the "tell the user they can dismiss the stale card" caveat is gone: superseding replaces the card automatically.

Tests

  • BindingGuard drop cancels the run scope + unbinds without touching the run's cancel token; the reap arm drops the racing future (RAII guards fire).
  • take_superseded matches only same run + command / run + path + access; channel-close as supersede signal; counts maintained.
  • ask_user::take_for_session scoping + superseded submit_answer rejection.
  • Prompt tests assert the dismiss caveat is gone.

cargo test --lib: 611 passed. cargo fmt --check, cargo clippy --lib --tests, tsc --noEmit clean.

juacker added 2 commits June 12, 2026 10:47
The local MCP transport can drop an in-flight call (the model sees
"transport dropped mid-call; response for tool <name> was lost"). For
tools that block on a user grant or answer (ask_user, approval-gated
bash_exec / fs_request_grant) the outcome is then unknown and a stale
approval/question card can linger in the UI, so the model must re-issue
the same interactive call rather than assume an answer or proceed.

Add a scoped 'Interactive Tool Reliability' section to the system prompt,
emitted only when such a tool is present, plus tests for presence/absence.
…, keep-alive

A CLI transport drop used to orphan in-flight interactive waits (ask_user,
bash approvals, path grants): the rmcp stateful session worker keeps the
tool future alive past the dropped connection, so the abandoned-wait
guards never fired, pending registry entries (and their UI cards)
lingered until the 55-min interactive timeout, and a re-asked command
stacked a duplicate card next to the stale one.

Three root fixes:

- Supersede on re-ask: registering a bash approval (same run + command),
  path grant (same run + path + access), or ask_user (same session) takes
  the stale pending entry, emits resolved/attention so the old card is
  replaced in place, and wakes the orphaned wait with a channel-closed
  error that it now treats as superseded (disarm, never cancel the live
  run).
- Run-end reaping: BindingGuard owns a run-scope CancellationToken,
  cancelled on drop; execute_bound_tool races it, so orphaned tool
  futures are dropped at run end and their RAII cleanup guards finally
  fire as documented.
- Keep rmcp's default sse_keep_alive (15s) instead of disabling it; idle
  unpinged response streams during long human waits are exactly what
  rotted into transport drops. The None came from the pre-1.7 struct
  literal, not a decision.

Workspace deletion now cancels the runs whose approvals it purges
(purge_workspace returns run ids), since the awaiting side no longer
treats a closed channel as cancel-the-run.

The Interactive Tool Reliability prompt section drops the 'user can
dismiss the stale card' caveat: re-asking now replaces the card
automatically, so the model just re-issues the same call.
@juacker juacker force-pushed the fix/interactive-tool-transport-drop-guidance branch from 39e62ab to e8ec4d3 Compare June 12, 2026 09:03
@juacker juacker changed the title feat(prompt): guide re-asking interactive tools after a transport drop fix(mcp): make interactive tool calls transport-drop-safe (supersede, reap, keep-alive) Jun 12, 2026
@juacker juacker marked this pull request as ready for review June 12, 2026 13:22
@juacker juacker merged commit c11ad8e into main Jun 12, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant