fix(chat): free the injected-turn slot when an injected turn is interrupted #50
Merged
…rupted

A second queued message delivered while the FIRST injected message's turn was still running interrupts that turn, but the interrupted wind-down result skipped the pending_injected_turns decrement unconditionally (the rule is only correct when the cut turn is the original prompt turn, which never held a slot). The counter ended one too high, the run kept stdin open waiting for a turn that was never coming, and never reached EOF: a zombie run with an eternal spinner that also swallowed every later user message into itself (observed 2026-06-11, run 6b218444).

Replace the raw counter with InjectedTurnLedger: exactly one result ends each turn and turns run strictly in order, so the first result always belongs to the original turn (no slot) and every later result, including suppressed interrupt wind-downs, frees one injected-turn slot. This also fixes the symmetric edge where a message delivered before the original turn streamed its first event had its slot consumed by the original turn's own result.

Adds unit tests covering the double-interrupt regression, the single interrupt, the pre-first-event delivery, and the no-injection baseline.
Problem
Since mid-run queued-message delivery (#8ab0b33 feat(chat): deliver mid-run user messages into live Claude runs), a run can turn into a zombie: the spinner never stops after the conversation has visibly ended, and subsequent user messages get delivered into the dead run instead of starting a new one, so the agent appears to go silent.

Observed 2026-06-11 (run 6b218444): two queued messages delivered mid-run, both with `interrupted=true`, the second one interrupting the first injected message's turn. After the final turn completed, the run kept stdin open forever, never reached EOF, and never finalized. A message sent 15 minutes later was routed into the zombie run (`interrupted=false`, Claude sitting idle).

Root cause
With `--input-format stream-json`, the run ends when we close stdin (Claude drains and exits → EOF → run finalizes). That decision was driven by a counter, `pending_injected_turns`: `+1` per delivered message, `-1` per `result` event, except when the result was the wind-down of a turn we interrupted, on the assumption that an interrupted wind-down doesn't complete a real turn.

That assumption only holds when the interrupted turn is the original prompt turn (which never held a slot). When the interrupted turn is itself a previously injected turn (a second queued message arriving while the first one's turn runs), the wind-down is that turn completing. Skipping the decrement leaks one slot permanently: the counter never reaches 0, stdin never closes, the run never ends.
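To make the leak concrete, here is a minimal sketch of the old accounting as described above. It is a reconstruction for illustration, not the code this PR removes: only `pending_injected_turns` is a real name, while `OldCounter`, `on_delivered`, `on_result`, the saturating decrement, and the replay function are assumptions.

```rust
/// Hypothetical reconstruction of the old counter (illustration only).
struct OldCounter {
    pending_injected_turns: u32,
}

impl OldCounter {
    fn new() -> Self {
        Self { pending_injected_turns: 0 }
    }

    /// A queued message was written to stdin: its turn now holds a slot.
    fn on_delivered(&mut self) {
        self.pending_injected_turns += 1;
    }

    /// A `result` event arrived. `interrupted_wind_down` is true when the
    /// result is the wind-down of a turn we interrupted.
    fn on_result(&mut self, interrupted_wind_down: bool) {
        if interrupted_wind_down {
            return; // the flawed rule: never decrement on a wind-down
        }
        // Saturating decrement is an assumption of this sketch.
        self.pending_injected_turns = self.pending_injected_turns.saturating_sub(1);
    }
}

/// Replays the double-interrupt sequence and returns the final counter value.
fn replay_double_interrupt_old() -> u32 {
    let mut c = OldCounter::new();
    c.on_delivered(); // message 1 arrives and interrupts the original turn
    c.on_result(true); // original turn winds down: skipped (fine, it held no slot)
    c.on_delivered(); // message 2 arrives and interrupts message 1's turn
    c.on_result(true); // message 1's turn winds down: skipped (its slot leaks)
    c.on_result(false); // message 2's turn completes: decrement
    c.pending_injected_turns
}
```

Under these assumptions the replay ends at 1 rather than 0, which matches the "one too high" counter that kept stdin open.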
The deeper issue: the accounting was keyed off `suppress_interrupted`, an error-classification flag whose actual job is to stop the interrupt's `error_during_execution` from failing the run. Error semantics and turn counting are different concerns.

Fix
Replace the conditional decrement with a protocol invariant that needs no classification: exactly one `result` ends each turn, and turns run strictly in order.
So: the first `result` always ends the original turn (frees nothing); every later `result` ends an injected turn and frees exactly one slot, whether it finished naturally or was cut by our interrupt.

Encoded as `InjectedTurnLedger`, a pure ~20-line state machine (`delivered(n)` / `turn_ended()` / `pending()`), unit-tested directly. `suppress_interrupted` now does only error suppression; accounting no longer looks at it.

This also fixes a symmetric latent edge for free: a message delivered before the original turn streamed its first event (poll tick fires before the first stream event, no interrupt sent) used to have its slot consumed by the original turn's own result, closing stdin while its turn was still buffered.
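For readers who want the shape of the state machine without opening the diff, the following is a rough sketch of what such a ledger could look like. The type name and the three method names come from the description above; the field names, integer types, saturating decrement, and the replay helper are assumptions, and the merged code may differ.

```rust
/// Illustrative sketch of an injected-turn ledger (not the merged code).
#[derive(Debug, Default)]
struct InjectedTurnLedger {
    /// Set once the first `result` (the original prompt turn) has been seen.
    original_turn_ended: bool,
    /// Injected turns that have been delivered but have not yet ended.
    slots: usize,
}

impl InjectedTurnLedger {
    /// `n` queued messages were delivered; each one will run as its own turn.
    fn delivered(&mut self, n: usize) {
        self.slots += n;
    }

    /// A `result` event arrived, interrupted or not.
    fn turn_ended(&mut self) {
        if !self.original_turn_ended {
            // The first result always belongs to the original turn: no slot.
            self.original_turn_ended = true;
        } else {
            // Every later result ends exactly one injected turn.
            self.slots = self.slots.saturating_sub(1);
        }
    }

    /// Injected turns still outstanding.
    fn pending(&self) -> usize {
        self.slots
    }
}

/// Replays the double-interrupt sequence and returns the final pending count.
fn replay_double_interrupt() -> usize {
    let mut ledger = InjectedTurnLedger::default();
    ledger.delivered(1); // message 1 interrupts the original turn
    ledger.turn_ended(); // original turn's wind-down: frees nothing
    ledger.delivered(1); // message 2 interrupts message 1's turn
    ledger.turn_ended(); // message 1's wind-down: frees its slot
    ledger.turn_ended(); // message 2's turn completes: frees its slot
    ledger.pending()
}
```

Note that `turn_ended()` takes no argument: the ledger never needs to know whether a result was interrupted, which is what keeps the accounting separate from `suppress_interrupted`.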
Tests
- `injected_turn_ledger_double_interrupt_frees_the_cut_turns_slot`: replays the observed log sequence 1:1 (regression).
- `injected_turn_ledger_single_interrupt`: the already-working case stays correct.
- `injected_turn_ledger_delivery_before_first_turn_event`: the latent edge above.
- `injected_turn_ledger_no_injections`: baseline.

`cargo check` clean; all 19 `assistant::local_agent` tests pass.
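As a reading aid, the four cases can be written as event sequences and checked against the invariant expressed as a simple loop. This is a sketch, not the test code in the PR: the `Event` enum, `slots_left`, `scenarios`, and the exact event sequences are assumptions inferred from the test names and the description.

```rust
/// Hypothetical event type for replaying a run (illustration only).
#[derive(Clone, Copy)]
enum Event {
    /// `n` queued messages were delivered into the live run.
    Delivered(usize),
    /// A `result` event ended the current turn (interrupted or not).
    TurnResult,
}

/// Applies the invariant to a sequence of events and returns the number of
/// injected-turn slots still held at the end.
fn slots_left(events: &[Event]) -> usize {
    let mut original_ended = false;
    let mut slots = 0usize;
    for event in events {
        match *event {
            Event::Delivered(n) => slots += n,
            // The first result ends the original turn and frees nothing.
            Event::TurnResult if !original_ended => original_ended = true,
            // Every later result frees exactly one slot.
            Event::TurnResult => slots = slots.saturating_sub(1),
        }
    }
    slots
}

/// Assumed event sequences for the listed cases, with the expected slots left.
fn scenarios() -> Vec<(&'static str, Vec<Event>, usize)> {
    use Event::{Delivered, TurnResult};
    vec![
        (
            "double interrupt",
            vec![Delivered(1), TurnResult, Delivered(1), TurnResult, TurnResult],
            0,
        ),
        ("single interrupt", vec![Delivered(1), TurnResult, TurnResult], 0),
        // After only the original turn's result, the injected turn is still
        // outstanding, so stdin must stay open.
        ("delivery before first event, mid-run", vec![Delivered(1), TurnResult], 1),
        ("delivery before first event, done", vec![Delivered(1), TurnResult, TurnResult], 0),
        ("no injections", vec![TurnResult], 0),
    ]
}
```

Iterating over `scenarios()` and comparing `slots_left(&events)` with the expected value gives a compact table-driven check of the same cases.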