Skip to content

Session resume permanently broken: tool_use/tool_result mismatch from interleaved sub-agent turns #2323

@PureWeen

Description

@PureWeen

Bug Description

After resuming a session that used multi-agent (sub-agent) parallel execution, every prompt fails with:

CAPIError: 400 messages.9: `tool_use` ids were found without `tool_result` blocks immediately after: tooluse_KnoMQcLpsPRnvfH4stPa5s.
Each `tool_use` block must have a corresponding `tool_result` block in the next message.

The session is permanently unrecoverable — the error persists across retries, resumes, and even CLI version upgrades (tested 1.0.10 → 1.0.12-1).

Root Cause Analysis

The conversation reconstruction from events.jsonl on resume cannot handle tool_results that complete after a sub-agent turn has interleaved. The events are recorded faithfully, but the reconstructed messages[] array violates Claude's tool_use/tool_result adjacency requirement.

Event sequence from events.jsonl (lines 1947–1967):

  1. Line 1947: Main agent (Claude, tooluse_* IDs) emits assistant.message with 3 parallel tool_use requests
  2. Line 1953: 1st tool completes (tooluse_jUS9...)
  3. Line 1954: A sub-agent turn begins — new assistant.message with call_* IDs (OpenAI format), turnId=1
  4. Line 1960: 2nd tool from main agent completes (tooluse_EBk...)
  5. Line 1961: Sub-agent tools complete
  6. Line 1963: assistant.turn_end turnId=1 (sub-agent done)
  7. Line 1966: 3rd tool from main agent completes (tooluse_KnoMQcL...) — arrives after sub-agent turn
  8. Line 1967: assistant.turn_end turnId=0 (main done)

On resume, the conversation builder places the sub-agent's assistant message (line 1954) between the main agent's tool_use (line 1947) and the tool_result for tooluse_KnoMQcL... (line 1966). This violates the API requirement that every tool_use must have a tool_result in the immediately next message.

Impact

  • Session is permanently broken — no recovery path exists
  • User loses all conversation context (this was a long session with 2700+ events)
  • Error is deterministic: always fails at messages.9
  • Affects any session that had parallel tool calls where a sub-agent turn interleaved before all tool_results arrived

Reproduction

  • Session ID: 0c90fe17-4e2d-4bf6-8126-bcf97750e74c
  • CLI versions tested: darwin-arm64/1.0.10, universal/1.0.12-1
  • 4+ failed attempts across 3 separate resume cycles

Suggested Fixes

  1. Conversation builder: Group all tool_results with their originating tool_use message regardless of interleaved sub-agent turns
  2. Graceful degradation: Detect orphaned tool_use blocks and inject synthetic tool_results (e.g., "[tool result unavailable after session resume]") rather than sending a malformed conversation
  3. Error recovery: Detect this specific 400 error pattern and auto-truncate the conversation before the corrupted point

Contact

Please reach out to @PureWeen (Shane Neuville) for the full events.jsonl file for debugging. He has the complete session state available.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:agentsSub-agents, fleet, autopilot, plan mode, background agents, and custom agentsarea:sessionsSession management, resume, history, session picker, and session state

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions