Subagent spawn fails near context limit due to truncated tool-call JSON #33

@giveen

Summary

When the context cache is nearly full, Late can emit an oversized spawn_subagent tool call. The tool-call argument JSON appears to be truncated mid-string, and the backend (llama.cpp OpenAI-compatible server) returns HTTP 500 with a parse error. The subagent spawn then fails instead of recovering gracefully.

Environment

  1. OS: Ubuntu 24.04 LTS
  2. Client: Late (planning flow with subagent spawning)
  3. Backend: llama.cpp / llama-swap router
  4. Endpoint: http://127.0.0.1/v1/chat/completions
  5. Context state at failure: near max (n_tokens = 262143, truncated = 1)

Steps to Reproduce

  1. Run a long planning session until the context cache is nearly full.
  2. Trigger a spawn_subagent call with a large multi-paragraph goal payload.
  3. Let the model stream the tool-call arguments.
  4. Observe the backend response and Late's behavior.

Expected Behavior

  1. Late should avoid sending oversized tool-call arguments when context is near limit.
  2. If tool-call JSON is truncated or invalid, Late should recover gracefully (retry with compact args or ask for a smaller payload).
  3. The error surfaced to the user should be actionable, i.e. it should state the failure reason and suggest a next step.
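
As an illustration of points 2 and 3, here is a minimal Python sketch of client-side validation with a compact retry. It is not Late's actual code: `parse_tool_args`, `get_valid_args`, and the `request_args` callback are hypothetical names, and the sketch assumes the client can see the raw argument string before acting on it or echoing it back to the backend.

```python
import json
from typing import Callable, Optional, Tuple


def parse_tool_args(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (args, None) on success, or (None, reason) if the JSON is unusable."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A string cut off mid-stream ends up here (unterminated string).
        return None, f"invalid tool-call JSON at char {exc.pos}: {exc.msg}"
    if not isinstance(parsed, dict):
        return None, "tool-call arguments must be a JSON object"
    return parsed, None


def get_valid_args(request_args: Callable[[bool], str], max_attempts: int = 2) -> dict:
    """Call request_args(compact) until it yields valid JSON or attempts run out.

    request_args is a hypothetical hook: it asks the model for the tool-call
    arguments and, when compact is True, re-prompts for a shorter payload.
    """
    reason = "no attempts made"
    for attempt in range(max_attempts):
        raw = request_args(attempt > 0)  # first try is normal, retries are compact
        args, reason = parse_tool_args(raw)
        if args is not None:
            return args
    raise ValueError(
        f"spawn_subagent arguments unusable after {max_attempts} attempts: {reason}"
    )
```

The final `ValueError` carries the parser's reason, which gives the user something more actionable than a bare HTTP 500.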

Actual Behavior

  1. Backend returns HTTP 500 due to malformed tool-call arguments JSON.
  2. Subagent spawn fails.
  3. Delegation step does not complete.

Error Excerpt

LLAMA-CPP ERROR:

prompt eval time = 221.03 ms / 22 tokens
eval time = 88800.20 ms / 1535 tokens
total time = 89021.23 ms / 1557 tokens
slot release: n_tokens = 262143, truncated = 1
got exception: Failed to parse tool call arguments as JSON: parse error ... invalid string: missing closing quote
POST /v1/chat/completions ... 500

Impact

  1. Long-running sessions become unreliable right when delegation is needed most.
  2. High-context workflows can fail depending on tool-call argument size.
  3. Users lose time recovering from failed subagent spawns.

Suggested Fix Direction

  1. Add a preflight context-budget check before requests with tools.
  2. Enforce a maximum size for spawn_subagent goal payloads.
  3. Add automatic retry on malformed tool-call argument errors with a compact re-prompt.
  4. Surface backend error body in Late so the exact failure reason is visible.
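
Items 1 and 2 could be combined into a single preflight check. The sketch below is hypothetical and not taken from Late: the `Budget` fields, the threshold values, and the 4-characters-per-token estimate are all assumptions, and a real implementation would preferably use the backend's tokenizer and its reported context size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    context_limit: int       # backend context size in tokens (assumed known)
    reserve_for_output: int  # tokens to keep free for the streamed tool call
    max_goal_chars: int      # hard cap on the spawn_subagent goal payload


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return len(text) // 4 + 1


def preflight(prompt_tokens: int, goal: str, budget: Budget) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK to send."""
    problems = []
    remaining = budget.context_limit - prompt_tokens
    needed = budget.reserve_for_output + estimate_tokens(goal)
    if remaining < needed:
        problems.append(
            f"only {remaining} tokens left in context, about {needed} needed"
        )
    if len(goal) > budget.max_goal_chars:
        problems.append(
            f"goal is {len(goal)} chars, limit is {budget.max_goal_chars}"
        )
    return problems
```

If `preflight` returns any problems, the client can compact the session or shorten the goal before issuing the request, rather than discovering the overflow through a backend parse error.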

Temporary Workarounds

  1. Start a fresh session before spawning subagents when context is large.
  2. Keep subagent goal text short and atomic.
  3. Put long implementation details in a file and reference it instead of embedding large payloads.
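
Workaround 3 can be scripted. This is a hedged sketch only: `externalize_goal`, the `subagent_goal.md` file name, and the 500-character threshold are illustrative choices, and it assumes the subagent is able to read files from the given directory.

```python
from pathlib import Path


def externalize_goal(goal: str, directory: Path, max_inline_chars: int = 500) -> str:
    """Return a goal string that is safe to embed in a tool call.

    Short goals pass through unchanged. Long goals are written to a file and
    replaced by a one-line pointer to that file.
    """
    if len(goal) <= max_inline_chars:
        return goal
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "subagent_goal.md"  # hypothetical file name
    path.write_text(goal, encoding="utf-8")
    return f"Read the full task description in {path} and carry it out."
```

The tool-call payload then stays small regardless of how much detail the task needs.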

Labels: bug