fix(agentic): make truncated Write recovery actionable end-to-end #653

Merged
nonoqing merged 1 commit into GCWing:main from nonoqing:yuyiqing/dev on May 11, 2026

Conversation

@nonoqing
Collaborator

80cd279 introduced truncation recovery and injected a "use Edit to continue" hint into the assistant result. In practice, the end-to-end loop never closed because:

  1. DeepResearchAgent did not include Edit in its allowed_tools, so the model's Edit attempts (correctly following the hint) were rejected at the allowed-list check and the model fell back to Read.
  2. The fallback Reads were identical (same file, same range) every round, producing a tight tool-call loop that nothing prevented.
  3. The hint itself used "verify the result" wording that nudged the model toward Read instead of Edit.
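
The first failure mode can be illustrated with a minimal sketch of an allowed-tools check. Everything here is hypothetical (`AgentConfig`, `check_allowed`, and the two tool lists are illustrative stand-ins, not the project's actual types or the real contents of `DeepResearchAgent::default_tools`); it only shows why a hint that says "use Edit" cannot be followed when Edit is missing from the list.

```rust
/// Hypothetical stand-in for an agent's tool configuration.
struct AgentConfig {
    allowed_tools: Vec<&'static str>,
}

/// Hypothetical allowed-list check: reject any tool the agent does not list.
fn check_allowed(agent: &AgentConfig, tool: &str) -> Result<(), String> {
    if agent.allowed_tools.iter().any(|t| *t == tool) {
        Ok(())
    } else {
        Err(format!("tool `{}` is not in the agent's allowed tools", tool))
    }
}

/// Illustrative tool list WITHOUT Edit (assumed, not the real default list).
fn tools_before_fix() -> Vec<&'static str> {
    vec!["Read", "Write"]
}

/// Illustrative tool list WITH Edit, so the recovery hint becomes actionable.
fn tools_after_fix() -> Vec<&'static str> {
    vec!["Read", "Write", "Edit"]
}
```

With the first list, an Edit call made in response to the hint is rejected while Read still succeeds, which matches the fallback behaviour described above. An assertion that the default list contains Edit is the kind of invariant test the fix adds.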

This PR closes all three gaps:

  • Add Edit to DeepResearchAgent::default_tools so the hint is actionable. Add a test assertion to lock in the invariant.
  • Tighten the recovery hint: explicitly forbid re-reading the just-written file, point the model at its own previous tool_use.input.content as the source for old_string, and give it a clean escape hatch ("tell the user the output was truncated") when no concrete continuation plan exists.
  • Add a per-session tool-call loop detector on ToolPipeline. When the same tool is called with deep-equal arguments TOOL_CALL_LOOP_THRESHOLD (=3) consecutive times in a session, the (THRESHOLD+1)-th call is rejected with an error that names the loop and instructs the model to switch tools or finish. The detector counts consecutive matches at the tail of a bounded 10-entry ring buffer, so a legitimate intervening call resets the streak. clear_session_tool_call_history is exposed so callers can drop the buffer when a session ends.
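
The loop detector in the last bullet could look roughly like the sketch below. Only `TOOL_CALL_LOOP_THRESHOLD`, `clear_session_tool_call_history`, the threshold of 3, and the 10-entry bound come from the description; `LoopDetector`, `ToolCall`, `Args`, `check_and_record`, `HISTORY_CAPACITY`, and the `call` helper are hypothetical names. Arguments are modeled as a string map so that derived `PartialEq` gives deep equality using only the standard library. The sketch also assumes a rejected call is not recorded, which the description does not specify.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

/// Threshold stated in the PR description.
const TOOL_CALL_LOOP_THRESHOLD: usize = 3;
/// Ring-buffer bound stated in the PR description (constant name is assumed).
const HISTORY_CAPACITY: usize = 10;

/// Hypothetical argument type; derived PartialEq compares every key and value.
type Args = BTreeMap<String, String>;

#[derive(Clone, Debug, PartialEq)]
struct ToolCall {
    tool: String,
    args: Args,
}

/// Hypothetical per-session history, keyed by session id.
#[derive(Default)]
struct LoopDetector {
    history: HashMap<String, VecDeque<ToolCall>>,
}

impl LoopDetector {
    /// Rejects the call if the tail of the session's buffer already holds
    /// THRESHOLD consecutive identical calls; otherwise records it.
    fn check_and_record(&mut self, session_id: &str, call: ToolCall) -> Result<(), String> {
        let buf = self.history.entry(session_id.to_string()).or_default();
        // Count matches from the newest entry backwards; any different call ends the streak.
        let streak = buf.iter().rev().take_while(|prev| **prev == call).count();
        if streak >= TOOL_CALL_LOOP_THRESHOLD {
            return Err(format!(
                "tool-call loop detected: `{}` called with identical arguments {} times in a row; switch tools or finish",
                call.tool, streak
            ));
        }
        // Keep the buffer bounded by evicting the oldest entry first.
        if buf.len() == HISTORY_CAPACITY {
            buf.pop_front();
        }
        buf.push_back(call);
        Ok(())
    }

    /// Drops the buffer for a session that has ended.
    fn clear_session_tool_call_history(&mut self, session_id: &str) {
        self.history.remove(session_id);
    }
}

/// Hypothetical helper that builds a call with a single `file_path` argument.
fn call(tool: &str, path: &str) -> ToolCall {
    let mut args = Args::new();
    args.insert("file_path".to_string(), path.to_string());
    ToolCall { tool: tool.to_string(), args }
}
```

Counting only at the tail means that three identical Reads followed by an Edit and another Read are all accepted, while a fourth identical Read in a row is rejected. Because the rejected call is not stored in this sketch, further identical retries keep failing until the model issues a different call.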

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@nonoqing nonoqing merged commit ffa8cc5 into GCWing:main May 11, 2026
4 checks passed