Status (verified 2026-05-17)
Original 3 LLM-behavior issues from `docs/plans/2026-04-13-agent-loop-robustness-design.md`:
- Sequential enforcement — SHIPPED v0.9.1 (rejects multiple tool calls, teaches model, 5-retry cap)
- Response pagination + filter — SHIPPED v0.9.1 (page=0 guard, adaptive page size, MCP list-tool filter/summary)
- Empty/verbalized response continuation — SHIPPED v0.9.1 (intent-detection + max-2 retries)
Plugin is now at v0.9.3 (workflow v0.51.7 → v0.53.1 bump; authz v0.5.4).
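For readers without the design doc, the sequential-enforcement behavior described above (reject multi-tool-call turns, send a corrective message, cap at 5 retries) can be sketched roughly as follows. This is an illustration only, not the shipped code: `SequentialEnforcer`, its `check` method, the message wording, and the choice to count consecutive rejections are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_SEQUENTIAL_RETRIES = 5  # the 5-retry cap noted in the status list


@dataclass
class SequentialEnforcer:
    """Hypothetical helper: allows at most one tool call per model turn."""

    max_retries: int = MAX_SEQUENTIAL_RETRIES
    retries: int = 0  # consecutive rejected turns (assumption: resets on success)

    def check(self, tool_calls: List[dict]) -> Tuple[str, Optional[str]]:
        """Return (action, message); action is 'accept', 'retry', or 'abort'."""
        if len(tool_calls) <= 1:
            self.retries = 0
            return "accept", None

        self.retries += 1
        if self.retries > self.max_retries:
            return "abort", "Too many turns with multiple tool calls; stopping."

        # Corrective message fed back to the model (wording is illustrative).
        return "retry", (
            f"You issued {len(tool_calls)} tool calls in one turn. "
            "Call exactly one tool, wait for its result, then continue."
        )
```

In this sketch the agent loop would call `check` on every model turn and append the returned message to the conversation whenever the action is `retry`.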
Remaining gap
Per memory `project_self_improving_agentic_status.md` (2026-04-13):
- gemma4 loops on file_read (3x identical results) — loop-break heuristic could help.
- phi4-mini verbalizes tool calls in text without using the tool-calling protocol — intent detection fires but model doesn't comply.
- 16GB hardware is too constrained for the 2-model test matrix (load avg 22+).
These are model-quality + test-infra issues, not code issues. The plan's "Next Steps" called for:
- Re-execute on 32GB+ hardware with no competing load.
- Try qwen2.5:7b (already downloaded; may have better tool compliance).
- Try Anthropic Claude or OpenAI via API (cloud models have reliable tool calling).
- Add loop-breaking strategy: when file_read loops 3x, inject "You already read the file. Now modify it and call file_write".
- Consider simpler first-pass: fixed pipeline that reads → LLM modifies → validates → writes (skip agent tool-choice).
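The loop-breaking step in the list above could look something like the sketch below, assuming tool results are available as strings and that "identical" means byte-for-byte equal output. `LoopBreaker`, `observe`, and the hashing approach are hypothetical names and choices for illustration; only the threshold of 3 and the injected message come from the plan.

```python
import hashlib
from collections import deque
from typing import Deque, Optional, Tuple

LOOP_THRESHOLD = 3  # "when file_read loops 3x"
NUDGE = "You already read the file. Now modify it and call file_write"


class LoopBreaker:
    """Hypothetical detector for repeated identical file_read results."""

    def __init__(self, threshold: int = LOOP_THRESHOLD) -> None:
        self.threshold = threshold
        # Sliding window of (tool name, result digest) for the latest calls.
        self._recent: Deque[Tuple[str, str]] = deque(maxlen=threshold)

    def observe(self, tool_name: str, result: str) -> Optional[str]:
        """Record one tool result; return a nudge message if a loop is detected."""
        digest = hashlib.sha256(result.encode("utf-8")).hexdigest()
        self._recent.append((tool_name, digest))

        window_full = len(self._recent) == self.threshold
        all_identical = len(set(self._recent)) == 1
        if window_full and all_identical and tool_name == "file_read":
            self._recent.clear()  # avoid re-firing on the very next call
            return NUDGE
        return None
```

The agent loop would call `observe` after each tool result and, when a message comes back, inject it into the conversation before the next model turn. Hashing keeps the window small even for large files; comparing raw strings would work equally well for a first pass.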
Recommendation
Close this item as code-complete. Move eval execution and model-behavior investigation to a separate test-infra epic. The loop-break heuristic is a small follow-up, to be picked up if and when the eval matrix surfaces a clear repro.
Why deferred from 2026-05-17 session
Filed during the autonomous "continue cycle" mandate. Investigation confirmed that the queued "3 remaining LLM behavior issues" are all SHIPPED; what remains is hardware and model selection, which fall outside the autonomous-pipeline scope.