plan agent stops without tools on altimate-default (prompt-only plans + content-policy refusals) #887

@sahrizvi

Description

Symptom

In plan mode, the planner subagent on altimate-backend/altimate-default (the gateway alias, currently routing to a GPT-5.x class model) sometimes ends its first step without ever calling a tool. Two observed sub-modes:

  1. Prompt-context-only plan: the model writes a plausible-sounding plan from the prompt alone, without reading any files. Such plans frequently reference files or symbols that may not exist, or miss existing patterns in the codebase.
  2. Outright refusal: the model returns "I'm sorry, but I cannot assist with that request." and stops, even for benign requests. One repro: "plan a feature which allows me to turn on YOLO mode from within the interface via a button", where the content-policy refusal is driven by the phrase "YOLO mode".

In both cases the in-product safety net (processor.ts:351-389) fires the plan_no_tool_generation warning. Until now its copy told users to "switch to a model with stronger tool-use," which is misleading: the model is tool-capable. The actual failure modes are (a) prompt engineering that is not strong enough to force exploration and (b) the gateway content policy declining the request.
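A minimal sketch of the kind of check such a safety net performs, for readers unfamiliar with it. The types and the function name below are hypothetical and are not taken from processor.ts; the real trip-wire condition may look at more than this.

```typescript
// Hypothetical shapes for illustration only; the real types in
// processor.ts may differ.
type StepPart =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string };

interface StepResult {
  agent: string; // e.g. "plan"
  parts: StepPart[]; // everything the model emitted during the step
}

// Returns true when a plan-agent step ended without a single tool call,
// which covers both the prompt-only plan and the outright refusal.
function planStoppedWithoutTools(step: StepResult): boolean {
  if (step.agent !== "plan") return false;
  return !step.parts.some((part) => part.type === "tool-call");
}

const promptOnlyStep: StepResult = {
  agent: "plan",
  parts: [{ type: "text", text: "Here is the plan..." }],
};

const exploredStep: StepResult = {
  agent: "plan",
  parts: [
    { type: "tool-call", toolName: "read" },
    { type: "text", text: "Based on the files I read..." },
  ],
};

console.log(planStoppedWithoutTools(promptOnlyStep)); // true
console.log(planStoppedWithoutTools(exploredStep)); // false
```

Note that a check of this shape cannot tell the two sub-modes apart: a refusal and a prompt-only plan both appear as a step containing text and no tool calls.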

Historical context

Jira AI-5987 [internal] tracked an earlier observation of the same failure class (61.2% failure rate / 30 of 49 plan-mode sessions over 96h, 2026-03-23) but was cancelled without a remediation in altimate-code. This issue re-opens the surface from a UX angle.

Final warning copy:

⚠️ altimate-code: the plan agent on <provider>/<model> stopped without calling any tools — it neither read, searched, nor explored the codebase. Common causes: (a) the model wrote a plan from prompt context alone, (b) the model declined to engage with the request (content-policy refusal), or (c) the request was too thin to act on. To recover, try one of: reply asking it to investigate first (read/grep/glob/explore); rephrase the request more concretely; or /model to a tier that's more eager to explore (e.g. Claude Sonnet/Opus).

The trip-wire condition and the shape of the plan_no_tool_generation telemetry event are unchanged.
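As an illustration of how the `<provider>/<model>` placeholders in the copy above might be filled in, here is a small sketch. The `fillWarning` helper is hypothetical, and the template is only a shortened excerpt of the full warning text.

```typescript
// Hypothetical helper: substitutes the <provider> and <model> placeholders
// used in the warning copy. Function replacers are used so that characters
// such as "$" in a model name are inserted literally.
function fillWarning(template: string, provider: string, model: string): string {
  return template
    .replace("<provider>", () => provider)
    .replace("<model>", () => model);
}

// Shortened excerpt of the warning copy, for illustration only.
const excerpt =
  "altimate-code: the plan agent on <provider>/<model> stopped without calling any tools";

const filled = fillWarning(excerpt, "altimate-backend", "altimate-default");
console.log(filled);
```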

Open questions / follow-ups (out of scope for this PR)

  • Should we differentiate refusals from prompt-context-only plans in telemetry? This would require inspecting the most recent text-end content from the step. Considered and rejected for the first cut (pattern-matching is locale-specific and brittle), but worth revisiting if telemetry shows refusals are common enough to call out separately in the UI.
  • Should altimate-default route to a Claude tier for plan-agent steps specifically? The gateway has the model identity; plan-agent steps could pin to a tool-eager model independent of the user's normal /model choice. Bigger change, separate proposal.
  • Long term: is the AI-5987 "61% failure" observation still accurate post-fix? Worth re-querying telemetry after this fix lands and a release cycle passes.
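To make the first open question concrete, here is a rough sketch of what pattern-matching on the step's last text could look like, and why it is brittle. The patterns, type names, and function name are assumptions for illustration; nothing like this exists in the codebase today.

```typescript
// Hypothetical classification of a no-tool plan step by its final text.
type NoToolOutcome = "refusal" | "prompt_only_plan";

// Illustrative, English-only patterns. A real list would need to cover
// other phrasings and locales, which is the brittleness noted above.
const REFUSAL_PATTERNS: RegExp[] = [
  /\bI'm sorry, but I cannot assist\b/i,
  /\bI (?:cannot|can't) (?:assist|help) with (?:that|this)\b/i,
];

function classifyNoToolStep(lastText: string): NoToolOutcome {
  const text = lastText.trim();
  return REFUSAL_PATTERNS.some((re) => re.test(text))
    ? "refusal"
    : "prompt_only_plan";
}

// The refusal string quoted in the Symptom section is recognized.
console.log(classifyNoToolStep("I'm sorry, but I cannot assist with that request."));

// A refusal in another language falls through and is misclassified as a plan.
console.log(classifyNoToolStep("Lo siento, no puedo ayudar con esa solicitud."));
```

The second call shows the failure mode: any refusal phrased outside the pattern list is silently counted as a prompt-only plan, so telemetry built on this would undercount refusals.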

Verification

  • 160 affected tests pass; typecheck clean.
  • Preview binary built locally from fix/plan-agent-tool-use (version string 0.0.0-fix/plan-agent-tool-use-202606041209). Manual repro of the YOLO refusal scenario confirms the new warning copy is shipped and accurately describes the observed behavior.

Metadata

Labels

bug (Something isn't working)
