
Conversation

@MichaelAnders (Contributor)

Summary

  • Fix stripThinkingBlocks() destroying markdown bullet points in Ollama responses
  • Handle Ollama offline gracefully with retry and 503 response
  • Add SUGGESTION_MODE_MODEL env var to control suggestion mode LLM calls
  • Remove tool_result_direct short-circuit and improve agentic loop
  • Add external file read with tilde expansion and user approval flow
  • Improve tool calling response handling for Ollama models

Problem

Several issues with the agentic loop and Ollama integration:

  1. stripThinkingBlocks() was destroying valid response content - The regex matched standard markdown bullets as "thinking markers", causing most list-based responses to be truncated after the first heading.

  2. No graceful handling when Ollama is offline - Requests would fail hard with no retry or user-friendly error.

  3. Suggestion mode wasted GPU resources - Every user message triggered a full agentic loop with tools for suggestion prediction, blocking responses on large models (70b+).

  4. Tool calling response handling had edge cases with Ollama's response format.

Changes

  • Replaced stripThinkingBlocks() heuristic with explicit <think> tag stripping
  • Added Ollama offline detection with retry logic and 503 responses
  • Added SUGGESTION_MODE_MODEL env var (none to disable, or the name of a smaller model to redirect to)
  • Removed tool_result_direct short-circuit that could skip tool execution
  • Added external file read support with tilde expansion and user approval
  • Improved Ollama tool call conversion and response normalization

Testing

  • Verified markdown bullet points survive in Ollama responses (the original bug)
  • Tested Ollama offline/online transitions
  • Tested suggestion mode disable (SUGGESTION_MODE_MODEL=none)
  • Existing unit tests pass

- Add fallback parsing for Ollama models that return tool calls as JSON
  text in message content instead of using the structured tool_calls field
  (sketched after this list)
- Return tool results directly to the CLI instead of making a follow-up LLM
  call, reducing latency and preventing hallucinated rewrites of output
- Add a dedicated Glob tool returning plain text (one path per line) instead
  of JSON, with workspace_list accepting both 'pattern' and 'patterns'
- Clarify why Glob is not aliased to workspace_list (format mismatch)
- Add logging for tool call parsing and execution
- Hard-code shell commands for reliable tool execution
- Deduplicate tool calls within a single response
- Collect and return results from all called tools
- Ensure the Ollama provider uses the specified Ollama model
- Fix double-serialized JSON parameters from some providers
Enable fs_read to handle paths outside the workspace (~/... and absolute
paths) via a two-phase approval flow: the tool first returns a 403 asking
the LLM to get user confirmation, then reads the file on a second call
with user_approved=true. Write/edit remain workspace-only.
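
A sketch of what the two-phase flow could look like; the FsReadArgs shape
and the 403-style result object are assumptions for illustration, not the
PR's actual interfaces:

```typescript
import * as os from 'node:os';
import * as path from 'node:path';
import * as fs from 'node:fs/promises';

interface FsReadArgs {
  path: string;
  user_approved?: boolean; // set on the second call, after user confirmation
}

function expandTilde(p: string): string {
  return p.startsWith('~/') ? path.join(os.homedir(), p.slice(2)) : p;
}

async function fsRead(args: FsReadArgs, workspaceRoot: string) {
  const resolved = path.resolve(expandTilde(args.path));
  const outsideWorkspace = !resolved.startsWith(workspaceRoot + path.sep);

  // Phase 1: refuse, and ask the LLM to obtain user confirmation.
  if (outsideWorkspace && !args.user_approved) {
    return {
      status: 403,
      error: 'approval_required',
      message: `Reading ${resolved} is outside the workspace. ` +
        `Ask the user for permission, then retry with user_approved=true.`,
    };
  }

  // Phase 2 (or an ordinary in-workspace read): actually read the file.
  return { status: 200, content: await fs.readFile(resolved, 'utf8') };
}
```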
Tool results now loop back to the model for natural language synthesis
instead of being returned raw to the CLI. This fixes the bug where
conversational messages (e.g. "hi") triggered tool calls and dumped
raw output.
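
Roughly, the synthesis loop could look like the following; chat and
runTool are stand-ins for the real provider and tool-executor code:

```typescript
type Msg = {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  name?: string;
  tool_calls?: { name: string; arguments: Record<string, unknown> }[];
};

// Stand-ins for the actual provider call and tool executor.
declare function chat(messages: Msg[]): Promise<Msg>;
declare function runTool(call: { name: string; arguments: Record<string, unknown> }): Promise<string>;

async function agentTurn(messages: Msg[]): Promise<Msg> {
  let reply = await chat(messages);
  // A real loop would also cap iterations to avoid runaway tool use.
  while (reply.tool_calls?.length) {
    messages.push(reply);
    for (const call of reply.tool_calls) {
      // Feed each tool result back as a tool message...
      messages.push({ role: 'tool', name: call.name, content: await runTool(call) });
    }
    // ...then ask the model again, so the user gets prose, not raw output.
    reply = await chat(messages);
  }
  return reply;
}
```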

Additional improvements:
- Context-aware tiered compression that scales with the model's context
  window (sketched after this list)
- Empty response detection with retry-then-fallback
- _noToolInjection flag to prevent provider-level tool re-injection
- Auto-approve external file reads in the tool executor
- Conversation context search in workspace_search
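
The tiered compression could plausibly be gated on how much of the context
window is in use; the thresholds and tier names below are assumptions, not
values from the PR:

```typescript
// Hypothetical tiering: compress conversation history more aggressively as
// it approaches the model's context window.
function compressionTier(
  usedTokens: number,
  contextWindow: number,
): 'none' | 'summarize-old' | 'aggressive' {
  const ratio = usedTokens / contextWindow;
  if (ratio < 0.5) return 'none';           // plenty of room, keep everything
  if (ratio < 0.8) return 'summarize-old';  // summarize the oldest turns
  return 'aggressive';                      // keep only recent turns + summary
}
```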
Three concurrent runAgentLoop calls per user message waste GPU time with
large models (~30s each). This adds SUGGESTION_MODE_MODEL config to skip
("none") or redirect suggestion mode to a lighter model. Also adds ISO
timestamps and mode tags to debug logs for easier debugging.
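
A sketch of the env-var gate; only SUGGESTION_MODE_MODEL itself comes from
the PR, while the helper name and the default model are illustrative:

```typescript
function resolveSuggestionModel(defaultModel: string): string | null {
  const configured = process.env.SUGGESTION_MODE_MODEL;
  if (configured === 'none') return null; // skip suggestion-mode LLM calls
  return configured || defaultModel;      // redirect to a lighter model, or keep the default
}

const model = resolveSuggestionModel('llama3.1:70b'); // default is an assumption
if (model === null) {
  // Suggestion mode disabled: don't spend ~30s of GPU time per message.
} else {
  // Run the suggestion-mode loop against `model`.
}
```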
- Add ECONNREFUSED to the retryable errors and check the .cause.code on
  undici TypeErrors so connection-refused errors get retried with backoff
  (sketched after this list)
- Wrap invokeModel in try/catch, returning a structured 503 with a
  provider_unreachable error instead of letting the raw TypeError bubble up
  to the Express error middleware
- Fix the suggestion mode early-return response shape (json -> body) to
  match router expectations
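
A sketch of the retry and structured-503 behavior: undici's fetch rejects
with a TypeError whose .cause carries the socket error code, which is why a
plain message check is not enough. Helper names, the extra retryable codes,
and the backoff values are assumptions:

```typescript
const RETRYABLE = new Set(['ECONNREFUSED', 'ECONNRESET', 'ETIMEDOUT']);

function isRetryable(err: unknown): boolean {
  // undici wraps socket errors: TypeError('fetch failed') with .cause.code.
  const code = (err as { cause?: { code?: string } })?.cause?.code
    ?? (err as { code?: string })?.code;
  return code !== undefined && RETRYABLE.has(code);
}

async function invokeModelSafe(call: () => Promise<Response>, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (err) {
      if (isRetryable(err) && i < attempts - 1) {
        await new Promise((r) => setTimeout(r, 500 * 2 ** i)); // exponential backoff
        continue;
      }
      // Structured 503 instead of letting the raw TypeError reach Express.
      return new Response(
        JSON.stringify({ error: 'provider_unreachable' }),
        { status: 503, headers: { 'content-type': 'application/json' } },
      );
    }
  }
  throw new Error('unreachable');
}
```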
… responses

The heuristic-based stripThinkingBlocks() matched standard markdown bullets
(- item, * item) as "thinking block markers" and dropped all subsequent content.
Replace with stripThinkTags() that only strips <think>...</think> tags used by
models like DeepSeek and Qwen for chain-of-thought reasoning.
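
A minimal version of the replacement described above; the exact regex in
the PR may differ:

```typescript
// Strip only explicit <think>...</think> blocks (emitted by reasoning
// models such as DeepSeek and Qwen) and leave markdown bullets alone.
function stripThinkTags(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}

stripThinkTags('<think>plan the answer</think>\n- item one\n- item two');
// => "- item one\n- item two"  (bullets survive, unlike the old heuristic)
```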
@MichaelAnders (Contributor, Author)

Closing - these commits are already covered by the individual PRs #39-#42. Will open a separate PR for the remaining changes.

@MichaelAnders MichaelAnders deleted the feature/improve-tool-calling-v2 branch February 10, 2026 20:22
