From 12f3d1902102e9a340f204f38db199fdf384a8e2 Mon Sep 17 00:00:00 2001
From: mason5052 <ehehwnwjs5052@gmail.com>
Date: Wed, 3 Jun 2026 22:54:33 -0400
Subject: [PATCH] docs: add tool-call parser troubleshooting for custom LLM
 backends

Issue #313 reported flows that stall after a few steps when running a
custom OpenAI-compatible backend (LiteLLM in front of llama.cpp serving
qwen3.6-35b via LLM_SERVER_*). The backend returned malformed tool-call
arguments, surfaced as 'Failed to parse tool call arguments as JSON'
HTTP 500s and cascading retries. The maintainer fixed the stall in the
latest build by sanitizing malformed function-call arguments.

Add a troubleshooting subsection under Custom LLM Provider Configuration
that explains the root cause and how to diagnose it:

- Custom OpenAI-compatible backends must return valid tool-call
  (function-call) JSON; llama.cpp, SGLang, and vLLM usually require a
  specific tool-call parser and matching chat template, and not every
  setup produces valid tool calls out of the box.
- Symptoms: 'Failed to parse tool call arguments as JSON', flow stalls,
  looping tool calls, the 'failed to select primary docker image via
  llm call' start-of-flow failure, and unexpected backend HTTP errors.
- Investigation: check PentAGI and backend/proxy logs, validate with the
  ctester utility before a full flow, confirm the parser/chat template
  match the model, and update PentAGI (recent builds sanitize malformed
  function-call arguments).

Docs only. No tool-call parser code, provider runtime, schema, migration,
or config-default changes. Wording frames compatibility as dependent on
the backend's OpenAI-compatible tool-call behavior rather than claiming
every llama.cpp backend is supported.
---
 README.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
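
Reviewer note (below the `---` line, so not part of the commit): the
malformed-argument symptom described in the README addition can be
checked by hand on a captured response. The sketch below assumes the
backend returns the standard OpenAI Chat Completions shape
(`choices[].message.tool_calls[].function.arguments` as a JSON string).
The helper name `find_bad_tool_calls` and the sample tool name
`terminal` are hypothetical and do not come from PentAGI.

```python
import json


def find_bad_tool_calls(response: dict) -> list[str]:
    """List tool calls whose arguments are not a JSON object.

    `response` is a decoded Chat Completions response body.
    """
    problems = []
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function") or {}
            name = function.get("name", "<unnamed>")
            raw = function.get("arguments")
            try:
                parsed = json.loads(raw)
            except (TypeError, ValueError) as exc:
                # TypeError: arguments missing or not a string.
                # ValueError: truncated or otherwise invalid JSON.
                problems.append(f"{name}: arguments are not valid JSON ({exc})")
                continue
            if not isinstance(parsed, dict):
                kind = type(parsed).__name__
                problems.append(f"{name}: arguments decode to {kind}, not an object")
    return problems


# Hypothetical sample: arguments cut off mid-string, as a mismatched
# tool-call parser might emit them.
truncated = {
    "choices": [
        {
            "message": {
                "tool_calls": [
                    {"function": {"name": "terminal", "arguments": '{"cmd": "id'}}
                ]
            }
        }
    ]
}

for problem in find_bad_tool_calls(truncated):
    print(problem)
```

An empty result means every tool call in the response carried a
parseable JSON object. It says nothing about whether the arguments
match the tool's schema.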

diff --git a/README.md b/README.md
index fc5ca5872..66cc5c4fe 100644
--- a/README.md
+++ b/README.md
@@ -1392,6 +1392,26 @@ The `LLM_SERVER_PRESERVE_REASONING` setting controls whether reasoning content i
 
 This setting is required by some LLM providers (e.g., Moonshot) that return errors like "thinking is enabled but reasoning_content is missing in assistant tool call message" when reasoning content is not included in multi-turn conversations. Enable this setting if your provider requires reasoning content to be preserved.
 
+#### Troubleshooting: tool-call (function-call) parser errors
+
+PentAGI drives its agents with tool calls (also called function calls), so any custom OpenAI-compatible backend configured through `LLM_SERVER_*` must return valid tool-call JSON in the format the OpenAI Chat Completions API defines. When the backend emits malformed, truncated, or non-conforming tool-call arguments, the agent chain cannot continue.
+
+Self-hosted engines such as llama.cpp, SGLang, and vLLM usually require a specific tool-call parser and a matching chat template to produce correct tool-call output. If the parser is missing or mismatched for the model you are serving, tool-call arguments can come back corrupted. Compatibility therefore depends on the backend's tool-call (function-call) behavior and configuration, not on PentAGI alone; not every llama.cpp, SGLang, or vLLM setup produces valid tool calls out of the box.
+
+Typical symptoms:
+
+- Backend or proxy errors such as `Failed to parse tool call arguments as JSON` (often surfaced through a LiteLLM proxy as an HTTP 500), or other unexpected 5xx/4xx responses from the LLM endpoint.
+- A flow that runs for a few steps and then stops responding to new input in the UI.
+- Repeated or looping tool calls that never converge.
+- A flow that fails right at the start with `failed to select primary docker image via llm call`, because the first action in a flow is an LLM tool call to choose the container image; a backend that cannot return a valid tool call fails at this step too.
+
+How to investigate:
+
+1. Check both sides of the connection: the PentAGI logs (`docker compose logs -f pentagi`) and the inference backend or proxy logs (llama.cpp, SGLang, vLLM, or LiteLLM). If the backend produced the malformed tool call, its log usually shows the same parse error.
+2. Before running a full flow, validate the provider with the `ctester` utility, which exercises tool-calling agent types directly. See [Testing LLM Agents](https://github.com/vxcontrol/pentagi#testing-llm-agents).
+3. Confirm the backend's tool-call parser and chat template are the ones recommended for the model you are serving, and that the model itself supports tool calling.
+4. Update PentAGI to the latest build. Recent versions sanitize malformed function-call arguments returned by the model so a single bad response no longer stalls the whole flow; older builds forwarded the corrupted arguments and could get stuck.
+
 ### Ollama Provider Configuration
 
 PentAGI supports Ollama for both local LLM inference (zero-cost, enhanced privacy) and Ollama Cloud (managed service with free tier).