fix: GLM 4.5 streaming tool-call parsing + grammar error handling #19612

Closed
Gunther-Schulz wants to merge 3 commits into ggml-org:master from Gunther-Schulz:fix/glm45-tool-parse-only-auto

@Gunther-Schulz

Disclosure: All code and this PR description were written by AI (Cursor).


Summary

This PR fixes GLM 4.5 (and related XML tool-call) streaming behaviour so that:

  1. The server no longer hangs when tool_choice=auto and the model never outputs the grammar trigger (parse-only path, no grammar for AUTO).
  2. The "Failed to parse up to error: ... attempting to parse an empty input" log spam is removed when streaming partial tool-call arguments.
  3. Plain-text arg_value content (e.g. subagent_type=explore) is parsed correctly instead of being reported as a partial (incomplete) parse indefinitely, so Task/tool calls complete.
  4. Grammar/sampler runtime errors (e.g. "Unexpected empty grammar stack") return HTTP 500 and release the slot instead of aborting the server.

Changes

1. GLM 4.5 parse-only for AUTO (existing branch commit)
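
The rule described in the first commit message (grammar only when tools are present and tool_choice is REQUIRED, no lazy trigger) can be sketched as below. This is a minimal illustration, not the code in common_chat_params_init_glm_4_5: the enum, the struct, and the function name are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the actual llama.cpp types.
enum class tool_choice_mode { AUTO, REQUIRED, NONE };

struct grammar_plan {
    bool build_grammar; // constrain sampling with a tool-call grammar
    bool lazy;          // grammar would only activate after a trigger
};

// Build a grammar only when tools exist and a call is REQUIRED.
// For AUTO there is no grammar and no trigger; tool calls are
// detected by parsing the decoded text instead.
inline grammar_plan plan_grammar(bool has_tools, tool_choice_mode choice) {
    grammar_plan plan{};
    plan.lazy          = false;
    plan.build_grammar = has_tools && choice == tool_choice_mode::REQUIRED;
    return plan;
}

// Usage: AUTO never waits on a trigger, so a model that never emits one cannot stall it.
inline void plan_grammar_usage() {
    assert(!plan_grammar(true, tool_choice_mode::AUTO).build_grammar);
    assert(plan_grammar(true, tool_choice_mode::REQUIRED).build_grammar);
}
```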

2. Avoid "parse empty input" log (json-partial + XML parser)

  • common/json-partial.cpp: When the SAX parser reports an error at position 0 (empty substring to re-parse), return false immediately without calling json::parse or logging "Failed to parse up to error". This avoids noisy logs for all formats (GLM 4.5, GPT-OSS, Granite, Hermes, etc.) when streaming partial or invalid JSON.
  • common/chat-parser-xml-toolcall.cpp: Only call try_consume_json() when the remainder after <arg_value> looks like the start of a JSON value (", {, [, digit, -, or prefix of true/false/null). Otherwise treat as incomplete/plain text and skip the JSON parser (vLLM-style), avoiding SAX error-at-position-0 for e.g. exp before explore.
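
The "looks like the start of a JSON value" check can be sketched roughly as follows. The function names are hypothetical and the sketch assumes the remainder has no leading whitespace; the actual check in common/chat-parser-xml-toolcall.cpp may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True when the shorter of the two strings is a non-empty prefix of the other.
// Covers both a streamed fragment ("tr" vs "true") and a complete literal
// followed by more text ("true</arg_value>" vs "true").
inline bool shares_prefix(const std::string & s, const std::string & word) {
    const std::size_t n = s.size() < word.size() ? s.size() : word.size();
    return n > 0 && s.compare(0, n, word, 0, n) == 0;
}

// Hypothetical helper: should the text after <arg_value> go to the JSON parser?
inline bool looks_like_json_start(const std::string & rest) {
    if (rest.empty()) {
        return false; // nothing to parse yet
    }
    const unsigned char c = static_cast<unsigned char>(rest.front());
    if (c == '"' || c == '{' || c == '[' || c == '-' || std::isdigit(c)) {
        return true;
    }
    return shares_prefix(rest, "true") || shares_prefix(rest, "false") || shares_prefix(rest, "null");
}

// Usage: the partial chunk "exp" (on its way to "explore") is treated as plain text.
inline void looks_like_json_start_usage() {
    assert(!looks_like_json_start("exp"));
    assert(looks_like_json_start("\"explore\""));
}
```

With a check like this, a plain-text value never reaches the JSON parser, so the SAX error at position 0 does not occur for it.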

3. Plain-text arg_value fallthrough + server grammar error handling

  • common/chat-parser-xml-toolcall.cpp: When the remainder does not look like JSON (e.g. explore), do not throw; skip try_consume_json() and fall through to the existing plain-text path (try_find_val_end()). Fixes Task tool (e.g. subagent_type=explore) never completing.
  • tools/server/server-context.cpp: Wrap common_sampler_accept() and common_sampler_sample_and_accept_n() in try/catch for std::runtime_error. On catch (e.g. "Unexpected empty grammar stack" from llama-grammar.cpp), log, send HTTP 500 with "Grammar constraint violation", release the slot, and continue so the server stays up.
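
The server-side error handling can be sketched as below. It is a simplified model under stated assumptions: slot_state and sample_step are hypothetical, and the sample callback stands in for common_sampler_accept() / common_sampler_sample_and_accept_n(), whose real signatures are not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for a server slot and its error reply.
struct slot_state {
    bool        in_use      = true;
    int         http_status = 200;
    std::string error_message; // message sent to the client
    std::string log_detail;    // what() of the caught exception, for the log
};

// Runs one sampling step. `sample` may throw std::runtime_error from the
// grammar code. Returns true on success; on failure it records a 500
// response and frees the slot so the server keeps running.
inline bool sample_step(slot_state & slot, const std::function<int()> & sample, int & token_out) {
    try {
        token_out = sample();
        return true;
    } catch (const std::runtime_error & err) {
        slot.log_detail    = err.what();
        slot.http_status   = 500;
        slot.error_message = "Grammar constraint violation";
        slot.in_use        = false; // release the slot for later requests
        return false;
    }
}

// Usage: a throwing sampler turns into a failed request instead of an abort.
inline void sample_step_usage() {
    slot_state slot;
    int token = -1;
    const bool ok = sample_step(slot, []() -> int {
        throw std::runtime_error("Unexpected empty grammar stack");
    }, token);
    assert(!ok && slot.http_status == 500 && !slot.in_use);
}
```

Only std::runtime_error is caught in this sketch, matching the description above; any other exception type would still propagate.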

Testing

  • test-json-partial and test-chat-parser pass.
  • Manual: GLM 4.5 with tool_choice=auto, streaming tool calls (e.g. Task with subagent_type=explore) complete without log spam; server does not abort on grammar errors.

Related

  • vLLM behaviour: GLM-4 parser only parses non-string arg values when </arg_value> is seen; we align with that.
  • Fixes / improves experience around streaming XML tool calls and grammar constraint violations in the server.

Gunther-Schulz and others added 3 commits February 14, 2026 00:42
…QUIRED

- In common_chat_params_init_glm_4_5: set grammar_lazy=false; build grammar
  only when has_tools && tool_choice==REQUIRED (vLLM-style: no trigger/grammar
  for AUTO, detect tool calls by parsing decoded text).
- Relax test-chat assert: allow empty grammar when test message has tool_calls
  (GLM 4.5 AUTO no longer sets grammar).

Fixes server hang when model never outputs trigger (e.g. llama.cpp ggml-org#19068).

Co-authored-by: Cursor <cursoragent@cursor.com>
- json-partial: when SAX error is at position 0, return false without
  calling json::parse or logging 'Failed to parse up to error' (covers
  all formats: GLM 4.5, GPT-OSS, Granite, Hermes, etc.)
- chat-parser-xml-toolcall: only call try_consume_json when remainder
  looks like start of JSON value (", {, [, digit, -, true/false/null);
  otherwise treat as plain text/partial (vLLM-style, avoids SAX
  error-at-position-0 for e.g. 'exp' before 'explore')

Co-authored-by: Cursor <cursoragent@cursor.com>
- chat-parser-xml-toolcall: when remainder does not look like JSON start
  (e.g. 'explore'), skip try_consume_json and fall through to plain-text
  path instead of throwing; fixes Task tool (subagent_type=explore) never
  completing
- server-context: catch std::runtime_error from common_sampler_accept and
  common_sampler_sample_and_accept_n (e.g. 'Unexpected empty grammar
  stack'); return 500 and release slot instead of aborting

Co-authored-by: Cursor <cursoragent@cursor.com>
@pwilkin
Member

pwilkin commented Mar 17, 2026

Obsoleted by #18675

@pwilkin pwilkin closed this Mar 17, 2026

Labels

examples, server, testing
