fix: GLM 4.5 streaming tool-call parsing + grammar error handling #19612
Closed
Gunther-Schulz wants to merge 3 commits into ggml-org:master from …
…QUIRED
- In common_chat_params_init_glm_4_5: set grammar_lazy=false; build grammar only when has_tools && tool_choice==REQUIRED (vLLM-style: no trigger/grammar for AUTO, detect tool calls by parsing decoded text).
- Relax test-chat assert: allow empty grammar when test message has tool_calls (GLM 4.5 AUTO no longer sets grammar).
Fixes server hang when model never outputs trigger (e.g. llama.cpp ggml-org#19068).
Co-authored-by: Cursor <cursoragent@cursor.com>
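The grammar decision described in this commit can be sketched as below. This is a minimal illustration, not the PR's code: the `tool_choice` enum, the `grammar_plan` struct, and `plan_grammar()` are hypothetical names invented for this example, and only the rule itself (no lazy grammar, grammar only when tools are present and a call is required) comes from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the actual llama.cpp types.
enum class tool_choice { none, automatic, required };

struct grammar_plan {
    bool build_grammar; // constrain sampling with a tool-call grammar
    bool lazy;          // wait for a trigger string before enabling the grammar
};

// Build a grammar only when tools are present AND a tool call is required.
// For automatic tool choice nothing constrains sampling; tool calls are
// detected afterwards by parsing the decoded text.
inline grammar_plan plan_grammar(bool has_tools, tool_choice choice) {
    grammar_plan plan;
    plan.lazy          = false; // mirrors grammar_lazy=false in the commit message
    plan.build_grammar = has_tools && choice == tool_choice::required;
    return plan;
}

// Usage example: only the (tools, required) combination yields a grammar.
inline void example_plan_grammar() {
    assert(plan_grammar(true, tool_choice::required).build_grammar);
    assert(!plan_grammar(true, tool_choice::automatic).build_grammar);
}
```

Because no trigger is registered for the automatic case, there is nothing for the server to wait on if the model never emits it, which is the hang the commit message refers to.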
- json-partial: when SAX error is at position 0, return false without
calling json::parse or logging 'Failed to parse up to error' (covers
all formats: GLM 4.5, GPT-OSS, Granite, Hermes, etc.)
- chat-parser-xml-toolcall: only call try_consume_json when remainder
looks like start of JSON value (", {, [, digit, -, true/false/null);
otherwise treat as plain text/partial (vLLM-style, avoids SAX
error-at-position-0 for e.g. 'exp' before 'explore')
Co-authored-by: Cursor <cursoragent@cursor.com>
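The "looks like the start of a JSON value" check from this commit message can be sketched as follows. The function names `looks_like_json_start()` and `prefix_compatible()` are assumptions made for this example and are not taken from the PR; treating an empty remainder as "not JSON yet" and not skipping leading whitespace are also assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True when the shorter of `s` and `word` is a prefix of the other,
// e.g. "tr" vs "true". An empty `s` never matches.
inline bool prefix_compatible(const std::string & s, const std::string & word) {
    const size_t n = s.size() < word.size() ? s.size() : word.size();
    return n > 0 && s.compare(0, n, word, 0, n) == 0;
}

// Heuristic: does the remainder look like the beginning of a JSON value?
// Accepts ", {, [, -, a digit, or a prefix of true/false/null.
inline bool looks_like_json_start(const std::string & rest) {
    if (rest.empty()) {
        return false; // assumption: nothing to decide on yet
    }
    const unsigned char c = static_cast<unsigned char>(rest[0]);
    if (c == '"' || c == '{' || c == '[' || c == '-' || std::isdigit(c)) {
        return true;
    }
    return prefix_compatible(rest, "true")  ||
           prefix_compatible(rest, "false") ||
           prefix_compatible(rest, "null");
}

// Usage example: a partial "exp" (on its way to "explore") is not JSON-like,
// so a caller would skip the JSON parser and treat it as plain text.
inline void example_looks_like_json_start() {
    assert(!looks_like_json_start("exp"));
    assert(looks_like_json_start("{\"a\""));
}
```

A caller would run this check before handing the remainder to the JSON parser, so that plain-text values never reach it and never produce an error at position 0.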
- chat-parser-xml-toolcall: when remainder does not look like JSON start (e.g. 'explore'), skip try_consume_json and fall through to plain-text path instead of throwing; fixes Task tool (subagent_type=explore) never completing
- server-context: catch std::runtime_error from common_sampler_accept and common_sampler_sample_and_accept_n (e.g. 'Unexpected empty grammar stack'); return 500 and release slot instead of aborting
Co-authored-by: Cursor <cursoragent@cursor.com>
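The server-side error handling in this commit follows a common pattern: guard the sampling step, and on `std::runtime_error` turn the failure into an error response while freeing the slot. The sketch below shows that pattern only. `slot_state`, `sample_guarded()`, and the callback are hypothetical stand-ins invented for this example; the real server structures and the signatures of the `common_sampler_*` functions are not reproduced here.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a server slot; not the real llama.cpp structure.
struct slot_state {
    bool        in_use      = true;
    int         http_status = 200;
    std::string error_message;
};

// Run one sampling step. If it throws std::runtime_error, log the cause,
// record a 500 response, and release the slot instead of letting the
// exception terminate the process. Returns true on success.
inline bool sample_guarded(slot_state & slot,
                           const std::function<int()> & sample_step,
                           int & token_out) {
    try {
        token_out = sample_step();
        return true;
    } catch (const std::runtime_error & err) {
        std::fprintf(stderr, "sampling failed: %s\n", err.what());
        slot.http_status   = 500;
        slot.error_message = "Grammar constraint violation";
        slot.in_use        = false; // slot is free for the next request
        return false;
    }
}

// Usage example: a throwing step marks the slot as failed and released.
inline void example_sample_guarded() {
    slot_state slot;
    int token = -1;
    const bool ok = sample_guarded(
        slot,
        []() -> int { throw std::runtime_error("Unexpected empty grammar stack"); },
        token);
    assert(!ok && slot.http_status == 500 && !slot.in_use);
}
```

Catching only `std::runtime_error` keeps the guard narrow: other exception types still propagate, which matches the scope named in the commit message.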
Member: Obsoleted by #18675
Disclosure: All code and this PR description were written by AI (Cursor).
Summary
This PR fixes GLM 4.5 (and related XML tool-call) streaming behaviour so that:
- The server no longer hangs when `tool_choice=auto` and the model never outputs the grammar trigger (parse-only path, no grammar for AUTO).
- Plain-text `arg_value` content (e.g. `subagent_type=explore`) is parsed correctly instead of throwing partial forever, so Task/tool calls complete.

Changes

1. GLM 4.5 parse-only for AUTO (existing branch commit)
- Build the grammar only when `tool_choice == required`; for `auto`, do not set grammar/trigger so tool calls are detected by parsing decoded text (vLLM-style).

2. Avoid "parse empty input" log (json-partial + XML parser)
- json-partial: when the SAX error is at position 0, return `false` immediately without calling `json::parse` or logging "Failed to parse up to error". This avoids noisy logs for all formats (GLM 4.5, GPT-OSS, Granite, Hermes, etc.) when streaming partial or invalid JSON.
- chat-parser-xml-toolcall: only call `try_consume_json()` when the remainder after `<arg_value>` looks like the start of a JSON value (`"`, `{`, `[`, digit, `-`, or prefix of `true`/`false`/`null`). Otherwise treat it as incomplete/plain text and skip the JSON parser (vLLM-style), avoiding the SAX error at position 0 for e.g. `exp` before `explore`.

3. Plain-text arg_value fallthrough + server grammar error handling
- chat-parser-xml-toolcall: when the remainder does not look like a JSON start (e.g. `explore`), do not throw; skip `try_consume_json()` and fall through to the existing plain-text path (`try_find_val_end()`). Fixes Task tool (e.g. `subagent_type=explore`) never completing.
- server-context: wrap `common_sampler_accept()` and `common_sampler_sample_and_accept_n()` in try/catch for `std::runtime_error`. On catch (e.g. "Unexpected empty grammar stack" from llama-grammar.cpp), log, send HTTP 500 with "Grammar constraint violation", release the slot, and continue so the server stays up.

Testing
- `test-json-partial` and `test-chat-parser` pass.
- With `tool_choice=auto`, streaming tool calls (e.g. Task with `subagent_type=explore`) complete without log spam; the server does not abort on grammar errors.

Related
- … `</arg_value>` is seen; we align with that.