UPSTREAM PR #21139: grammar: make MAX_REPETITION_THRESHOLD configurable via env var #1315
Overview
Analysis of 123,911 functions across 15 binaries reveals minimal performance impact from a single commit adding grammar configuration. Only 49 functions (0.04%) modified, all C++ STL template instantiations with no source code changes.
Function counts: 123,911 total | 49 modified | 2 new | 0 removed | 123,860 unchanged
Power consumption changes:
Function Analysis
Top regressions (compiler optimization artifacts, not source changes):
Top improvements:
Other analyzed functions showed similar compiler-driven variations in STL template instantiations with negligible real-world impact.
Additional Findings
All changes are compiler code generation artifacts from template instantiation differences, with no source code modifications to these functions. The grammar configuration change (commit 90fe2f9) is isolated to
Full breakdown: Loci Inspector
Note
Source pull request: ggml-org/llama.cpp#21139
The hardcoded threshold of 2000 causes grammar parsing to fail for legitimate tool-calling schemas with many optional parameters.
Add LLAMA_GRAMMAR_MAX_REPETITIONS env var to override the default. When unset, behaviour is unchanged (default 2000).
May fix grammar failures reported in openclaw/openclaw#32916, openclaw/openclaw#38569, openclaw/openclaw#38899.
AI was used in an assistive capacity to identify the threshold constant and review the approach. The code change and testing were done manually.
Overview
Adds a helper function that reads LLAMA_GRAMMAR_MAX_REPETITIONS from the environment, replacing the hardcoded MAX_REPETITION_THRESHOLD macro. Three call sites updated. No API changes, fully backwards compatible.
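The helper described above might look roughly like the following. This is a minimal sketch, not the code from the PR: the function name `grammar_max_repetitions`, the constant name `DEFAULT_MAX_REPETITIONS`, and the decision to fall back to the default on empty, non-numeric, or zero values are all assumptions made for illustration. Only the environment variable name and the default of 2000 come from the description.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Default taken from the PR description: the previously hardcoded value.
static constexpr uint64_t DEFAULT_MAX_REPETITIONS = 2000;

// Hypothetical helper name. Reads LLAMA_GRAMMAR_MAX_REPETITIONS on each call
// and returns the default when the variable is unset or not a positive
// decimal integer (this fallback policy is an assumption).
static uint64_t grammar_max_repetitions() {
    const char * raw = std::getenv("LLAMA_GRAMMAR_MAX_REPETITIONS");
    if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(raw[0]))) {
        // Unset, empty, or starting with a sign/space/letter: keep default.
        return DEFAULT_MAX_REPETITIONS;
    }

    errno = 0;
    char * end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);

    // Reject overflow, trailing garbage such as "20000x", and zero.
    if (errno != 0 || *end != '\0' || value == 0) {
        return DEFAULT_MAX_REPETITIONS;
    }
    return static_cast<uint64_t>(value);
}
```

Under these assumptions, each call site that previously compared against the `MAX_REPETITION_THRESHOLD` macro would compare against `grammar_max_repetitions()` instead, so leaving the variable unset keeps the 2000 limit.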
Additional information
Tested with Qwen3.5-122B-A10B and 21 OpenClaw tools (message tool: 109 optional params; browser tool: 48). With the default threshold (2000), the grammar fails on every request. With the threshold set to 20000, the grammar parses successfully.
Requirements