
UPSTREAM PR #21240: Relax prefill parser to allow space.#1324

Open
loci-dev wants to merge 1 commit into main from loci/pr-21240-relax-prefill-parser

Conversation


@loci-dev loci-dev commented Apr 1, 2026

Note

Source pull request: ggml-org/llama.cpp#21240

Overview

As in title.

Additional information

The prefill parser strictly required the reasoning marker at the very start of the message, which interfered with models that like to insert, e.g., a newline there.

Requirements


loci-review bot commented Apr 1, 2026

Overview

Impact: Minor - No performance concerns identified.

Function Analysis: 24 modified functions (0.02% of 124,016 total). Changes isolated to chat template parser enhancement and compiler optimizations in auxiliary tools.

Binaries Analyzed (15 total):

Binary                               Power Change
build.bin.llama-tts                  -0.038%
build.bin.llama-cvector-generator    -0.046%
build.bin.libllama.so                -0.0001%
build.bin.llama-bench                +0.0003%
build.bin.libmtmd.so                  0.0%
build.bin.libggml-cpu.so              0.0%
build.bin.libggml-base.so             0.0%
build.bin.libggml.so                  0.0%
build.bin.llama-tokenize              0.0%
build.bin.llama-gemma3-cli            0.0%
build.bin.llama-gguf-split            0.0%
build.bin.llama-llava-cli             0.0%
build.bin.llama-minicpmv-cli          0.0%
build.bin.llama-quantize              0.0%
build.bin.llama-qwen2vl-cli           0.0%

Total system power consumption: -0.018% (negligible).

Function Analysis

common_chat_peg_builder::prefix() (llama-tts, llama-cvector-generator):

  • Response time: +302% (7.7μs → 31.1μs, +23.4μs)
  • Throughput time: +5.2% (110ns → 115ns, +5.7ns)
  • Justification: Intentional change adds + space() to enable flexible whitespace handling in chat templates. The 23.4μs increase occurs during one-time parser initialization, not inference. Functional improvement outweighs negligible performance impact.

Compiler optimizations (6 functions): std::vector::end (-69% response time), std::chrono::operator- (-40% throughput time), httplib::detail::parse_http_date (-44% throughput time), std::vector::_M_move_assign (-25% throughput time), std::pair constructor (-4% response time), httplib::detail::websocket_accept_key (-31% throughput time). All show improved code generation with no source changes.

Minor compiler artifacts (5 functions): nlohmann::basic_json::create (+50% throughput time, +132ns), httplib::Client::Get/Patch/Put (+18-23% throughput time, +25-26ns). Absolute impacts negligible; functions are I/O-bound or infrequent.

Other analyzed functions saw negligible changes.

Flame Graph Comparison

Function: common_chat_peg_builder::prefix() (build.bin.llama-tts)

Base version:
Flame Graph: build.bin.llama-tts::ZN23common_chat_peg_builder6prefixERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7

Target version:
Flame Graph: build.bin.llama-tts::ZN23common_chat_peg_builder6prefixERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7

The target version introduces new call chains for space() (7.7μs) and operator+() (15.7μs) that create deeper execution paths with sequence building and vector operations, explaining the response time increase. The change is intentional to support flexible whitespace matching in chat templates.

Additional Findings

No inference impact: Zero changes to llama_decode(), matrix operations, attention mechanisms, KV cache, quantization, or GPU kernels. Core inference libraries (libllama.so, libggml-*.so) show zero power consumption change, confirming changes are isolated to non-critical paths.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 9 times, most recently from 126cd1f to a8215be on April 8, 2026 at 02:18
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from e800934 to a024d9c on April 15, 2026 at 02:19
@loci-dev loci-dev force-pushed the main branch 6 times, most recently from 7638ab4 to f1b46d5 on April 20, 2026 at 02:19