common : rework gpt-oss parser #20393
Conversation
common/chat.cpp
auto analysis = p.rule("analysis", p.literal("<|channel|>analysis<|message|>") + p.reasoning(content) + end);
auto preamble = p.rule("preamble", p.literal("<|channel|>commentary<|message|>") + p.content(content) + end);
auto final = p.rule("final", p.literal("<|channel|>final<|message|>") + p.content(content));
Final is a keyword, does this work correctly? I'd change it anyway just in case.
It does, which is why I kept it. But I'll change it. Most syntax highlighters are not semantics-aware, so they'll highlight it as a keyword.
pwilkin left a comment
I'll trust you on this one :)
Just two issues:
I'll add the logic back in, but it truly makes no sense. For one, the official template will throw an exception if it sees any harmony tags in the message. Ultimately, this model is incapable of not reasoning. Even when constrained, it will leak reasoning traces inside the final response if it can. That said, if clients depend on this then I guess 🤷
@aldehir I'm fully aware of the difference. See point 2. The responsibility is pushed to the client to strip the tags. Currently there is a hack to remove the exception, but then the model will in-context learn and start to emit bad harmony output, thus breaking parsing. I've been down this road when I implemented the original parsing. Edit: I can see my phrasing conflated the two. Ignore the second part.
@aldehir I know this is generally non-feasible, but there exists a small but vocal group of people who use their own parsing tools for whatever reasons, and they really like to get the raw unprocessed contents :) But I just realized you can simply not change it and instead approve my #20289 to satisfy them :)
Yes, I'd rather just give them the whole harmony output. I've seen complaints about "missing tokens" too, it's never ending!
@pwilkin added structured output test. |
for (auto msg : inputs.messages) {
    if (msg.contains("reasoning_content") && msg.at("reasoning_content").is_string()) {
        msg["thinking"] = msg.at("reasoning_content");
        msg.erase("content");
I messed up, need to only do this for tool calls.
@aldehir I like your dedication to making gpt-oss work as intended for everyone since it launched. llama.cpp's philosophy is power and high customization, so what is the difference between it and Ollama/LM Studio now?
@Mo-Hashem, unfortunately it will cause double tags to render and contradicts the template. The official template normally throws an exception. Nonetheless, I can add it back in. |
Yes, please add it back; it is a serious blocker.
* common : rework gpt-oss parser
* cont : fix gpt-oss tests
* cont : add structured output test
* cont : rename final to final_msg
Rework the gpt-oss parser.
- response_format not being enforced.
- reasoning-format = none: it makes no sense for gpt-oss. Users can choose to ignore reasoning_content.

fixes #20344
fixes #20500