common : rework gpt-oss parser #20393
Conversation
common/chat.cpp
auto analysis = p.rule("analysis", p.literal("<|channel|>analysis<|message|>") + p.reasoning(content) + end);
auto preamble = p.rule("preamble", p.literal("<|channel|>commentary<|message|>") + p.content(content) + end);
auto final = p.rule("final", p.literal("<|channel|>final<|message|>") + p.content(content));
Final is a keyword, does this work correctly? I'd change it anyway just in case.
It does, which is why I kept it. But I'll change it. Most syntax highlighters are not semantics-aware, so they'll highlight it as a keyword.
pwilkin left a comment
I'll trust you on this one :)
Just two issues:
I'll add the logic back in, but it truly makes no sense. For one, the official template will throw an exception if it sees any harmony tags in the message. Ultimately, this model is incapable of not reasoning. Even when constrained, it will leak reasoning traces inside the final response if it can. That said, if clients depend on this then I guess 🤷
@aldehir I'm fully aware of the difference. See point 2. The responsibility is pushed to the client to strip the tags. Currently there is a hack to remove the exception, but then the model will in-context learn and start to emit bad harmony output, thus breaking parsing. I've been down this road when I implemented the original parsing. Edit: I can see my phrasing conflated the two. Ignore the second part.
@aldehir I know this is generally non-feasible, but there exists a small but vocal group of people who use their own parsing tools for whatever reasons, and they really like to get the raw unprocessed contents :) But I just realized you can simply not change it and instead approve my #20289 to satisfy them :)
Yes, I'd rather just give them the whole harmony output. I've seen complaints about "missing tokens" too, it's never ending!
@pwilkin added structured output test. |
for (auto msg : inputs.messages) {
    if (msg.contains("reasoning_content") && msg.at("reasoning_content").is_string()) {
        msg["thinking"] = msg.at("reasoning_content");
        msg.erase("content");
I messed up, need to only do this for tool calls.
@aldehir I like your dedication to making gpt-oss work as intended for everyone since it launched. llama.cpp's philosophy is power and high customization, so what is the difference between it and Ollama/LM Studio now?
@Mo-Hashem, unfortunately it will cause double tags to render and contradicts the template. The official template normally throws an exception. Nonetheless, I can add it back in. |
Yes, please add it back; it is a serious blocker.
* common : rework gpt-oss parser
* cont : fix gpt-oss tests
* cont : add structured output test
* cont : rename final to final_msg
Rework the gpt-oss parser.
- response_format not being enforced.
- reasoning-format = none: it makes no sense for gpt-oss. Users can choose to ignore reasoning_content.

fixes #20344
fixes #20500