Name and Version:
Current master (983df14), regression introduced in 566059a
$ llama-cli --version
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 126976 MiB):
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 126976 MiB (49626 MiB free)
version: 8324 (983df14)
built with GNU 15.2.0 for Linux x86_64
Operating systems:
All? (template logic, not platform-specific)
GGML backends:
All? (not backend-specific). I'm using HIP/ROCm 7.2.
Hardware:
AMD Ryzen AI Max+ 395, Radeon 8060S (gfx1151), 128GB Unified Memory
Models:
gpt-oss-120b (any gpt-oss model using
openai-gpt-oss-120b.jinja)
Problem description:
PR #19704 (39e4b1d) fixed a Jinja template crash by adding
adjusted_message.erase("content") in
common_chat_params_init_gpt_oss(). This fix was lost when #18675
(566059a, "Autoparser refactoring") rewrote the function without
carrying over the erase call.
Additionally, the Anthropic Messages API path
(convert_anthropic_to_oai() in server-common.cpp) was never fixed
— it sets content = "" on assistant messages with
reasoning_content + tool_calls, triggering the same crash.
Reproducer:
Send a multi-turn /v1/messages request to llama-server
running a gpt-oss model, where assistant history contains
thinking + tool_use blocks.
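For illustration, here is a minimal Python sketch (a simplification, not the actual `convert_anthropic_to_oai()` code; the block shapes follow the Anthropic Messages API) of the kind of assistant-history turn that triggers the crash, and how the current conversion ends up with `content = ""` alongside `reasoning_content` and `tool_calls` — the combination the gpt-oss template rejects:

```python
# Hypothetical, simplified stand-in for convert_anthropic_to_oai()
# (assumption: not the real server code, just the behavior described above).

def convert_assistant_turn(blocks):
    msg = {"role": "assistant"}
    text_parts = []
    tool_calls = []
    for block in blocks:
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "thinking":
            msg["reasoning_content"] = block["thinking"]
        elif block["type"] == "tool_use":
            tool_calls.append({
                "id": block["id"],
                "type": "function",
                "function": {"name": block["name"], "arguments": "{}"},
            })
    if tool_calls:
        msg["tool_calls"] = tool_calls
    # Problematic part: even with no text blocks, content is set to "",
    # which the gpt-oss Jinja template still treats as "content present"
    # and then rejects together with thinking + tool_calls.
    msg["content"] = "".join(text_parts)
    return msg

# An assistant turn with thinking + tool_use but no text block:
turn = [
    {"type": "thinking", "thinking": "need to call the tool"},
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather"},
]
msg = convert_assistant_turn(turn)
# msg now carries content == "" plus reasoning_content plus tool_calls
```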
First Bad Commit: 566059a (Autoparser - complete refactoring of
parser architecture #18675)
Relevant log output:
Cannot pass both content and thinking in an assistant message with
tool calls! Put the analysis message in one or the other, but not
both.
Both issues were patched locally and verified working with Claude Code CLI and OpenCode agentic workflows.
Edit: here is a quick summary of my local changes.
**common/chat.cpp line 943 — add `adjusted_message.erase("content");`:**

```cpp
if (has_reasoning_content && has_tool_calls) {
    auto adjusted_message = msg;
    adjusted_message["thinking"] = msg.at("reasoning_content");
    adjusted_message.erase("content"); // <- added
    adjusted_messages.push_back(adjusted_message);
    [...]
```
**tools/server/server-common.cpp ~line 1553 — skip setting content when tool_calls + reasoning_content:**

```cpp
if (!converted_content.empty()) {
    new_msg["content"] = converted_content;
} else if (has_tool_calls && !reasoning_content.empty()) {
    // don't set content - gpt-oss rejects content + thinking with tool_calls
} else if (has_tool_calls || !reasoning_content.empty()) {
    new_msg["content"] = "";
}
```
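The same branch logic as a small Python sketch (illustrative only, not the actual server code; variable names mirror the C++ above), showing that the `content` key is now omitted entirely when `tool_calls` and `reasoning_content` are both present, instead of being set to `""`:

```python
# Simplified model of the patched branch in convert_anthropic_to_oai()
# (assumption: a sketch of the described fix, not the real implementation).

def set_content(new_msg, converted_content, has_tool_calls, reasoning_content):
    if converted_content:
        new_msg["content"] = converted_content
    elif has_tool_calls and reasoning_content:
        # don't set content - gpt-oss rejects content + thinking with tool_calls
        pass
    elif has_tool_calls or reasoning_content:
        new_msg["content"] = ""
    return new_msg

# thinking + tool_calls and no text: content key is left out entirely
assert "content" not in set_content({}, "", True, "some reasoning")
# tool_calls without thinking: content stays an empty string, as before
assert set_content({}, "", True, "")["content"] == ""
```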