Skip to content

Regression: gpt-oss Jinja crash "Cannot pass both content and thinking" — fix from #19704 lost in #18675 #20500

@Rockbob89

Description

@Rockbob89

Name and Version:
Current master (983df14), regression introduced in 566059a
$llama-cli --version
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 126976 MiB):
Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 126976 MiB (49626 MiB free)
version: 8324 (983df14)
built with GNU 15.2.0 for Linux x86_64

Operating systems:
All? (template logic, not platform-specific)

GGML backends:
All? (not backend-specific) Im using HIP/ROCm 7.2

Hardware:
AMD Ryzen AI Max+ 395, Radeon 8060S (gfx1151), 128GB Unified Memory

Models:
gpt-oss-120b (any gpt-oss model using
openai-gpt-oss-120b.jinja)

Problem description:
PR #19704 (39e4b1d) fixed a Jinja template crash by adding
adjusted_message.erase("content") in
common_chat_params_init_gpt_oss(). This fix was lost when #18675
(566059a, "Autoparser refactoring") rewrote the function without
carrying over the erase call.

Additionally, the Anthropic Messages API path
(convert_anthropic_to_oai() in server-common.cpp) was never fixed
— it sets content = "" on assistant messages with
reasoning_content + tool_calls, triggering the same crash.

Reproducer:
Send a multi-turn /v1/messages request to llama-server
running a gpt-oss model, where assistant history contains
thinking + tool_use blocks.

First Bad Commit: 566059a (Autoparser - complete refactoring of
parser architecture #18675)

Relevant log output:
Cannot pass both content and thinking in an assistant message with
tool calls! Put the analysis message in one or the other, but not
both.

Both issues were patched locally and verified working with Claude Code CLI and OpenCode agentic workflows.

Edit: had a quick check for my local changes..

**common/chat.cpp line 943 — add adjusted_message.erase("content");:**
          
  if (has_reasoning_content && has_tool_calls) {
      auto adjusted_message        = msg;
      adjusted_message["thinking"] = msg.at("reasoning_content");
      adjusted_message.erase("content");  **// <- added**
      adjusted_messages.push_back(adjusted_message);
  [...]

**tools/server/server-common.cpp ~line 1553 — skip setting content when tool_calls + reasoning_content:**
  if (!converted_content.empty()) {
      new_msg["content"] = converted_content;
  } else if (has_tool_calls && !reasoning_content.empty()) {
      // Don't set content - gpt-oss rejects content +  thinking with tool_calls
  } else if (has_tool_calls || !reasoning_content.empty()) {
      new_msg["content"] = "";
 }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions