Eval bug: reasoning off gives reasoning medium for gpt-oss #20458
Description
Name and Version
./llama-cli --version
load_backend: loaded RPC backend from /home/tb/code/llama-b8287/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 780M Graphics (RADV PHOENIX) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /home/tb/code/llama-b8287/libggml-vulkan.so
load_backend: loaded CPU backend from /home/tb/code/llama-b8287/libggml-cpu-zen4.so
version: 8287 (acb7c7906)
built with GNU 11.4.0 for Linux x86_64
Using --reasoning off for gpt-oss gives "Reasoning: medium"; should it give "Reasoning: low" instead?
Using --reasoning-budget 0 for gpt-oss also gives "Reasoning: medium"; should it give "Reasoning: low" instead?
Using --chat-template-kwargs '{"reasoning_effort": "low"}' for gpt-oss gives "Reasoning: low" (ok).
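The behavior suggested by the three observations above can be sketched as follows. `resolve_reasoning_effort` is a hypothetical helper for illustration only, not llama.cpp code: explicit chat-template kwargs win, and `--reasoning off` or `--reasoning-budget 0` would fall back to "low" instead of the template default "medium".

```python
from typing import Optional

def resolve_reasoning_effort(reasoning: str = "auto",
                             reasoning_budget: int = -1,
                             kwargs_effort: Optional[str] = None) -> str:
    """Hypothetical resolution order for gpt-oss reasoning effort.

    Illustrates the behavior this report suggests, not the actual
    llama.cpp implementation.
    """
    if kwargs_effort is not None:
        # --chat-template-kwargs already works as expected today
        return kwargs_effort
    if reasoning == "off" or reasoning_budget == 0:
        # suggested fallback instead of "medium"
        return "low"
    # gpt-oss chat-template default
    return "medium"

assert resolve_reasoning_effort(kwargs_effort="low") == "low"  # ok today
assert resolve_reasoning_effort(reasoning="off") == "low"      # suggested fix
assert resolve_reasoning_effort(reasoning_budget=0) == "low"   # suggested fix
```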
./llama-server -hf unsloth/gpt-oss-20b-GGUF:F16 --threads -1 --parallel 1 --ctx-size 16384 --temp 1.0 --min-p 0.0 --top-p 1.0 --top-k 0 --n_predict 4096 --reasoning off --direct-io 2>&1 | grep -i reasoning
Reasoning: medium
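As a per-request workaround, the effort can also be forced from the client side, assuming the server build accepts a `chat_template_kwargs` field in the chat-completions request body (support may vary by llama.cpp version). This sketch only builds the JSON payload; it does not contact a server:

```python
import json

# Sketch of a /v1/chat/completions request body forcing low reasoning
# effort via chat_template_kwargs. Whether the server honors this field
# depends on the llama.cpp build; the model name is illustrative.
payload = {
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"reasoning_effort": "low"},
}
body = json.dumps(payload)
assert json.loads(body)["chat_template_kwargs"]["reasoning_effort"] == "low"
```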
Possibly related to #20297 (cc @pwilkin).
Operating systems
Linux
GGML backends
Vulkan
Hardware
AMD Ryzen 7 PRO 7840U
Models
No response
Problem description & steps to reproduce
use "--reasoning off" with "unsloth/gpt-oss-20b-GGUF:F16"
First Bad Commit
No response
Relevant log output