disabling reasoning does not work anymore on certain models #20196
Description
Name and Version
llama-cli --version
version: 8233 (c5a778891)
built with GNU 10.5.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Nvidia Tesla V100 32GB
Models
MiniMax-M2.5-IQ4_XS
Problem description & steps to reproduce
Hello,
On a fresh build, I noticed that in llama-server, --reasoning-budget 0 and --chat-template-kwargs {"enable_thinking": false} no longer have any effect with MiniMax M2.5 (unsloth GGUF): the model still emits thinking tokens in its output.
AesSedai/Qwen3.5-35B-A3B-GGUF, for instance, does not exhibit this behavior.
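For reference, the same attempt to disable thinking can be made per request against the server's OpenAI-compatible endpoint. A minimal sketch of the payload (the chat_template_kwargs request field is assumed to be supported by this llama-server build, mirroring the CLI flag; the port matches the log below):

```python
import json

# Request body for POST http://127.0.0.1:8081/v1/chat/completions,
# attempting to disable thinking per request instead of via the CLI flag
# --chat-template-kwargs '{"enable_thinking": false}'.
payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
```

With the reported bug, the response content still begins with thinking tokens regardless of this setting.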
First Bad Commit
No response
Relevant log output
Logs
llama-server output showing that thinking is still enabled despite the flag:
llama-server -m MiniMax-M2.5-IQ4_XS-00001-of-00004.gguf -ngl 99 -b 2048 -ub 2048 --jinja --reasoning-budget 0
[...]
init: chat template, example_format: ']~b]system
You are a helpful assistant[e~[
]~b]user
Hello[e~[
]~b]ai
Hi there[e~[
]~b]user
How are you?[e~[
]~b]ai
<think>
'
srv init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://127.0.0.1:8081
main: starting the main loop...
srv update_slots: all slots are idle
srv log_server_r: done request: GET / 127.0.0.1 200
srv params_from_: Chat format: peg-native
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 180736, n_keep = 0, task.n_tokens = 39
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
slot init_sampler: id 3 | task 0 | init sampler, took 0.03 ms, tokens: text = 39, total = 39
slot update_slots: id 3 | task 0 | prompt processing done, n_tokens = 39, batch.n_tokens = 39
slot print_timing: id 3 | task 0 |
prompt eval time = 882.74 ms / 39 tokens ( 22.63 ms per token, 44.18 tokens per second)
eval time = 664.85 ms / 38 tokens ( 17.50 ms per token, 57.16 tokens per second)
total time = 1547.58 ms / 77 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 76, truncated = 0
srv update_slots: all slots are idle
srv init: init: chat template, thinking = 0