
Fix WebUI thinking mode request handling #21657

Open

redyuan43 wants to merge 2 commits into ggml-org:master from redyuan43:codex/webui-thinking-sampling-presets

Conversation

@redyuan43

Summary

  • decouple sampling presets from the Thinking mode setting in the WebUI
  • map the WebUI Thinking mode to chat_template_kwargs.enable_thinking so the on/off toggle works with llama.cpp chat templates
  • disable reasoning parsing when Thinking mode is set to Off
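The mapping described above can be sketched as follows. This is an illustrative sketch only: the type and function names (`ThinkingMode`, `buildRequestExtras`) are hypothetical and not the PR's actual identifiers; the `chat_template_kwargs.enable_thinking` field is the request parameter the PR targets.

```typescript
// Hypothetical sketch of mapping the WebUI Thinking toggle to request fields.
// Names here are illustrative, not the PR's real identifiers.
type ThinkingMode = 'on' | 'off' | 'auto';

interface RequestExtras {
  // Forwarded to the server's chat template rendering.
  chat_template_kwargs?: { enable_thinking: boolean };
  // Client-side flag: whether to parse reasoning out of the response.
  parseReasoning: boolean;
}

function buildRequestExtras(mode: ThinkingMode): RequestExtras {
  // 'auto' leaves the chat template's default behavior untouched.
  if (mode === 'auto') {
    return { parseReasoning: true };
  }
  return {
    // 'on'/'off' are forwarded via chat_template_kwargs.enable_thinking.
    chat_template_kwargs: { enable_thinking: mode === 'on' },
    // With thinking off, skip reasoning parsing entirely.
    parseReasoning: mode === 'on',
  };
}
```

The key design point is that Thinking mode only touches `chat_template_kwargs` and the client-side parsing flag, leaving sampling presets unaffected.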

Testing

  • npm run check
  • npm run build
  • rebuilt llama-server and verified that a request with thinking disabled returns plain content with no reasoning output
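The manual check in the last bullet amounts to asserting the response shape below. This is a hypothetical helper for illustration (`isPlainContent` is not in the PR); it assumes the server reports parsed reasoning in a `reasoning_content` field on the assistant message, which should be absent when thinking is disabled.

```typescript
// Hypothetical check mirroring the manual test: with thinking disabled,
// the assistant message should carry plain content and no reasoning field.
interface AssistantMessage {
  content: string;
  reasoning_content?: string; // assumed field name for parsed reasoning
}

function isPlainContent(msg: AssistantMessage): boolean {
  return msg.content.length > 0 && msg.reasoning_content === undefined;
}
```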

@redyuan43 redyuan43 requested a review from a team as a code owner April 9, 2026 04:55
@ggml-gh-bot

ggml-gh-bot bot commented Apr 9, 2026

Hi @redyuan43, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Large PR: large changes require prior discussion (e.g. an issue or RFC), and maintainers may not be able to review this PR as-is. Consider splitting it into smaller, focused PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
