fix(reasoning): don't persist request-scoped reasoning_effort as an operator disable (#10622) #10623
Open
Anai-Guo wants to merge 1 commit into
…del config

When a model sets `reasoning_effort: none` (or any default) in its YAML without an explicit `reasoning.disable`, ApplyReasoningEffort resolves that default at request time and sets ReasoningConfig.DisableReasoning on the request-scoped config copy. The post-load thinking/marker probe then wrote that request-scoped value back into the loader's persistent config via UpdateModelConfig, making it look as though the operator had explicitly set reasoning.disable=true. From then on, per-request `reasoning_effort` overrides were silently ignored (an explicit operator disable wins over a request asking to think).

DetectThinkingSupportFromBackend only fills reasoning slots that are still nil, so a slot already set here came from ApplyReasoningEffort, not the probe. Snapshot which slots were nil before the probe and only persist those, so the probe's genuine backend detection is still saved while request-time reasoning effort never leaks into the persistent config.

Fixes mudler#10622

Signed-off-by: Tai An <antai12232931@outlook.com>
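The leak described in the commit message can be modeled in a few lines. This is a simplified sketch, not the project's code: `ReasoningConfig` is reduced to the one field the PR names, and `applyEffort` is a hypothetical stand-in for `ApplyReasoningEffort` that assumes any non-nil persisted value is treated as an operator setting.

```go
package main

import "fmt"

// ReasoningConfig is a reduced stand-in; only DisableReasoning is named in the PR.
type ReasoningConfig struct {
	DisableReasoning *bool // nil means "the operator did not set this"
}

// applyEffort (hypothetical) models request-time resolution on a copy:
// a non-nil persisted value wins; otherwise the request effort, or the
// config default when the request omits it, decides.
func applyEffort(persisted ReasoningConfig, defaultEffort, requestEffort string) ReasoningConfig {
	scoped := persisted // request-scoped copy
	if persisted.DisableReasoning != nil {
		return scoped // looks like an explicit operator setting
	}
	effort := requestEffort
	if effort == "" {
		effort = defaultEffort
	}
	disable := effort == "none"
	scoped.DisableReasoning = &disable
	return scoped
}

func main() {
	var persisted ReasoningConfig // YAML has reasoning_effort: none, no reasoning.disable

	// First request omits reasoning_effort, so the "none" default applies.
	first := applyEffort(persisted, "none", "")

	// Buggy write-back: the request-scoped value lands in the persistent config.
	persisted.DisableReasoning = first.DisableReasoning

	// Second request asks to think, but the leaked value now wins.
	second := applyEffort(persisted, "none", "high")
	fmt.Println("reasoning disabled on second request:", *second.DisableReasoning)
}
```

Under these assumptions the second request still ends up with reasoning disabled, which matches the "silently ignored" symptom reported in the issue.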
What
Fixes #10622.
A model that sets `reasoning_effort: none` (or any effort default) in its YAML without an explicit `reasoning.disable` loses the ability to enable thinking on a per-request basis after the first request.

Root cause
1. `ApplyReasoningEffort` resolves the effective effort (the config default `none` when the request omits it) and sets `ReasoningConfig.DisableReasoning = true` on the request-scoped config copy (`core/config/model_config.go`).
2. The post-load thinking/marker probe then wrote its result back via `UpdateModelConfig`, which copied `c.ReasoningConfig.DisableReasoning` back into the loader's persistent config (`core/backend/llm.go`).
3. `DetectThinkingSupportFromBackend` only fills reasoning slots that are still `nil`, so the probe never actually produced that value; it was the request-time `none` default. But it is now persisted as if the operator had explicitly set `reasoning.disable: true`.
4. On later requests, `ApplyReasoningEffort` sees the (now non-nil) persisted disable and treats it as an operator's explicit disable, so a request-level `reasoning_effort` can no longer re-enable thinking.

Fix
Snapshot which reasoning slots were still `nil` before the probe. Only persist a slot if the probe was actually allowed to fill it (i.e. it was `nil`). This keeps the probe's genuine backend detection (and the media marker) persisted, while request-time `reasoning_effort` values never leak into the persistent config.

Result: `reasoning_effort: none` remains the per-request default, but clients can still request extra thinking via the `reasoning_effort` request param, exactly the expected behavior from the issue. An operator's explicit `reasoning.disable` is unaffected (it starts non-nil, so it is preserved and still wins).

Notes
Minimal change; behavior is unchanged for the explicit-disable and no-effort paths. No new probe calls; the gRPC detection still runs exactly once, outside the loader lock, as before.
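The snapshot-then-persist idea can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: `probe` and `probeAndPersist` are hypothetical helpers standing in for `DetectThinkingSupportFromBackend` and the write-back through `UpdateModelConfig`, and `TagPrefill` is an invented second slot used only to show that genuine probe results are still saved.

```go
package main

import "fmt"

// ReasoningConfig is a reduced stand-in for the real struct.
type ReasoningConfig struct {
	DisableReasoning *bool
	TagPrefill       *bool // hypothetical second slot, for illustration only
}

// probe (hypothetical) fills only slots that are still nil, which is the
// behavior the PR attributes to DetectThinkingSupportFromBackend.
func probe(c *ReasoningConfig, backendDisable, backendPrefill bool) {
	if c.DisableReasoning == nil {
		c.DisableReasoning = &backendDisable
	}
	if c.TagPrefill == nil {
		c.TagPrefill = &backendPrefill
	}
}

// probeAndPersist (hypothetical) records which slots were nil before the
// probe and copies back only those, so values set at request time never
// reach the persistent config.
func probeAndPersist(persisted, scoped *ReasoningConfig, backendDisable, backendPrefill bool) {
	disableWasNil := scoped.DisableReasoning == nil
	prefillWasNil := scoped.TagPrefill == nil

	probe(scoped, backendDisable, backendPrefill)

	if disableWasNil {
		persisted.DisableReasoning = scoped.DisableReasoning
	}
	if prefillWasNil {
		persisted.TagPrefill = scoped.TagPrefill
	}
}

func main() {
	persisted := ReasoningConfig{} // operator set neither slot

	// Request-time resolution of the "none" default already set this slot.
	disabled := true
	scoped := ReasoningConfig{DisableReasoning: &disabled}

	probeAndPersist(&persisted, &scoped, false, true)

	fmt.Println("disable still unset in persistent config:", persisted.DisableReasoning == nil)
	fmt.Println("probe-detected slot persisted:", persisted.TagPrefill != nil)
}
```

Because the decision is made per slot, an operator's explicit value (non-nil before the probe) is never overwritten, and a slot the probe genuinely detected is still saved.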
🤖 Generated with Claude Code