feat(llm): bump seed-gen/patcher to newest models + fix GPT-5.x temperature #553
ret2libc wants to merge 4 commits into
Add 9 model entries across the LiteLLM proxy config, k8s values, and ButtercupLLM enum so callers can opt into newer models without code changes: openai-o4-mini, openai-gpt-5/-mini/-nano, claude-4-opus, claude-4.1-opus, claude-4.5-haiku, claude-4.6-sonnet, claude-4.7-opus. Verified end-to-end with docker compose: LiteLLM v1.57.8 boots cleanly, /v1/models lists all 27 entries, and chat completions for the new model IDs are dispatched to api.anthropic.com / api.openai.com. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…5 + gpt-5.4/5.4-mini/5.5 Replace the previously added openai-o4-mini and gpt-5/-mini/-nano entries with gpt-5.4, gpt-5.4-mini, gpt-5.5; drop claude opus 4 and 4.1 in favor of opus 4.6. Final new entries (7): claude-4.5-haiku, claude-4.6-sonnet, claude-4.6-opus, claude-4.7-opus, openai-gpt-5.4-mini, openai-gpt-5.4, openai-gpt-5.5. Re-verified with docker compose: LiteLLM v1.57.8 boots, /v1/models lists all 25 entries, and chat completions for claude-4.6-opus and openai-gpt-5.5 are dispatched to api.anthropic.com / api.openai.com. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GPT-5.x reasoning models (gpt-5.4, gpt-5.4-mini, gpt-5.5) reject any
non-default temperature with HTTP 400 ("Unsupported value: 'temperature'
... Only the default (1) value is supported"). The codebase sends
temperature=0.1/0.3 via the common LLM helpers, so every component routed
to these models fails.
A global `litellm_settings: drop_params: true` is not sufficient: the
pinned litellm (1.57.8) does not recognize these new model names, so it
does not know temperature is unsupported and forwards it anyway. Add an
explicit, version-independent `additional_drop_params: ["temperature"]`
to each gpt-5.x model entry, plus the global drop_params for any other
reasoning models litellm does recognize.
Verified via the Docker e2e (example-libpng): patcher on openai-gpt-5.5
runs with 0 temperature-400 errors after this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
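The difference between the two settings can be sketched with a toy filter. This is not litellm's implementation: `KNOWN_NO_TEMPERATURE` and `filter_params` are hypothetical names, and the set of "known" models is a placeholder for whatever the pinned version happens to recognize.

```python
from typing import Any

# Hypothetical stand-in for the models a pinned proxy version already knows
# do not accept `temperature`; the real lookup in litellm is more involved.
KNOWN_NO_TEMPERATURE = {"o1", "o1-mini"}


def filter_params(
    model: str,
    params: dict[str, Any],
    drop_params: bool = False,
    additional_drop_params: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Return the params this toy proxy would forward upstream."""
    # The per-model list applies no matter what the proxy knows about the model.
    dropped = set(additional_drop_params)
    # The global flag only helps when the model is recognized as unsupported.
    if drop_params and model in KNOWN_NO_TEMPERATURE:
        dropped.add("temperature")
    return {key: value for key, value in params.items() if key not in dropped}


request = {"temperature": 0.1, "max_tokens": 256}

# Global flag alone: the model name is unknown, so temperature is still forwarded.
global_only = filter_params("gpt-5.5", request, drop_params=True)

# Per-model list: temperature is removed regardless of the proxy's model table.
explicit = filter_params(
    "gpt-5.5", request, drop_params=True, additional_drop_params=("temperature",)
)
```

Under these assumptions `global_only` still carries `temperature`, while `explicit` does not, which is the behavior the per-entry setting is meant to guarantee.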
Tier-preserving upgrade onto the models added in #544:
- seed-gen: primary claude-4.5-sonnet -> claude-4.6-sonnet; fallbacks collapsed to openai-gpt-5.5, gemini-pro (dropped redundant claude-4-sonnet now that claude-4.6-sonnet is primary).
- patcher (SWE, RootCause, Reflection, ContextRetriever, code-snippet helper): primary openai-gpt-4.1 -> openai-gpt-5.5; ContextRetriever cheap tier openai-gpt-4.1-mini -> openai-gpt-5.4-mini; fallback claude-4.5-sonnet -> claude-4.6-sonnet. gemini-pro left as-is.
Validated together with the litellm temperature fix in this PR via the Docker e2e on example-libpng: seed-gen on claude-4.6-sonnet and patcher on openai-gpt-5.5 both run healthy LLM turns with no errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
api_key: os.environ/OPENAI_API_KEY
# GPT-5.x reasoning models only accept temperature=1; this pinned litellm
# (1.57.8) doesn't know these new names, so drop temperature explicitly.
additional_drop_params: ["temperature"]
The same parallel proxy_config block in deployment/k8s/values.yaml (lines 366, 373, 380 for the three gpt-5.x entries, plus the general_settings section at ~452) doesn't get these edits. Per the comment at values.yaml:273 it's a "Direct inclusion of the litellm_config.yaml content" — a manually-synced copy, and commit 4937a76 updated both files together when these models were introduced. As-is the Docker compose path is fixed but Helm deployments still send temperature to gpt-5.x and hit the same HTTP 400 the PR description describes.
Two options:
- Mirror the four edits into `values.yaml`: add `additional_drop_params: ["temperature"]` to each `openai-gpt-5.x` entry, and add a `litellm_settings: drop_params: true` block next to `general_settings`.
- Eliminate the duplication: the litellm-helm chart accepts `proxy_config` as a value, so the dev YAML can be the single source via `.Files.Get "litellm/litellm_config.yaml"` or a small generator step. Worth a follow-up regardless of which path you pick here.
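Until the duplication is removed, drift like this could be caught by a small check over the two parsed configs. The sketch below is only an illustration: `entries_missing_drop` is a hypothetical helper, the sample data is made up rather than copied from the repo, and it assumes each entry keeps `additional_drop_params` under `litellm_params`.

```python
from typing import Any


def entries_missing_drop(
    model_list: list[dict[str, Any]], prefix: str, param: str = "temperature"
) -> list[str]:
    """Names of entries starting with `prefix` that do not drop `param`."""
    missing = []
    for entry in model_list:
        name = entry.get("model_name", "")
        if not name.startswith(prefix):
            continue
        dropped = entry.get("litellm_params", {}).get("additional_drop_params", [])
        if param not in dropped:
            missing.append(name)
    return missing


# Illustrative data shaped like a parsed model_list; the upstream model
# strings are placeholders.
compose_models = [
    {
        "model_name": "openai-gpt-5.5",
        "litellm_params": {
            "model": "openai/placeholder",
            "additional_drop_params": ["temperature"],
        },
    },
    {"model_name": "claude-4.6-sonnet", "litellm_params": {"model": "anthropic/placeholder"}},
]
helm_models = [
    {"model_name": "openai-gpt-5.5", "litellm_params": {"model": "openai/placeholder"}},
]

compose_gaps = entries_missing_drop(compose_models, "openai-gpt-5.")
helm_gaps = entries_missing_drop(helm_models, "openai-gpt-5.")
```

Run against both files in CI, a non-empty result for either one would flag the Helm copy falling behind the compose config.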
  llm = create_default_llm(
-     model_name=ButtercupLLM.OPENAI_GPT_4_1.value,
-     fallback_models=[ButtercupLLM.CLAUDE_4_5_SONNET, ButtercupLLM.GEMINI_PRO],
+     model_name=ButtercupLLM.OPENAI_GPT_5_5.value,
_are_test_instructions_valid is a yes/no validator, and create_default_llm forwards the helper's default temperature=0.1 (common/llm.py:110). After this PR the proxy strips temperature for gpt-5.x, so the primary silently runs at the model default (~1.0) while the Claude/Gemini fallbacks still get 0.1. For a binary validator that's a real consistency hazard — repeated calls on identical input can flip.
rootcause.py:128-135 and swe.py:345-352 already pass "temperature": 1 explicitly in kwargs for their gpt-5.5 targets. Doing the same here, and at common.py:437 (_create_understand_code_snippet_chain), makes the behavior readable at the call site instead of depending on proxy-side stripping:

    llm = create_default_llm(
        model_name=ButtercupLLM.OPENAI_GPT_5_5.value,
        temperature=1,
        fallback_models=[ButtercupLLM.CLAUDE_4_6_SONNET, ButtercupLLM.GEMINI_PRO],
    )

Since this call is a binary predicate, an alternative is flipping the primary to CLAUDE_4_6_SONNET (which honors low temperature) so the verdict stays deterministic regardless of which model serves the request.
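If several call sites need the same treatment, the choice could live in one small helper so the effective temperature is visible in code rather than implied by the proxy. This is only a sketch: `temperature_for` is a hypothetical function, and the `openai-gpt-5.` prefix rule is an assumption based on the aliases named in this PR.

```python
def temperature_for(model_name: str, requested: float = 0.1) -> float:
    """Temperature to pass explicitly for a given model alias."""
    # Assumption: aliases starting with "openai-gpt-5." only accept the default of 1.
    if model_name.startswith("openai-gpt-5."):
        return 1.0
    # Other models are assumed to honor the requested low temperature.
    return requested


primary_temperature = temperature_for("openai-gpt-5.5")
fallback_temperature = temperature_for("claude-4.6-sonnet")
```

The two values differ, which makes the primary/fallback mismatch described above explicit at the call site instead of hidden behind the proxy.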
Stacked on #544 (`worktree-add-newer-models`), which adds the new model definitions (`claude-4.6-sonnet`, `openai-gpt-5.5`, `openai-gpt-5.4-mini`, …). Base is #544's branch; merge after #544 (or retarget to `main` once #544 lands).

1. Bump seed-gen and patcher agents to the newest models
Tier-preserving upgrade onto the models from #544:
| Before | After |
| --- | --- |
| `claude-4.5-sonnet`; fallbacks `claude-4-sonnet`, gpt-4.1, gemini-pro | `claude-4.6-sonnet`; fallbacks `openai-gpt-5.5`, gemini-pro |
| `openai-gpt-4.1`; fallback `claude-4.5-sonnet`, gemini-pro | `openai-gpt-5.5`; fallback `claude-4.6-sonnet`, gemini-pro |
| `openai-gpt-4.1`; cheap `openai-gpt-4.1-mini` | `openai-gpt-5.5`; cheap `openai-gpt-5.4-mini` |
| `openai-gpt-4.1` / `claude-4.5-sonnet` | `openai-gpt-5.5` / `claude-4.6-sonnet` |

`gemini-pro` left as-is (not part of #544). The redundant `claude-4-sonnet` seed-gen fallback was dropped since `claude-4.6-sonnet` is now primary.

2. Fix: drop unsupported `temperature` for GPT-5.x in litellm

GPT-5.x reasoning models reject any non-default `temperature` with HTTP 400 (`Unsupported value: 'temperature' ... Only the default (1) value is supported`). The common LLM helpers send `temperature=0.1/0.3`, so putting patcher on `openai-gpt-5.5` broke every patcher LLM call.

A global `litellm_settings: drop_params: true` is not sufficient: the pinned `litellm 1.57.8` doesn't recognize these new model names, so it doesn't know `temperature` is unsupported and forwards it anyway. Fix adds an explicit, version-independent `additional_drop_params: ["temperature"]` to each `openai-gpt-5.x` entry, plus the global `drop_params: true` for other reasoning models litellm does recognize.

Validation
Docker e2e against `example-libpng` (low LLM budget):
- `claude-4.6-sonnet`: 200+ healthy LLM turns, generated seeds, 0 errors.
- `openai-gpt-5.5`: agent loop runs with 0 temperature-400 / `unsupported_value` / litellm rejection errors (consistently failed before the litellm fix).

🤖 Generated with Claude Code