feat(llm): bump seed-gen/patcher to newest models + fix GPT-5.x temperature#553

Open
ret2libc wants to merge 4 commits into main from fix/litellm-gpt5-temperature

ret2libc (Collaborator) commented May 15, 2026

Stacked on #544 (worktree-add-newer-models), which adds the new model definitions (claude-4.6-sonnet, openai-gpt-5.5, openai-gpt-5.4-mini, …). Base is #544's branch; merge after #544 (or retarget to main once #544 lands).

1. Bump seed-gen and patcher agents to the newest models

Tier-preserving upgrade onto the models from #544:

| Component | Before | After |
| --- | --- | --- |
| seed-gen (all tasks) | primary claude-4.5-sonnet; fallbacks claude-4-sonnet, gpt-4.1, gemini-pro | primary claude-4.6-sonnet; fallbacks openai-gpt-5.5, gemini-pro |
| patcher SWE / RootCause / Reflection | primary openai-gpt-4.1; fallback claude-4.5-sonnet, gemini-pro | primary openai-gpt-5.5; fallback claude-4.6-sonnet, gemini-pro |
| patcher ContextRetriever | openai-gpt-4.1; cheap openai-gpt-4.1-mini | openai-gpt-5.5; cheap openai-gpt-5.4-mini |
| patcher code-snippet / test-instr helpers | openai-gpt-4.1 / claude-4.5-sonnet | openai-gpt-5.5 / claude-4.6-sonnet |

gemini-pro is left as-is (it is not part of #544). The redundant claude-4-sonnet seed-gen fallback was dropped, since claude-4.6-sonnet is now the primary.

2. Fix: drop unsupported temperature for GPT-5.x in litellm

GPT-5.x reasoning models reject any non-default temperature with HTTP 400 (Unsupported value: 'temperature' ... Only the default (1) value is supported). The common LLM helpers send temperature=0.1/0.3, so putting patcher on openai-gpt-5.5 broke every patcher LLM call.

A global litellm_settings: drop_params: true is not sufficient: the pinned litellm 1.57.8 doesn't recognize these new model names, so it doesn't know temperature is unsupported and forwards it anyway. The fix adds an explicit, version-independent additional_drop_params: ["temperature"] to each openai-gpt-5.x entry, plus the global drop_params: true for other reasoning models that litellm does recognize.
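To make the distinction concrete, here is a minimal Python sketch of the two drop mechanisms as this description characterizes them. It is a simplified model for illustration, not litellm's actual code: `KNOWN_UNSUPPORTED`, `params_to_forward`, and the bare model names are hypothetical stand-ins.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in for the model metadata shipped with a pinned proxy
# release: the models it knows about and the params each one rejects.
KNOWN_UNSUPPORTED: dict[str, set[str]] = {
    "o1": {"temperature"},
}


def params_to_forward(
    model: str,
    request: dict[str, Any],
    drop_params: bool = False,
    additional_drop_params: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Return the request params that would reach the provider."""
    # Explicit per-model drops apply without any metadata lookup.
    dropped = set(additional_drop_params)
    if drop_params:
        # The global switch can only drop what the metadata table lists.
        dropped |= KNOWN_UNSUPPORTED.get(model, set())
    return {k: v for k, v in request.items() if k not in dropped}


request = {"temperature": 0.1, "max_tokens": 256}

# Unrecognized model with only the global switch: temperature is forwarded.
global_only = params_to_forward("gpt-5.5", request, drop_params=True)

# Explicit per-model drop: temperature is removed regardless of metadata.
explicit = params_to_forward(
    "gpt-5.5", request, drop_params=True, additional_drop_params=("temperature",)
)
```

Under these assumptions, `global_only` still carries `temperature` while `explicit` does not, which mirrors why the per-entry setting is needed until the pinned version knows the new model names.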

Validation

Docker e2e against example-libpng (low LLM budget):

  • seed-gen on claude-4.6-sonnet: 200+ healthy LLM turns, generated seeds, 0 errors.
  • patcher on openai-gpt-5.5: agent loop runs with 0 temperature-400 / unsupported_value / litellm rejection errors (consistently failed before the litellm fix).

🤖 Generated with Claude Code

ret2libc and others added 3 commits May 14, 2026 16:07
Add 9 model entries across the LiteLLM proxy config, k8s values, and
ButtercupLLM enum so callers can opt into newer models without code
changes: openai-o4-mini, openai-gpt-5/-mini/-nano, claude-4-opus,
claude-4.1-opus, claude-4.5-haiku, claude-4.6-sonnet, claude-4.7-opus.

Verified end-to-end with docker compose: LiteLLM v1.57.8 boots cleanly,
/v1/models lists all 27 entries, and chat completions for the new
model IDs are dispatched to api.anthropic.com / api.openai.com.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…5 + gpt-5.4/5.4-mini/5.5

Replace the previously added openai-o4-mini and gpt-5/-mini/-nano entries
with gpt-5.4, gpt-5.4-mini, gpt-5.5; drop claude opus 4 and 4.1 in favor
of opus 4.6. Final new entries (7): claude-4.5-haiku, claude-4.6-sonnet,
claude-4.6-opus, claude-4.7-opus, openai-gpt-5.4-mini, openai-gpt-5.4,
openai-gpt-5.5.

Re-verified with docker compose: LiteLLM v1.57.8 boots, /v1/models lists
all 25 entries, and chat completions for claude-4.6-opus and
openai-gpt-5.5 are dispatched to api.anthropic.com / api.openai.com.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

GPT-5.x reasoning models (gpt-5.4, gpt-5.4-mini, gpt-5.5) reject any
non-default temperature with HTTP 400 ("Unsupported value: 'temperature'
... Only the default (1) value is supported"). The codebase sends
temperature=0.1/0.3 via the common LLM helpers, so every component routed
to these models fails.

A global `litellm_settings: drop_params: true` is not sufficient: the
pinned litellm (1.57.8) does not recognize these new model names, so it
does not know temperature is unsupported and forwards it anyway. Add an
explicit, version-independent `additional_drop_params: ["temperature"]`
to each gpt-5.x model entry, plus the global drop_params for any other
reasoning models litellm does recognize.

Verified via the Docker e2e (example-libpng): patcher on openai-gpt-5.5
runs with 0 temperature-400 errors after this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ret2libc requested a review from hbrodin as a code owner May 15, 2026 14:37
ret2libc requested a review from reytchison May 15, 2026 14:39

Tier-preserving upgrade onto the models added in #544:
- seed-gen: primary claude-4.5-sonnet -> claude-4.6-sonnet; fallbacks
  collapsed to openai-gpt-5.5, gemini-pro (dropped redundant claude-4-sonnet
  now that claude-4.6-sonnet is primary).
- patcher (SWE, RootCause, Reflection, ContextRetriever, code-snippet
  helper): primary openai-gpt-4.1 -> openai-gpt-5.5; ContextRetriever
  cheap tier openai-gpt-4.1-mini -> openai-gpt-5.4-mini; fallback
  claude-4.5-sonnet -> claude-4.6-sonnet. gemini-pro left as-is.

Validated together with the litellm temperature fix in this PR via the
Docker e2e on example-libpng: seed-gen on claude-4.6-sonnet and patcher
on openai-gpt-5.5 both run healthy LLM turns with no errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ret2libc changed the title from "fix(litellm): drop unsupported temperature for GPT-5.x models" to "feat(llm): bump seed-gen/patcher to newest models + fix GPT-5.x temperature" May 15, 2026
Base automatically changed from worktree-add-newer-models to main June 16, 2026 08:25

api_key: os.environ/OPENAI_API_KEY
# GPT-5.x reasoning models only accept temperature=1; this pinned litellm
# (1.57.8) doesn't know these new names, so drop temperature explicitly.
additional_drop_params: ["temperature"]

Collaborator

The parallel proxy_config block in deployment/k8s/values.yaml (lines 366, 373, and 380 for the three gpt-5.x entries, plus the general_settings section at ~452) does not get these edits. Per the comment at values.yaml:273, it is a "Direct inclusion of the litellm_config.yaml content", that is, a manually synced copy; commit 4937a76 updated both files together when these models were introduced. As-is, the Docker Compose path is fixed, but Helm deployments still send temperature to gpt-5.x and hit the same HTTP 400 described in the PR description.

Two options:

  1. Mirror the four edits into values.yaml: add additional_drop_params: ["temperature"] to each openai-gpt-5.x entry, and add a litellm_settings: drop_params: true block next to general_settings.
  2. Eliminate the duplication — the litellm-helm chart accepts proxy_config as a value, so the dev YAML can be the single source via .Files.Get "litellm/litellm_config.yaml" or a small generator step. Worth a follow-up regardless of which path you pick here.
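Until the duplication is removed, a small check could catch this kind of drift. The sketch below is a hedged illustration, not code from the repository: it assumes both YAML documents have already been parsed into dicts with the usual `model_list` / `litellm_params` layout, and `missing_temperature_drop` plus the sample dicts are hypothetical.

```python
from __future__ import annotations

from typing import Any


def missing_temperature_drop(
    config: dict[str, Any], prefix: str = "openai-gpt-5"
) -> list[str]:
    """Names of matching model entries that do not drop `temperature`."""
    missing = []
    for entry in config.get("model_list", []):
        name = entry.get("model_name", "")
        if not name.startswith(prefix):
            continue  # only gpt-5.x entries need the explicit drop
        drops = entry.get("litellm_params", {}).get("additional_drop_params", [])
        if "temperature" not in drops:
            missing.append(name)
    return missing


# Hypothetical, trimmed-down stand-ins for the two parsed configs.
compose_config = {
    "model_list": [
        {
            "model_name": "openai-gpt-5.5",
            "litellm_params": {"additional_drop_params": ["temperature"]},
        },
        {"model_name": "claude-4.6-sonnet", "litellm_params": {}},
    ]
}
helm_proxy_config = {
    "model_list": [
        {"model_name": "openai-gpt-5.5", "litellm_params": {}},
        {"model_name": "openai-gpt-5.4-mini", "litellm_params": {}},
        {"model_name": "gemini-pro", "litellm_params": {}},
    ]
}

compose_gaps = missing_temperature_drop(compose_config)
helm_gaps = missing_temperature_drop(helm_proxy_config)
```

Run against both files in CI, a non-empty result for either config would flag the mismatch before a Helm deployment hits the HTTP 400.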

 llm = create_default_llm(
-    model_name=ButtercupLLM.OPENAI_GPT_4_1.value,
-    fallback_models=[ButtercupLLM.CLAUDE_4_5_SONNET, ButtercupLLM.GEMINI_PRO],
+    model_name=ButtercupLLM.OPENAI_GPT_5_5.value,

Collaborator

_are_test_instructions_valid is a yes/no validator, and create_default_llm forwards the helper's default temperature=0.1 (common/llm.py:110). After this PR the proxy strips temperature for gpt-5.x, so the primary silently runs at the model default (~1.0) while the Claude/Gemini fallbacks still get 0.1. For a binary validator that's a real consistency hazard — repeated calls on identical input can flip.

rootcause.py:128-135 and swe.py:345-352 already pass "temperature": 1 explicitly in kwargs for their gpt-5.5 targets. Doing the same here, and at common.py:437 (_create_understand_code_snippet_chain), makes the behavior readable at the call site instead of depending on proxy-side stripping:

llm = create_default_llm(
    model_name=ButtercupLLM.OPENAI_GPT_5_5.value,
    temperature=1,
    fallback_models=[ButtercupLLM.CLAUDE_4_6_SONNET, ButtercupLLM.GEMINI_PRO],
)

Since this call is a binary predicate, an alternative is to make CLAUDE_4_6_SONNET the primary (it honors a low temperature), so the verdict stays consistent regardless of which model serves the request.
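The mismatch can be shown with a short sketch. It assumes the helper forwards one requested temperature to every model in the chain and that the proxy strips it only for names starting with `openai-gpt-5`; `effective_temperatures` and `DEFAULT_TEMPERATURE` are hypothetical names used for illustration.

```python
from __future__ import annotations

# Assumed value gpt-5.x runs at once the proxy strips the parameter.
DEFAULT_TEMPERATURE = 1.0


def effective_temperatures(
    models: list[str], requested: float, stripped_prefix: str = "openai-gpt-5"
) -> dict[str, float]:
    """Map each model in a primary + fallback chain to the temperature it uses."""
    return {
        model: DEFAULT_TEMPERATURE if model.startswith(stripped_prefix) else requested
        for model in models
    }


chain = ["openai-gpt-5.5", "claude-4.6-sonnet", "gemini-pro"]

# Helper default of 0.1: the primary silently diverges from its fallbacks.
implicit = effective_temperatures(chain, requested=0.1)
implicit_uniform = len(set(implicit.values())) == 1

# Explicit temperature=1 at the call site: every model runs the same setting.
explicit = effective_temperatures(chain, requested=1.0)
explicit_uniform = len(set(explicit.values())) == 1
```

With the default, the chain mixes 1.0 and 0.1; with an explicit value of 1, the setting is uniform and visible at the call site, at the cost of a higher temperature on the fallbacks.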


2 participants