feat(llm): bump seed-gen/patcher to newest models + fix GPT-5.x temperature by ret2libc · Pull Request #553 · trailofbits/buttercup

ret2libc · 2026-05-15T14:37:24Z

Stacked on #544 (worktree-add-newer-models), which adds the new model definitions (claude-4.6-sonnet, openai-gpt-5.5, openai-gpt-5.4-mini, …). Base is #544's branch; merge after #544 (or retarget to main once #544 lands).

1. Bump seed-gen and patcher agents to the newest models

Tier-preserving upgrade onto the models from #544:

| Component | Before | After |
| --- | --- | --- |
| seed-gen (all tasks) | primary `claude-4.5-sonnet`; fallbacks `claude-4-sonnet, gpt-4.1, gemini-pro` | primary `claude-4.6-sonnet`; fallbacks `openai-gpt-5.5, gemini-pro` |
| patcher SWE / RootCause / Reflection | primary `openai-gpt-4.1`; fallback `claude-4.5-sonnet, gemini-pro` | primary `openai-gpt-5.5`; fallback `claude-4.6-sonnet, gemini-pro` |
| patcher ContextRetriever | `openai-gpt-4.1`; cheap `openai-gpt-4.1-mini` | `openai-gpt-5.5`; cheap `openai-gpt-5.4-mini` |
| patcher code-snippet / test-instr helpers | `openai-gpt-4.1` / `claude-4.5-sonnet` | `openai-gpt-5.5` / `claude-4.6-sonnet` |

gemini-pro left as-is (not part of #544). The redundant claude-4-sonnet seed-gen fallback was dropped since claude-4.6-sonnet is now primary.

2. Fix: drop unsupported `temperature` for GPT-5.x in litellm

GPT-5.x reasoning models reject any non-default temperature with HTTP 400 (Unsupported value: 'temperature' ... Only the default (1) value is supported). The common LLM helpers send temperature=0.1/0.3, so putting patcher on openai-gpt-5.5 broke every patcher LLM call.

A global litellm_settings: drop_params: true is not sufficient: the pinned litellm 1.57.8 doesn't recognize these new model names, so it doesn't know temperature is unsupported and forwards it anyway. Fix adds an explicit, version-independent additional_drop_params: ["temperature"] to each openai-gpt-5.x entry, plus the global drop_params: true for other reasoning models litellm does recognize.
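The difference between the global flag and the per-model list can be sketched in plain Python. This is a toy model of the behavior described above, not litellm's actual code: `KNOWN_UNSUPPORTED`, `build_request`, and the table contents are hypothetical names and data invented for illustration.

```python
from typing import Any

# Hypothetical stand-in for a built-in table of parameters that each known
# model rejects. A pinned release has no row for newer model names.
KNOWN_UNSUPPORTED: dict[str, set[str]] = {
    "some-known-reasoning-model": {"temperature"},
}


def build_request(
    model: str,
    params: dict[str, Any],
    drop_params: bool = False,
    additional_drop_params: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Return the parameters that would be forwarded upstream."""
    # Explicitly listed parameters are always dropped, no table lookup needed.
    to_drop = set(additional_drop_params)
    if drop_params:
        # The global flag only helps for models the table already knows.
        to_drop |= KNOWN_UNSUPPORTED.get(model, set())
    return {k: v for k, v in params.items() if k not in to_drop}


params = {"temperature": 0.1, "max_tokens": 256}

# Global flag alone: the unrecognized model name still receives temperature.
forwarded_global = build_request("openai-gpt-5.5", params, drop_params=True)

# Explicit per-model list: temperature is removed regardless of the table.
forwarded_explicit = build_request(
    "openai-gpt-5.5",
    params,
    drop_params=True,
    additional_drop_params=("temperature",),
)
```

Under these assumptions the explicit list keeps working even if the table never learns the new names, which is why the fix describes it as version-independent.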

Validation

Docker e2e against example-libpng (low LLM budget):

seed-gen on claude-4.6-sonnet: 200+ healthy LLM turns, generated seeds, 0 errors.
patcher on openai-gpt-5.5: agent loop runs with 0 temperature-400 / unsupported_value / litellm rejection errors (consistently failed before the litellm fix).

🤖 Generated with Claude Code

Add 9 model entries across the LiteLLM proxy config, k8s values, and ButtercupLLM enum so callers can opt into newer models without code changes: openai-o4-mini, openai-gpt-5/-mini/-nano, claude-4-opus, claude-4.1-opus, claude-4.5-haiku, claude-4.6-sonnet, claude-4.7-opus. Verified end-to-end with docker compose: LiteLLM v1.57.8 boots cleanly, /v1/models lists all 27 entries, and chat completions for the new model IDs are dispatched to api.anthropic.com / api.openai.com. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…5 + gpt-5.4/5.4-mini/5.5 Replace the previously added openai-o4-mini and gpt-5/-mini/-nano entries with gpt-5.4, gpt-5.4-mini, gpt-5.5; drop claude opus 4 and 4.1 in favor of opus 4.6. Final new entries (7): claude-4.5-haiku, claude-4.6-sonnet, claude-4.6-opus, claude-4.7-opus, openai-gpt-5.4-mini, openai-gpt-5.4, openai-gpt-5.5. Re-verified with docker compose: LiteLLM v1.57.8 boots, /v1/models lists all 25 entries, and chat completions for claude-4.6-opus and openai-gpt-5.5 are dispatched to api.anthropic.com / api.openai.com. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

GPT-5.x reasoning models (gpt-5.4, gpt-5.4-mini, gpt-5.5) reject any non-default temperature with HTTP 400 ("Unsupported value: 'temperature' ... Only the default (1) value is supported"). The codebase sends temperature=0.1/0.3 via the common LLM helpers, so every component routed to these models fails. A global `litellm_settings: drop_params: true` is not sufficient: the pinned litellm (1.57.8) does not recognize these new model names, so it does not know temperature is unsupported and forwards it anyway. Add an explicit, version-independent `additional_drop_params: ["temperature"]` to each gpt-5.x model entry, plus the global drop_params for any other reasoning models litellm does recognize. Verified via the Docker e2e (example-libpng): patcher on openai-gpt-5.5 runs with 0 temperature-400 errors after this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tier-preserving upgrade onto the models added in #544:

- seed-gen: primary claude-4.5-sonnet -> claude-4.6-sonnet; fallbacks collapsed to openai-gpt-5.5, gemini-pro (dropped redundant claude-4-sonnet now that claude-4.6-sonnet is primary).
- patcher (SWE, RootCause, Reflection, ContextRetriever, code-snippet helper): primary openai-gpt-4.1 -> openai-gpt-5.5; ContextRetriever cheap tier openai-gpt-4.1-mini -> openai-gpt-5.4-mini; fallback claude-4.5-sonnet -> claude-4.6-sonnet.

gemini-pro left as-is. Validated together with the litellm temperature fix in this PR via the Docker e2e on example-libpng: seed-gen on claude-4.6-sonnet and patcher on openai-gpt-5.5 both run healthy LLM turns with no errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hbrodin · 2026-06-16T08:34:24Z

+      api_key: os.environ/OPENAI_API_KEY
+      # GPT-5.x reasoning models only accept temperature=1; this pinned litellm
+      # (1.57.8) doesn't know these new names, so drop temperature explicitly.
+      additional_drop_params: ["temperature"]


The same parallel proxy_config block in deployment/k8s/values.yaml (lines 366, 373, 380 for the three gpt-5.x entries, plus the general_settings section at ~452) doesn't get these edits. Per the comment at values.yaml:273 it's a "Direct inclusion of the litellm_config.yaml content" — a manually-synced copy, and commit 4937a76 updated both files together when these models were introduced. As-is the Docker compose path is fixed but Helm deployments still send temperature to gpt-5.x and hit the same HTTP 400 the PR description describes.

Two options:

Mirror the four edits into values.yaml: add additional_drop_params: ["temperature"] to each openai-gpt-5.x entry, and add a litellm_settings: drop_params: true block next to general_settings.

Eliminate the duplication — the litellm-helm chart accepts proxy_config as a value, so the dev YAML can be the single source via .Files.Get "litellm/litellm_config.yaml" or a small generator step. Worth a follow-up regardless of which path you pick here.
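Until the duplication is removed, a small check could catch drift between the two copies. The sketch below assumes both YAML documents have already been parsed into dictionaries with a `model_list` of entries carrying `model_name` and `litellm_params`; the function name `models_missing_drop` and the sample data are hypothetical, and loading the real files is left out.

```python
from typing import Any


def models_missing_drop(
    config: dict[str, Any], prefix: str = "openai-gpt-5"
) -> list[str]:
    """List model names under `prefix` whose entry does not drop temperature."""
    missing = []
    for entry in config.get("model_list", []):
        name = entry.get("model_name", "")
        if not name.startswith(prefix):
            continue
        params = entry.get("litellm_params", {})
        if "temperature" not in params.get("additional_drop_params", []):
            missing.append(name)
    return missing


# Hand-written stand-ins for the parsed compose config and Helm proxy_config.
compose_config = {
    "model_list": [
        {
            "model_name": "openai-gpt-5.5",
            "litellm_params": {
                "api_key": "os.environ/OPENAI_API_KEY",
                "additional_drop_params": ["temperature"],
            },
        },
    ]
}
helm_config = {
    "model_list": [
        {
            "model_name": "openai-gpt-5.5",
            "litellm_params": {"api_key": "os.environ/OPENAI_API_KEY"},
        },
        {
            "model_name": "gemini-pro",
            "litellm_params": {},
        },
    ]
}

compose_gaps = models_missing_drop(compose_config)
helm_gaps = models_missing_drop(helm_config)
```

A CI step could fail when either list is non-empty, which would have flagged the values.yaml gap described above.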

hbrodin · 2026-06-16T08:34:27Z

    llm = create_default_llm(
-        model_name=ButtercupLLM.OPENAI_GPT_4_1.value,
-        fallback_models=[ButtercupLLM.CLAUDE_4_5_SONNET, ButtercupLLM.GEMINI_PRO],
+        model_name=ButtercupLLM.OPENAI_GPT_5_5.value,


_are_test_instructions_valid is a yes/no validator, and create_default_llm forwards the helper's default temperature=0.1 (common/llm.py:110). After this PR the proxy strips temperature for gpt-5.x, so the primary silently runs at the model default (~1.0) while the Claude/Gemini fallbacks still get 0.1. For a binary validator that's a real consistency hazard — repeated calls on identical input can flip.

rootcause.py:128-135 and swe.py:345-352 already pass "temperature": 1 explicitly in kwargs for their gpt-5.5 targets. Doing the same here, and at common.py:437 (_create_understand_code_snippet_chain), makes the behavior readable at the call site instead of dependent on proxy-side stripping:

    llm = create_default_llm(
        model_name=ButtercupLLM.OPENAI_GPT_5_5.value,
        temperature=1,
        fallback_models=[ButtercupLLM.CLAUDE_4_6_SONNET, ButtercupLLM.GEMINI_PRO],
    )

Since this call is a binary predicate, an alternative is flipping the primary to CLAUDE_4_6_SONNET (which honors low temperature) so the verdict stays deterministic regardless of which model serves the request.
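The mismatch described in this comment can be made concrete with a small sketch. It assumes the proxy strips `temperature` for any model whose name starts with `openai-gpt-5` and that the stripped request runs at a default of 1.0; `effective_temperature` and `STRIPPED_PREFIXES` are hypothetical names, not part of the codebase.

```python
# Assumed default applied upstream when temperature is stripped by the proxy.
DEFAULT_TEMPERATURE = 1.0

# Assumed set of model-name prefixes for which the proxy drops temperature.
STRIPPED_PREFIXES = ("openai-gpt-5",)


def effective_temperature(model: str, requested: float) -> float:
    """Temperature a model would actually run at for a given request."""
    if model.startswith(STRIPPED_PREFIXES):
        return DEFAULT_TEMPERATURE
    return requested


# Primary followed by fallbacks, all called with the helper default of 0.1.
chain = ["openai-gpt-5.5", "claude-4.6-sonnet", "gemini-pro"]
seen = {model: effective_temperature(model, 0.1) for model in chain}
```

Here `seen` maps the primary to 1.0 and both fallbacks to 0.1, so the same validator call samples very differently depending on which model answers.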

ret2libc and others added 3 commits May 14, 2026 16:07

ret2libc requested a review from hbrodin as a code owner May 15, 2026 14:37

ret2libc requested a review from reytchison May 15, 2026 14:39

ret2libc changed the title ~~fix(litellm): drop unsupported temperature for GPT-5.x models~~ feat(llm): bump seed-gen/patcher to newest models + fix GPT-5.x temperature May 15, 2026

Base automatically changed from worktree-add-newer-models to main June 16, 2026 08:25

hbrodin reviewed Jun 16, 2026

ret2libc wants to merge 4 commits into main from fix/litellm-gpt5-temperature
