
Fix structured output format and temperature handling for OpenAI evaluator models#16

Open
JunYeopLee wants to merge 2 commits into scaleapi:main from
JunYeopLee:fix/openai-structured-output-and-reasoning-temperature

Conversation


@JunYeopLee JunYeopLee commented Mar 3, 2026

Summary

When using an OpenAI model (e.g. openai/gpt-5.1) as the evaluator via --evaluator-model, the scoring script fails with:

litellm.BadRequestError: OpenAIException - Unknown parameter: 'response_format.response_schema'

This happens because response_schema inside response_format is a Gemini-specific parameter that the OpenAI API does not recognize.

This PR fixes the issue by:

  • Provider-aware response_format: Uses OpenAI's native json_schema structured output format (with strict: true) for OpenAI models, and preserves the existing json_object + response_schema format for Gemini and other providers.
  • Proper temperature handling for reasoning models: Instead of hard-coding temperature=1 only for gpt-5, the temperature parameter is now omitted entirely for all OpenAI reasoning model families (o1, o3, o4, gpt-5*), since these models reject any explicit temperature value.
  • Enable litellm.drop_params = True: Acts as a safety net so that any unsupported parameters are silently dropped rather than causing request failures.
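The provider-aware kwargs described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `EVAL_SCHEMA` and `build_completion_kwargs` are hypothetical names, the schema contents and the non-reasoning default temperature are invented for the example, and the real script additionally sets `litellm.drop_params = True` before calling the API.

```python
# Hypothetical sketch of the provider-aware response_format / temperature
# handling this PR describes. EVAL_SCHEMA and build_completion_kwargs are
# illustrative names, not identifiers from mcp_evals_scores.py.
EVAL_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "number"},
        "rationale": {"type": "string"},
    },
    "required": ["score", "rationale"],
    "additionalProperties": False,
}

def build_completion_kwargs(model: str) -> dict:
    name = model.lower()
    bare = name.split("/", 1)[-1]  # strip a provider prefix like "openai/"
    is_openai = name.startswith(("openai/", "gpt-")) or bare.startswith(("o1", "o3", "o4"))
    is_reasoning = is_openai and bare.startswith(("o1", "o3", "o4", "gpt-5"))
    kwargs: dict = {"model": model}
    if is_openai:
        # OpenAI's native structured output: json_schema with strict mode
        kwargs["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "evaluation", "schema": EVAL_SCHEMA, "strict": True},
        }
    else:
        # Gemini-style: json_object plus the provider-specific response_schema
        kwargs["response_format"] = {"type": "json_object", "response_schema": EVAL_SCHEMA}
    if not is_reasoning:
        # Reasoning models (o1/o3/o4, gpt-5*) reject any explicit temperature,
        # so it is omitted entirely for them rather than hard-coded to 1.
        kwargs["temperature"] = 0.0
    return kwargs
```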

Reproduction

uv run mcp_evals_scores.py \
  --input-file="completion_results/sample_51_results.csv" \
  --evaluator-model="openai/gpt-5.1" \
  --model-label="gpt51"

Before this fix, every scoring request failed immediately; after it, OpenAI models work correctly as evaluators.

Test plan

  • Verified openai/gpt-5.1 as evaluator no longer produces BadRequestError
  • Verified Gemini models (gemini/gemini-2.5-pro) still work as before (no behavior change)
  • Confirmed litellm.drop_params = True is set as a fallback safety net

The evaluator's `response_format` used `response_schema` (Gemini-specific),
which causes `BadRequestError: Unknown parameter 'response_format.response_schema'`
when using OpenAI models (e.g. gpt-5.1) as the evaluator.

Changes:
- Use OpenAI's `json_schema` structured output format when the evaluator model
  is an OpenAI model, and keep the existing `json_object` + `response_schema`
  format for Gemini and other providers.
- Omit the `temperature` parameter entirely for OpenAI reasoning models
  (o1, o3, o4, gpt-5 families) instead of hard-coding it to 1, since these
  models reject any explicit temperature value.
- Enable `litellm.drop_params = True` as a safety net so unsupported
  parameters are silently dropped rather than causing request failures.

Made-with: Cursor

greptile-apps bot left a comment


1 file reviewed, 1 comment


Comment thread on services/mcp_eval/mcp_evals_scores.py (Outdated):

```
response_format={
# Build response_format based on provider
model_name = self.config.evaluator_model.lower()
is_openai = model_name.startswith("openai/") or model_name.startswith("gpt-")
```


is_openai misses unprefixed OpenAI reasoning models

LiteLLM routes bare model names like o1, o3, o4-mini to OpenAI automatically, but this check only looks for "openai/" or "gpt-" prefixes. If a user passes --evaluator-model o3, is_openai will be False, causing the Gemini-style response_format to be sent to OpenAI — triggering the same BadRequestError this PR aims to fix.

Consider also matching the reasoning prefixes in the is_openai check:

Suggested change:

```suggestion
is_openai = model_name.startswith("openai/") or model_name.startswith("gpt-") or model_name.startswith(("o1", "o3", "o4"))
```

LiteLLM routes unprefixed names like o1, o3, o4-mini to OpenAI
automatically, so is_openai must match them to avoid sending
the Gemini-style response_format to OpenAI.

Made-with: Cursor
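
The broadened check from the follow-up commit can be sketched as a small predicate (the function name `is_openai_model` is hypothetical; the actual change inlines the check):

```python
def is_openai_model(model: str) -> bool:
    # Bare names like "o1", "o3", "o4-mini" are routed to OpenAI by LiteLLM,
    # so match them alongside the explicit "openai/" and "gpt-" prefixes.
    name = model.lower()
    return name.startswith(("openai/", "gpt-", "o1", "o3", "o4"))
```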
