Fix structured output format and temperature handling for OpenAI evaluator models #16
Open
JunYeopLee wants to merge 2 commits into scaleapi:main from
Conversation
The evaluator's `response_format` used `response_schema` (Gemini-specific), which causes `BadRequestError: Unknown parameter 'response_format.response_schema'` when using OpenAI models (e.g. gpt-5.1) as the evaluator.

Changes:

- Use OpenAI's `json_schema` structured output format when the evaluator model is an OpenAI model, and keep the existing `json_object` + `response_schema` format for Gemini and other providers.
- Omit the `temperature` parameter entirely for OpenAI reasoning models (o1, o3, o4, gpt-5 families) instead of hard-coding it to 1, since these models reject any explicit temperature value.
- Enable `litellm.drop_params = True` as a safety net so unsupported parameters are silently dropped rather than causing request failures.

Made-with: Cursor
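A minimal sketch of the provider-aware request construction described above, assuming `litellm.completion` is used for the evaluator call; `EVAL_SCHEMA` and `build_completion_kwargs` are illustrative names and not code from this PR:

```python
import litellm

# Safety net from this PR: drop parameters the target provider does not
# support instead of failing the request.
litellm.drop_params = True

# Hypothetical evaluation schema; the real schema lives in the scoring script.
EVAL_SCHEMA = {
    "type": "object",
    "properties": {"score": {"type": "number"}, "reasoning": {"type": "string"}},
    "required": ["score", "reasoning"],
    "additionalProperties": False,
}


def build_completion_kwargs(model: str) -> dict:
    """Build provider-specific kwargs for the evaluator call (illustrative)."""
    name = model.lower()
    is_openai = name.startswith(("openai/", "gpt-", "o1", "o3", "o4"))
    is_reasoning = name.startswith(("openai/o", "o1", "o3", "o4")) or "gpt-5" in name

    if is_openai:
        # OpenAI structured outputs: json_schema with strict mode.
        response_format = {
            "type": "json_schema",
            "json_schema": {"name": "evaluation", "schema": EVAL_SCHEMA, "strict": True},
        }
    else:
        # Gemini (and other providers) via LiteLLM: json_object plus response_schema.
        response_format = {"type": "json_object", "response_schema": EVAL_SCHEMA}

    kwargs = {"model": model, "response_format": response_format}
    if not is_reasoning:
        # Reasoning models reject an explicit temperature, so it is omitted for them.
        kwargs["temperature"] = 0.0
    return kwargs


# Usage sketch:
# resp = litellm.completion(
#     messages=[{"role": "user", "content": "..."}],
#     **build_completion_kwargs("openai/gpt-5.1"),
# )
```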
Review excerpt from services/mcp_eval/mcp_evals_scores.py (around line 366):

```python
response_format={
# Build response_format based on provider
model_name = self.config.evaluator_model.lower()
is_openai = model_name.startswith("openai/") or model_name.startswith("gpt-")
```
**`is_openai` misses unprefixed OpenAI reasoning models**

LiteLLM routes bare model names like `o1`, `o3`, and `o4-mini` to OpenAI automatically, but this check only looks for `"openai/"` or `"gpt-"` prefixes. If a user passes `--evaluator-model o3`, `is_openai` will be `False`, causing the Gemini-style `response_format` to be sent to OpenAI, which triggers the same `BadRequestError` this PR aims to fix.

Consider also matching the reasoning prefixes in the `is_openai` check:
Suggested change
```diff
-is_openai = model_name.startswith("openai/") or model_name.startswith("gpt-")
+is_openai = model_name.startswith("openai/") or model_name.startswith("gpt-") or model_name.startswith(("o1", "o3", "o4"))
```
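As an alternative to enumerating prefixes, one hedged option is to ask LiteLLM itself which provider it will route the model to. The sketch below assumes `litellm.get_llm_provider` can resolve the evaluator model name; the `is_openai_model` helper is illustrative, not code from this PR:

```python
import litellm


def is_openai_model(model_name: str) -> bool:
    """Return True if LiteLLM would route this model to OpenAI (sketch)."""
    try:
        # get_llm_provider returns (model, provider, api_key, api_base).
        _, provider, _, _ = litellm.get_llm_provider(model=model_name)
        return provider == "openai"
    except Exception:
        # Fall back to a prefix heuristic if the model name cannot be resolved.
        return model_name.lower().startswith(("openai/", "gpt-", "o1", "o3", "o4"))
```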
Summary
When using an OpenAI model (e.g. `openai/gpt-5.1`) as the evaluator via `--evaluator-model`, the scoring script fails with `BadRequestError: Unknown parameter 'response_format.response_schema'`. This happens because `response_schema` inside `response_format` is a Gemini-specific parameter that the OpenAI API does not recognize.

This PR fixes the issue by:

- `response_format`: Uses OpenAI's native `json_schema` structured output format (with `strict: true`) for OpenAI models, and preserves the existing `json_object` + `response_schema` format for Gemini and other providers.
- `temperature`: Instead of hard-coding `temperature=1` only for `gpt-5`, the `temperature` parameter is now omitted entirely for all OpenAI reasoning model families (`o1`, `o3`, `o4`, `gpt-5*`), since these models reject any explicit temperature value.
- `litellm.drop_params = True`: Acts as a safety net so that any unsupported parameters are silently dropped rather than causing request failures.

Reproduction
Before this fix, every scoring request fails immediately. After this fix, OpenAI models work correctly as evaluators.
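For reference, a minimal, hypothetical reproduction of the pre-fix failure using LiteLLM directly; the schema and message are placeholders, not taken from the scoring script:

```python
import litellm

# Pre-fix behavior (sketch): sending the Gemini-style response_format to an
# OpenAI model. OpenAI does not recognize 'response_schema' and rejects the
# request with BadRequestError.
litellm.completion(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "Score this trace."}],
    response_format={
        "type": "json_object",
        "response_schema": {"type": "object", "properties": {"score": {"type": "number"}}},
    },
)
```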
Test plan
- `openai/gpt-5.1` as evaluator no longer produces `BadRequestError`
- Gemini evaluators (e.g. `gemini/gemini-2.5-pro`) still work as before (no behavior change)
- `litellm.drop_params = True` is set as a fallback safety net
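A hedged sketch of how the first two test-plan items could be checked in isolation, using a hypothetical `build_response_format` helper that mirrors the PR's branching (the helper name and schema are assumptions, not the PR's actual code):

```python
# test_response_format_branching.py -- illustrative only
SCHEMA = {"type": "object", "properties": {"score": {"type": "number"}}}


def build_response_format(model_name: str) -> dict:
    """Mirror of the PR's provider branch, for test purposes (assumption)."""
    name = model_name.lower()
    if name.startswith(("openai/", "gpt-", "o1", "o3", "o4")):
        return {
            "type": "json_schema",
            "json_schema": {"name": "evaluation", "schema": SCHEMA, "strict": True},
        }
    return {"type": "json_object", "response_schema": SCHEMA}


def test_openai_model_uses_json_schema():
    assert build_response_format("openai/gpt-5.1")["type"] == "json_schema"


def test_gemini_model_keeps_response_schema():
    fmt = build_response_format("gemini/gemini-2.5-pro")
    assert fmt["type"] == "json_object" and "response_schema" in fmt
```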