
Add benchmark results for openai/gpt-5.5-pro #625

Closed
github-actions[bot] wants to merge 1 commit into main from benchmark/openai-gpt-5-5-pro-27751159939


@github-actions

Contributor

This PR adds benchmark results for the openai/gpt-5.5-pro model.

Results have been pushed to Tinybird with validated=0 (pending review).
Merging this PR will validate the results, making them visible on the production dashboard.
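
As a rough illustration of the `validated` flag described above, the sketch below models rows that start at `validated=0` and become visible only once flipped to `1`. The `BenchmarkRow` shape and both helper functions are hypothetical; how the workflow actually updates Tinybird on merge is not shown in this PR.

```typescript
// Hypothetical row shape; only the `validated` flag (0 = pending, 1 = validated)
// comes from the PR description.
interface BenchmarkRow {
  model: string;
  validated: 0 | 1;
}

// Assumed dashboard behavior: only validated rows are shown.
function visibleOnDashboard(rows: BenchmarkRow[]): BenchmarkRow[] {
  return rows.filter((row) => row.validated === 1);
}

// Assumed effect of merging the PR: mark every row for the model as validated.
function validateModel(rows: BenchmarkRow[], model: string): BenchmarkRow[] {
  return rows.map((row): BenchmarkRow =>
    row.model === model ? { ...row, validated: 1 } : row
  );
}

const pendingRows: BenchmarkRow[] = [
  { model: "openai/gpt-5.5-pro", validated: 0 },
];

const beforeMerge = visibleOnDashboard(pendingRows);
const afterMerge = visibleOnDashboard(validateModel(pendingRows, "openai/gpt-5.5-pro"));
console.log(beforeMerge.length, afterMerge.length);
```

Because this PR was closed rather than merged, the rows would stay in the pending state under this model.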

This PR was automatically generated by the benchmark workflow.

Note: If you don't want to merge this PR, close it and the model will be added to the failed list.

/cc pei-tinybird,gnzjgo

@github-actions

Contributor Author

Automated Benchmark Review

Model: openai/gpt-5.5-pro
Review model: openai/gpt-5.4-nano
Success rate: 20.0%


  1. Quality summary: The run shows low overall reliability: only 10/50 requests succeeded (20% success rate). While latency and execution time are not extreme (avg latency 173 ms, avg execution time 2.16 s), the high error rate suggests frequent failures rather than performance degradation.

  2. Concerns/anomalies: The primary anomaly is the very high error count (40/50) with firstAttemptRate = successRate = 20%, indicating no meaningful recovery on retries. This points to consistent issues (e.g., prompt/format mismatch, integration failures, rate limits, or model/tooling errors) that warrant investigation before relying on these results.

  3. Recommendation: REVIEW.
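
The figures in the review can be reproduced with a small sketch like the one below. The `RequestResult` shape and the `summarizeRun` helper are assumptions made for illustration, not the benchmark's actual code; only the counts (10 successes out of 50) come from the review. When `firstAttemptRate` equals `successRate`, no request was recovered by a retry.

```typescript
// Hypothetical shape of one benchmark request; field names are assumptions.
interface RequestResult {
  attempts: number;   // number of attempts made (1 means no retries)
  succeeded: boolean; // whether any attempt eventually succeeded
}

interface RunSummary {
  total: number;
  errors: number;
  successRate: number;      // percent of requests that eventually succeeded
  firstAttemptRate: number; // percent that succeeded on the first attempt
  recoveredByRetry: number; // successes that needed more than one attempt
}

function summarizeRun(results: RequestResult[]): RunSummary {
  const total = results.length;
  const successes = results.filter((r) => r.succeeded).length;
  const firstAttempt = results.filter((r) => r.succeeded && r.attempts === 1).length;
  // Multiply before dividing to avoid floating-point drift on simple ratios.
  const pct = (n: number): number => (total === 0 ? 0 : (n * 100) / total);
  return {
    total,
    errors: total - successes,
    successRate: pct(successes),
    firstAttemptRate: pct(firstAttempt),
    recoveredByRetry: successes - firstAttempt,
  };
}

// 10 first-attempt successes and 40 failures, matching the review's counts.
// The attempt count used for failures (3) is arbitrary; the review does not state it.
const sampleRun: RequestResult[] = [
  ...Array.from({ length: 10 }, (): RequestResult => ({ attempts: 1, succeeded: true })),
  ...Array.from({ length: 40 }, (): RequestResult => ({ attempts: 3, succeeded: false })),
];

const runSummary = summarizeRun(sampleRun);
console.log(runSummary);
```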


This review was automatically generated. Set AUTO_MERGE=false in repo variables to disable auto-merge.

@vercel

vercel Bot commented Jun 18, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| llm-benchmark | Ready | Preview, Comment | Jun 18, 2026 10:04am |

@github-actions

Contributor Author

Model Rejected

The model openai/gpt-5.5-pro has been added to the failed models list.

What this means:

  • This model is tracked as a rejected benchmark
  • Discovery will still find it, but it won't be re-benchmarked automatically while in the failed list

To manage this model:

# Remove from failed list (to allow re-benchmarking)
npm run manage-failed-models remove openai gpt-5.5-pro

# List all failed models
npm run manage-failed-models list
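
The failed-list behavior described above (discovery still finds the model, but it is skipped for automatic re-benchmarking) might look roughly like this. Both functions and the `provider/model` key format are hypothetical; the repository's real storage format and the internals of `manage-failed-models` are not shown in this PR.

```typescript
// Assumed key format: "provider/model". The real failed-list format may differ.
function selectModelsToBenchmark(
  discovered: string[],
  failed: ReadonlySet<string>
): string[] {
  // Discovery still returns every model; failed ones are filtered out here.
  return discovered.filter((id) => !failed.has(id));
}

// Illustrative counterpart of `manage-failed-models remove <provider> <model>`.
function removeFromFailedList(
  failed: ReadonlySet<string>,
  provider: string,
  model: string
): Set<string> {
  const next = new Set(failed);
  next.delete(`${provider}/${model}`);
  return next;
}

// "example/other-model" is a placeholder, not a real model in the benchmark.
const discoveredModels = ["openai/gpt-5.5-pro", "example/other-model"];
const failedModels = new Set(["openai/gpt-5.5-pro"]);

const queuedWhileFailed = selectModelsToBenchmark(discoveredModels, failedModels);
const queuedAfterRemoval = selectModelsToBenchmark(
  discoveredModels,
  removeFromFailedList(failedModels, "openai", "gpt-5.5-pro")
);
console.log(queuedWhileFailed, queuedAfterRemoval);
```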

@alrocar

github-actions Bot pushed a commit that referenced this pull request Jun 18, 2026
- Model: openai/gpt-5.5-pro
- Reason: PR rejected/closed without merging
- PR: #625

This prevents the model from being automatically benchmarked again.