Add benchmark results for openai/gpt-5.5-pro by github-actions[bot] · Pull Request #625 · tinybirdco/llm-benchmark

github-actions · 2026-06-18T10:04:23Z

This PR adds benchmark results for the openai/gpt-5.5-pro model.

Results have been pushed to Tinybird with validated=0 (pending review).
Merging this PR will validate the results, making them visible on the production dashboard.

This PR was automatically generated by the benchmark workflow.

Note: If you don't want to merge this PR, close it and the model will be added to the failed list.

/cc pei-tinybird,gnzjgo
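
The review flow described in this comment amounts to a small state transition: results start as pending, merging validates them, and closing sends the model to the failed list. The following is a minimal sketch of that reading, not the workflow's actual code. `PrOutcome`, `ResultState`, and `applyPrOutcome` are hypothetical names, and representing a validated result as `validated: 1` is an assumption (the comment only states that pending results carry `validated=0`).

```typescript
// Hypothetical model of the PR review flow; names are illustrative only.
type PrOutcome = "open" | "merged" | "closed";

interface ResultState {
  model: string;
  validated: 0 | 1; // 0 = pending review; 1 is an assumed value for "validated"
  inFailedList: boolean; // true once the PR is closed without merging
}

function applyPrOutcome(model: string, outcome: PrOutcome): ResultState {
  switch (outcome) {
    case "merged":
      // Merging validates the results so they appear on the production dashboard.
      return { model, validated: 1, inFailedList: false };
    case "closed":
      // Closing without merging leaves results unvalidated and records the model as failed.
      return { model, validated: 0, inFailedList: true };
    case "open":
      // While the PR is open, results stay pending.
      return { model, validated: 0, inFailedList: false };
  }
}

const closedState = applyPrOutcome("openai/gpt-5.5-pro", "closed");
console.log(closedState);
```

Under these assumptions, the `closed` branch corresponds to what eventually happened to this PR.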

github-actions · 2026-06-18T10:04:29Z

Automated Benchmark Review

Model: openai/gpt-5.5-pro
Review model: openai/gpt-5.4-nano
Success rate: 20.0%

Quality summary: The run shows low overall reliability: only 10/50 requests succeeded (20% success rate). While latency and execution time are not extreme (avg latency 173 ms, avg execution time 2.16 s), the high error rate suggests frequent failures rather than performance degradation.
Concerns/anomalies: The primary anomaly is the very high error count (40/50) with firstAttemptRate = successRate = 20%, indicating no meaningful recovery on retries. This points to consistent issues (e.g., prompt/format mismatch, integration failures, rate limits, or model/tooling errors) that warrant investigation before relying on these results.
Recommendation: REVIEW

This review was automatically generated. Set AUTO_MERGE=false in repo variables to disable auto-merge.
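
The figures quoted in the review can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the `RunSummary` shape and the `summarize` helper are hypothetical names, not the benchmark's schema, and the first-attempt count of 10 is inferred from the reported firstAttemptRate of 20% over 50 requests.

```typescript
// Hypothetical summary of one benchmark run; field names are assumptions.
interface RunSummary {
  total: number; // requests issued
  succeeded: number; // requests that eventually succeeded
  succeededFirstAttempt: number; // requests that succeeded without a retry
}

interface DerivedRates {
  successRate: number;
  firstAttemptRate: number;
  errorCount: number;
  recoveredByRetry: number;
}

function summarize(run: RunSummary): DerivedRates {
  if (run.total <= 0) {
    throw new Error("total must be positive");
  }
  return {
    successRate: run.succeeded / run.total,
    firstAttemptRate: run.succeededFirstAttempt / run.total,
    errorCount: run.total - run.succeeded,
    // Successes that needed at least one retry.
    recoveredByRetry: run.succeeded - run.succeededFirstAttempt,
  };
}

// Counts from the review: 10 of 50 requests succeeded, all on the first attempt.
const rates = summarize({ total: 50, succeeded: 10, succeededFirstAttempt: 10 });
console.log(rates);
```

When the success rate and first-attempt rate are equal, `recoveredByRetry` is 0, which is the basis for the review's remark that retries brought no meaningful recovery.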

vercel · 2026-06-18T10:04:29Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
llm-benchmark	Ready	Preview, Comment	Jun 18, 2026 10:04am

github-actions · 2026-06-18T10:32:59Z

Model Rejected

The model openai/gpt-5.5-pro has been added to the failed models list.

What this means:

- This model is tracked as a rejected benchmark
- Discovery will still find it, but it won't be re-benchmarked automatically while in the failed list

To manage this model:

# Remove from failed list (to allow re-benchmarking)
npm run manage-failed-models remove openai gpt-5.5-pro

# List all failed models
npm run manage-failed-models list
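
The behavior described in this comment, where discovery still finds a rejected model but skips re-benchmarking it, can be pictured as a filter over the discovered models. This is a sketch only: `ModelRef`, `modelKey`, `selectModelsToBenchmark`, and the `provider/model` key format are assumptions for illustration, not the repository's implementation, and the second model in the sample data is made up.

```typescript
// Hypothetical reference to a discovered model.
interface ModelRef {
  provider: string;
  model: string;
}

// Assumed key format, mirroring how the model is written in this PR (provider/model).
function modelKey(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}

// Keep only discovered models that are not currently in the failed list.
function selectModelsToBenchmark(
  discovered: ModelRef[],
  failed: ReadonlySet<string>,
): ModelRef[] {
  return discovered.filter((ref) => !failed.has(modelKey(ref)));
}

const failedList = new Set<string>(["openai/gpt-5.5-pro"]);
const discoveredModels: ModelRef[] = [
  { provider: "openai", model: "gpt-5.5-pro" },
  { provider: "example", model: "hypothetical-model" }, // made-up entry
];

// While the model is in the failed list, it is discovered but not queued.
const queueBefore = selectModelsToBenchmark(discoveredModels, failedList);

// Removing it from the list (the effect of the `remove` command) makes it eligible again.
failedList.delete("openai/gpt-5.5-pro");
const queueAfter = selectModelsToBenchmark(discoveredModels, failedList);

console.log(queueBefore.map(modelKey), queueAfter.map(modelKey));
```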

@alrocar

- Model: openai/gpt-5.5-pro
- Reason: PR rejected/closed without merging
- PR: #625

This prevents the model from being automatically benchmarked again.

feat: add benchmark results for openai/gpt-5.5-pro

d11175c

github-actions Bot assigned pei-tinybird and gnzjgo Jun 18, 2026

vercel Bot deployed to Preview June 18, 2026 10:04

pei-tinybird closed this Jun 18, 2026

github-actions[bot] wants to merge 1 commit into main from benchmark/openai-gpt-5-5-pro-27751159939
