feat: Detect platform-side inference errors #332
Open
statxc wants to merge 3 commits into ridgesai:main from
Conversation
…d for provider failures
Author
@camfairchild Could you please review the PR? I'd appreciate any feedback.
camfairchild
requested changes
Mar 12, 2026
inference_gateway/main.py
Outdated
Comment on lines +121 to +122:

```python
def is_non_halting_error(status_code: int) -> bool:
    return status_code in NON_HALTING_ERROR_CODES
```
Contributor
Would be better termed "platform error" or something. By "halting error" I meant one that isn't caught properly and halts the process.
Contributor
Looks good otherwise. Thank you.
Author
@camfairchild Thanks for your feedback. I updated the name to
…d embedding and edge case tests
camfairchild
approved these changes
Mar 12, 2026
Author
@ibraheem-abe Could you please review this PR? Any feedback is welcome.
Detect platform-side inference errors so agents aren't penalized for provider failures
Closes #331
Problem
When an AI provider goes down or returns server errors (500, 502, etc.), the agent's
`inference()` calls return `None`. The agent keeps running but produces a bad or empty patch because it has no LLM to work with. The platform then scores this patch normally, so the agent gets a 0 for something that wasn't its fault. There was no mechanism to distinguish "the agent wrote bad code" from "the providers were broken."
Solution
Track platform-side inference errors per evaluation run and flag the run as a platform error when the count exceeds a configurable threshold.
Platform errors are provider failures that the agent can't control:
- `500` Internal Server Error
- `502` Bad Gateway
- `503` Service Unavailable
- `504` Gateway Timeout
- `-1` Internal provider error

Non-platform errors (400, 404, 422, 429) are excluded; those are the agent's fault (bad request, wrong model, exceeded cost limit).
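A minimal sketch of this classification; the set and function names below are illustrative, not necessarily the identifiers used in `inference_gateway/main.py`:

```python
# Provider-side failures the agent cannot control (per the list above).
PLATFORM_ERROR_CODES = {500, 502, 503, 504, -1}

def is_platform_error(status_code: int) -> bool:
    """True for platform-side failures; agent-caused errors
    (400, 404, 422, 429) deliberately return False."""
    return status_code in PLATFORM_ERROR_CODES
```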
What changed
- `inference_gateway/error_hash_map.py` (new): `ErrorHashMap` class that tracks inference error counts per `evaluation_run_id`, with the same auto-cleanup pattern as the existing `CostHashMap`.
- `inference_gateway/config.py`: `MAX_INFERENCE_ERRORS_PER_EVALUATION_RUN` (defaults to 5 if not set in `.env`). Existing deployments won't break.
- `inference_gateway/main.py`: returns `503` once the threshold is hit. Extended `/api/usage` to include `inference_errors` and `max_inference_errors`. Added `logger.warning()` calls when errors are counted and when the threshold blocks a request.
- `models/evaluation_run.py`: `PLATFORM_TOO_MANY_INFERENCE_ERRORS = 3050` in the 3xxx platform error range.
- `validator/main.py`: calls `/api/usage` on the inference gateway with a 10s timeout. If errors exceed the limit, marks the run as a platform error (3050) instead of scoring the patch. Also wired up the `extra` field in `EvaluationRunException` handling; it was designed but never passed through. Now `agent_logs` are included when reporting platform errors.
- `tests/test_inference_error_tracking.py` (new): covers `ErrorHashMap` unit behavior, platform error classification, error code validation, and integration tests against both inference and embedding gateway endpoints.

How it works end-to-end
Config
Add to your `.env` if you want to override the default:

Testing