
feat: Detect platform-side inference errors#332

Open
statxc wants to merge 3 commits into ridgesai:main from statxc:feat/platform-inference-error-detection

Conversation

@statxc statxc commented Mar 12, 2026

Detect platform-side inference errors so agents aren't penalized for provider failures

Closes #331

Problem

When an AI provider goes down or returns server errors (500, 502, etc.), the agent's inference() calls return None. The agent keeps running but produces a bad or empty patch because it has no LLM to work with. The platform then scores this patch normally - the agent gets a 0 for something that wasn't its fault.

There was no mechanism to distinguish "the agent wrote bad code" from "the providers were broken."

Solution

Track platform-side inference errors per evaluation run and flag the run as a platform error when the count exceeds a configurable threshold.

Platform errors are provider failures that the agent can't control:

  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • -1 Internal provider error

Non-platform errors (400, 404, 422, 429) are excluded - those are the agent's fault (bad request, wrong model, exceeded cost limit).
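The split above amounts to a small membership check. A minimal sketch, assuming post-rename names (`PLATFORM_ERROR_CODES` and `is_platform_error` are assumptions; the actual gateway code may differ):

```python
# Provider failures the agent cannot control: 5xx responses plus the
# gateway's internal error sentinel (-1).
PLATFORM_ERROR_CODES = {500, 502, 503, 504, -1}

def is_platform_error(status_code: int) -> bool:
    """Return True only for platform-side (provider) failures.

    Client-side errors (400, 404, 422, 429) deliberately return False:
    bad requests, wrong models, and exceeded cost limits are the
    agent's fault and still count against it.
    """
    return status_code in PLATFORM_ERROR_CODES
```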

What changed

| File | What changed |
| --- | --- |
| `inference_gateway/error_hash_map.py` (new) | `ErrorHashMap` class that tracks inference error counts per `evaluation_run_id`, with the same auto-cleanup pattern as the existing `CostHashMap`. |
| `inference_gateway/config.py` | Added `MAX_INFERENCE_ERRORS_PER_EVALUATION_RUN` (defaults to 5 if not set in `.env`). Existing deployments won't break. |
| `inference_gateway/main.py` | Counts platform errors after each inference/embedding call. Blocks further requests with 503 once the threshold is hit. Extended `/api/usage` to include `inference_errors` and `max_inference_errors`. Added `logger.warning()` when errors are counted and when the threshold blocks a request. |
| `models/evaluation_run.py` | Added `PLATFORM_TOO_MANY_INFERENCE_ERRORS = 3050` in the 3xxx platform error range. |
| `validator/main.py` | After the agent finishes, queries `/api/usage` on the inference gateway with a 10s timeout. If errors exceed the limit, marks the run as a platform error (3050) instead of scoring the patch. Also wired up the `extra` field in `EvaluationRunException` handling - it was designed but never passed through. Now `agent_logs` are included when reporting platform errors. |
| `tests/test_inference_error_tracking.py` (new) | 19 tests covering `ErrorHashMap` unit behavior, platform error classification, error code validation, and integration tests against both inference and embedding gateway endpoints. |
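The per-run counter can be sketched roughly as follows. This is an assumed shape inferred from the description above, not the actual class in `inference_gateway/error_hash_map.py`; the TTL-based cleanup stands in for the `CostHashMap`-style auto-cleanup the PR mentions:

```python
import threading
import time

class ErrorHashMap:
    """Thread-safe per-run inference error counter (illustrative sketch).

    Entries idle longer than the TTL are dropped so finished
    evaluation runs don't leak memory.
    """

    def __init__(self, ttl_seconds: float = 3600.0):
        self._lock = threading.Lock()
        # evaluation_run_id -> (error_count, last_seen_monotonic)
        self._counts: dict[str, tuple[int, float]] = {}
        self._ttl = ttl_seconds

    def increment(self, evaluation_run_id: str) -> int:
        """Record one platform error and return the new count."""
        with self._lock:
            count, _ = self._counts.get(evaluation_run_id, (0, 0.0))
            count += 1
            self._counts[evaluation_run_id] = (count, time.monotonic())
            self._cleanup()
            return count

    def get(self, evaluation_run_id: str) -> int:
        """Current error count for a run (0 if unseen or expired)."""
        with self._lock:
            return self._counts.get(evaluation_run_id, (0, 0.0))[0]

    def _cleanup(self) -> None:
        # Called under the lock: drop entries idle longer than the TTL.
        now = time.monotonic()
        stale = [k for k, (_, seen) in self._counts.items()
                 if now - seen > self._ttl]
        for k in stale:
            del self._counts[k]
```

The gateway would call `increment()` after each failed inference/embedding call and refuse with 503 once the count reaches `MAX_INFERENCE_ERRORS_PER_EVALUATION_RUN`.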

How it works end-to-end

Agent calls inference() → provider returns 500 → gateway counts error → agent gets None
Agent calls inference() → provider returns 500 → gateway counts error → agent gets None
...5th error...
Agent calls inference() → gateway returns 503 (blocked) → agent finishes with bad patch
Validator checks /api/usage → sees errors >= limit → marks run as PLATFORM error (3050)
→ Agent is not scored on this run
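The validator's decision at the end of the flow reduces to a threshold check on the `/api/usage` payload. A minimal sketch of that check as a pure function, with field names taken from the PR description (the real validator first fetches the endpoint over HTTP with a 10s timeout):

```python
from typing import Optional

# Platform error code from models/evaluation_run.py (3xxx = platform range).
PLATFORM_TOO_MANY_INFERENCE_ERRORS = 3050

def classify_run(usage: dict) -> Optional[int]:
    """Decide whether a finished run should be flagged as a platform error.

    `usage` is the JSON returned by the gateway's /api/usage endpoint.
    Returns the platform error code when the error count hit the limit,
    or None when the patch should be scored normally.
    """
    errors = usage.get("inference_errors", 0)
    limit = usage.get("max_inference_errors", 5)
    if errors >= limit:
        return PLATFORM_TOO_MANY_INFERENCE_ERRORS
    return None
```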

Config

Add to your .env if you want to override the default:

MAX_INFERENCE_ERRORS_PER_EVALUATION_RUN=5
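On the Python side, the setting is presumably read with an env-var fallback, which is why existing deployments keep working without touching `.env`. A sketch of what `inference_gateway/config.py` might do (assumed, not the actual code):

```python
import os

# Falls back to 5 when the variable is absent from the environment/.env.
MAX_INFERENCE_ERRORS_PER_EVALUATION_RUN = int(
    os.getenv("MAX_INFERENCE_ERRORS_PER_EVALUATION_RUN", "5")
)
```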

Testing

python3 -m pytest tests/test_inference_error_tracking.py -v


statxc commented Mar 12, 2026

@camfairchild Could you please review the PR? I'd appreciate any feedback.

Comment on lines +121 to +122
def is_non_halting_error(status_code: int) -> bool:
return status_code in NON_HALTING_ERROR_CODES
Contributor

Would be better termed "platform error" or something.

By "halting error" I meant one that isn't caught properly and halts the process

@camfairchild
Contributor

Looks good otherwise. Thank you

@statxc
Author

statxc commented Mar 12, 2026

@camfairchild Thanks for your feedback. I updated the name to "platform error". Could you review again?

@statxc statxc requested a review from camfairchild March 12, 2026 14:12
@statxc
Author

statxc commented Mar 12, 2026

@ibraheem-abe Could you please review this PR? Any feedback is welcome.
Thanks!



Development

Successfully merging this pull request may close these issues.

[subnet] improve detection of platform-side inference errors
