feat: adds ability to use inverted judges #168
andrewklatzke wants to merge 2 commits into aklatzke/AIC-2263/sdk-dx-improvements from
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a8f14de. Configure here.

     score = result.score
     if optimization_judge.threshold is not None:
    -    passed = score >= optimization_judge.threshold
    +    passed = judge_passed(score, optimization_judge.threshold, optimization_judge.is_inverted)

Inverted judge logic missed in fallback threshold branch
Medium Severity
In `variation_prompt_feedback`, the else branch (when `optimization_judge.threshold` is None) still uses `passed = score >= 1.0` instead of calling `judge_passed(score, 1.0, optimization_judge.is_inverted)`. For inverted judges hitting this path, a low score (which should pass) would be marked as FAILED, and only a perfect 1.0 would pass: the exact opposite of the intended behavior. The other two equivalent locations in `client.py` correctly default to 1.0 and pass it through `judge_passed`.
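The fix Bugbot suggests can be sketched as follows. This is a minimal illustration, assuming `judge_passed(score, threshold, is_inverted)` returns `score <= threshold` for inverted judges and `score >= threshold` otherwise; the helper body and the `feedback_passed` wrapper are hypothetical stand-ins, not the code in `client.py`.

```python
from typing import Optional


def judge_passed(score: float, threshold: float, is_inverted: bool) -> bool:
    """Assumed helper semantics: lower is better for inverted judges."""
    if is_inverted:
        return score <= threshold
    return score >= threshold


def feedback_passed(score: float, threshold: Optional[float], is_inverted: bool) -> bool:
    """Hypothetical stand-in for the pass/fail check in variation_prompt_feedback.

    Both branches route through judge_passed; the fallback uses the same
    default threshold of 1.0 as the other call sites.
    """
    if threshold is not None:
        return judge_passed(score, threshold, is_inverted)
    # Previously: passed = score >= 1.0, which ignores is_inverted.
    return judge_passed(score, 1.0, is_inverted)


# An inverted judge with no explicit threshold and a low score now passes.
print(feedback_passed(0.1, None, is_inverted=True))   # True
print(feedback_passed(0.1, None, is_inverted=False))  # False
```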


Requirements
Describe the solution you've provided
Implements handling for "inverted" judges.
Describe alternatives you've considered
This gets feature parity with our online evals functionality; no alternatives considered.
Additional context
When a metric has `is_inverted` set, the score comparison is intended to flip from `>=` to `<=`. This adds a `judge_passed` util to handle that logic and uses it throughout. We don't surface the inverted property in the SDK, so we fetch the judge directly to get this information.
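The flipped comparison can be sketched as a small helper. This assumes the argument order used by the `judge_passed(score, threshold, is_inverted)` call in the diff; the body and the judge names and scores in the example are hypothetical, not taken from the PR.

```python
def judge_passed(score: float, threshold: float, is_inverted: bool) -> bool:
    """Return whether a judge score passes its threshold.

    Standard judges pass when score >= threshold; inverted judges
    (lower is better) pass when score <= threshold.
    """
    return score <= threshold if is_inverted else score >= threshold


# Hypothetical mix of judges: (name, score, threshold, is_inverted).
results = [
    ("relevance", 0.9, 0.8, False),
    ("toxicity", 0.1, 0.2, True),
    ("hallucination", 0.4, 0.2, True),
]

for name, score, threshold, is_inverted in results:
    print(name, judge_passed(score, threshold, is_inverted))
```

Keeping the comparison in one helper means each call site only has to pass the flag through; a call site that compares scores directly, like the fallback branch Bugbot flagged, silently ignores it.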
Medium Risk
Changes core judge pass/fail semantics and adds per-judge REST calls (`get_ai_config`) during config-driven runs, which could affect optimization outcomes and introduce new failure or performance modes if the API is unavailable or slow.

Overview
Adds first-class support for inverted judges (where lower scores are better) by introducing a shared `judge_passed` helper and using it for pass/fail decisions in `OptimizationClient` and in prompt feedback generation.

Extends `OptimizationJudge` with an `is_inverted` flag and, for `optimize_from_config`, fetches each judge's `isInverted` value via `api_client.get_ai_config` when building options. Updates logging to include the inverted status, and adds targeted tests covering the helper, mixed inverted/standard evaluation, config building behavior, and `variation_prompt_feedback` output.
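The config-driven path might resolve the flag roughly like this. This is a sketch under loud assumptions: the `OptimizationJudge` fields other than `is_inverted`, the shape of the `get_ai_config` response (a dict with an `isInverted` key), the `FakeApiClient`, and the `build_judges` helper are all hypothetical, and the real SDK signatures may differ. A missing `isInverted` is assumed to mean a standard judge.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class OptimizationJudge:
    # Hypothetical fields; only is_inverted is described in the PR.
    key: str
    threshold: Optional[float] = None
    is_inverted: bool = False


class FakeApiClient:
    """Hypothetical stand-in for api_client; the real get_ai_config signature is not shown."""

    def __init__(self, configs: Dict[str, Dict[str, Any]]) -> None:
        self._configs = configs
        self.calls: List[str] = []

    def get_ai_config(self, key: str) -> Dict[str, Any]:
        self.calls.append(key)  # records one lookup per judge
        return self._configs[key]


def build_judges(
    api_client: FakeApiClient, judge_specs: Dict[str, Optional[float]]
) -> List[OptimizationJudge]:
    """Fetch each judge's config to read isInverted, since the SDK does not surface it."""
    judges = []
    for key, threshold in judge_specs.items():
        config = api_client.get_ai_config(key)
        judges.append(
            OptimizationJudge(
                key=key,
                threshold=threshold,
                # Assumption: an absent isInverted means a standard judge.
                is_inverted=bool(config.get("isInverted", False)),
            )
        )
    return judges


client = FakeApiClient({"relevance": {}, "toxicity": {"isInverted": True}})
for judge in build_judges(client, {"relevance": 0.8, "toxicity": 0.2}):
    print(judge.key, judge.is_inverted)
```

The loop makes the cost named in the risk note visible: one extra lookup per judge, and in this sketch any exception raised by `get_ai_config` propagates out of the build step.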