Merged
    except Exception:
        pass

    return 0.0
Bug: Unhandled Exceptions Propagate, Crashing Evaluation
The call to `_ep_eval` on line 152 sits outside the try-except block, so exceptions raised by the user's evaluation function are not caught. Only normalization errors are handled, allowing crashes from the evaluation logic to propagate instead of gracefully returning 0.0 as intended.
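A minimal, self-contained sketch of the suggested fix. `grade`'s body and the row-building details here are hypothetical stand-ins, not the actual generated code; the point is only that the `_ep_eval` call must sit inside the try block:

```python
def _ep_eval(row):
    # Stand-in for the user's embedded evaluation function; it may raise.
    raise ValueError("evaluation failed")

def grade(sample, item):
    # Fix: keep the _ep_eval call INSIDE the try block so exceptions from
    # the evaluation logic are caught, not just normalization errors.
    try:
        result = _ep_eval({"sample": sample, "item": item})
        return float(result)
    except Exception:
        return 0.0

print(grade({}, {}))  # prints 0.0: the raised ValueError is caught
```

With the call inside the try block, any failure in the evaluation logic degrades to a 0.0 score instead of crashing the grader run.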
dphuang2
reviewed
Nov 18, 2025
Note
Adds OpenAI RFT integration to convert Eval Protocol evaluation functions into Python grader specs, plus example usage and tests.

Adds `eval_protocol/integrations/openai_rft.py` with `build_python_grader_from_evaluation_test` to convert evaluation-style functions into OpenAI Python grader specs by:

- embedding the evaluation function as `_ep_eval`, along with helper types (`EvaluationRow`, `EvaluateResult`, `Message`);
- generating `grade(sample, item)`, which maps `item`/`sample` to a duck-typed row and normalizes outputs to a float score;
- exporting from `eval_protocol/integrations/__init__.py`.

Also adds:

- `examples/openai_rft/example_rapidfuzz.py`: demo `@evaluation_test` using RapidFuzz and conversion to a grader.
- `examples/openai_rft/test_openai_grader.py`: script to validate/run the grader via the OpenAI API.
- `tests/test_openai_rft_integration.py`: verifies grader generation from plain and wrapped functions and correct scoring behavior.

Written by Cursor Bugbot for commit 6fc1c36. This will update automatically on new commits.
eval_protocol/integrations/openai_rft.pywithbuild_python_grader_from_evaluation_testto convert evaluation-style functions into OpenAI Python grader specs by:_ep_eval, and embedding helper types (EvaluationRow,EvaluateResult,Message).grade(sample, item)that mapsitem/sampleto a duck-typed row and normalizes outputs to a float score.eval_protocol/integrations/__init__.py.examples/openai_rft/example_rapidfuzz.py: demo@evaluation_testusing RapidFuzz and conversion to grader.examples/openai_rft/test_openai_grader.py: script to validate/run the grader via OpenAI API.tests/test_openai_rft_integration.py: verifies grader generation from plain and wrapped functions and correct scoring behavior.Written by Cursor Bugbot for commit 6fc1c36. This will update automatically on new commits. Configure here.