
Commit 935beeb

move to not be in package
1 parent 2dccccf commit 935beeb

File tree

4 files changed: +21, -83 lines changed


eval_protocol/integrations/openai_rft/adapter.py renamed to eval_protocol/integrations/openai_rft.py

Lines changed: 18 additions & 12 deletions
@@ -16,24 +16,30 @@ def build_python_grader_from_evaluation_test(test_fn) -> dict:
     Return an OpenAI Python grader spec from an Eval Protocol-style evaluation function.
 
     Assumptions:
-    - `test_fn` is the *core* evaluation function (not the @evaluation_test wrapper),
-      or an @evaluation_test-decorated function that carries _origin_func.
-      It should have a signature like:
+    - `test_fn` is either:
+      * the core evaluation function, or
+      * an @evaluation_test-decorated function that carries `_origin_func`.
+      Its effective signature looks like:
 
         def my_eval(row, **kwargs) -> EvaluateResult | float | EvaluationRow
 
-    - The function only relies on attributes that we provide on `EvaluationRowLike`
-      (you can extend that class as needed).
+    - The function treats `row` as an `EvaluationRow` and only relies on attributes
+      we provide in the duck-typed stand-in:
+      * row.ground_truth
+      * row.messages
+      * row.item (raw item dict)
+      * row.sample (raw sample dict)
 
-    - We map OpenAI's (sample, item) to a duck-typed `row`:
-      - item["reference_answer"] -> row.ground_truth
-      - sample["output_text"] -> appended as an assistant message
-      - raw dicts available as row.item / row.sample
+    - We map OpenAI's (sample, item) into that duck-typed `EvaluationRow` as follows:
+      * item["reference_answer"] -> row.ground_truth
+      * item["messages"] (if present) -> row.messages (normalized to Message-like objects)
+      * sample["output_text"] -> appended as the last assistant message in row.messages
+      * the original dicts are also available via row.item / row.sample
 
     - The function returns either:
-      - a numeric score, or
-      - an object/dict with a `score` field, or
-      - an EvaluationRow/EvaluateResult-like object with `.evaluation_result.score`.
+      * a numeric score, or
+      * an object/dict with a `score` field, or
+      * an EvaluationRow/EvaluateResult-like object with `.evaluation_result.score`.
     """
 
     # If the user passed an @evaluation_test wrapper, try to recover the original function
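The (sample, item) -> row mapping the revised docstring describes can be sketched as a small stand-in. This is a hypothetical helper (the name `make_row` and the use of `SimpleNamespace` are illustrative; the adapter's actual class may differ):

```python
from types import SimpleNamespace


def make_row(sample: dict, item: dict) -> SimpleNamespace:
    """Hypothetical sketch of the duck-typed EvaluationRow stand-in
    described in the docstring; the real adapter may differ."""
    # item["messages"] (if present) -> row.messages, normalized to
    # Message-like objects with .role and .content attributes.
    messages = [
        SimpleNamespace(role=m.get("role", "user"), content=m.get("content", ""))
        for m in item.get("messages", [])
    ]
    # sample["output_text"] -> appended as the last assistant message.
    if "output_text" in sample:
        messages.append(SimpleNamespace(role="assistant", content=sample["output_text"]))
    return SimpleNamespace(
        ground_truth=item.get("reference_answer"),  # item["reference_answer"] -> row.ground_truth
        messages=messages,
        item=item,      # original dicts remain available as-is
        sample=sample,
    )
```

An evaluation function written against this shape only ever touches `row.ground_truth`, `row.messages`, `row.item`, and `row.sample`, which is exactly the contract the docstring asks for.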

eval_protocol/integrations/openai_rft/README.md

Lines changed: 0 additions & 68 deletions
This file was deleted.

eval_protocol/integrations/openai_rft/example_rapidfuzz.py renamed to examples/openai_rft/example_rapidfuzz.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 
 from typing import Any
 
-from eval_protocol.integrations.openai_rft.adapter import build_python_grader_from_evaluation_test
+from eval_protocol.integrations.openai_rft import build_python_grader_from_evaluation_test
 from eval_protocol.models import EvaluateResult, EvaluationRow, Message
 from eval_protocol.pytest import evaluation_test
 from eval_protocol.pytest.default_no_op_rollout_processor import NoOpRolloutProcessor
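For context, `rapidfuzz_eval` is a fuzzy-string-match evaluation in the shape the adapter expects. A minimal sketch of that idea, using the stdlib's `difflib.SequenceMatcher` as a stand-in for the rapidfuzz library (the function name and exact scoring here are illustrative, not the example file's actual code):

```python
from difflib import SequenceMatcher


def fuzzy_eval(row, **kwargs) -> float:
    """Return a similarity score in [0, 1] between the last assistant
    message and row.ground_truth. Sketch only: SequenceMatcher stands
    in for rapidfuzz's ratio-style scoring."""
    answer = row.messages[-1].content if row.messages else ""
    return SequenceMatcher(None, answer, row.ground_truth or "").ratio()
```

Because it takes `row` and returns a bare float, a function like this satisfies the `def my_eval(row, **kwargs)` signature documented in the adapter.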

eval_protocol/integrations/openai_rft/test_openai_grader.py renamed to examples/openai_rft/test_openai_grader.py

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 import os
 import requests
 
-from eval_protocol.integrations.openai_rft.adapter import build_python_grader_from_evaluation_test
-from eval_protocol.integrations.openai_rft.example_rapidfuzz import rapidfuzz_eval
+from eval_protocol.integrations.openai_rft import build_python_grader_from_evaluation_test
+from examples.openai_rft.example_rapidfuzz import rapidfuzz_eval
 
 
 api_key = os.environ["OPENAI_API_KEY"]
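The adapter's docstring lists three acceptable return shapes from the evaluation function; normalizing them to a single float might look like the following sketch (the helper name `extract_score` is hypothetical and this is not the adapter's actual code):

```python
def extract_score(result) -> float:
    """Normalize the three documented return shapes to one float.
    Sketch under assumptions; the adapter's real logic may differ."""
    if isinstance(result, (int, float)):                  # a numeric score
        return float(result)
    if isinstance(result, dict) and "score" in result:    # dict with a `score` field
        return float(result["score"])
    inner = getattr(result, "evaluation_result", None)    # EvaluationRow-like
    if inner is not None and hasattr(inner, "score"):
        return float(inner.score)
    score = getattr(result, "score", None)                # object with a `score` attr
    if score is not None:
        return float(score)
    raise TypeError(f"Cannot extract a score from {result!r}")
```

Whatever the real adapter does, some funnel of this kind is needed so the grader always hands OpenAI's RFT API a single numeric score.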

0 commit comments