
v1 proposal #187

Closed
dphuang2 wants to merge 1 commit into main from eval-protocol-v2-interface


@dphuang2
Collaborator

No description provided.

Collaborator Author

@dphuang2 left a comment


Mostly AI-generated, but I will preserve the invocation, experiment, run, row, and rollout IDs somewhere in this flow. At a high level, this is a proposal to use pytest fixtures to parametrize the evals while still writing standalone evaluators as we already do.
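A minimal sketch of the pattern this comment describes, assuming a plain evaluator function plus a fixture parametrized over completion settings. `COMPLETION_PARAMS`, `evaluate_math_row`, and `fake_generate` are hypothetical names for illustration, not code from this PR, and the model call is stubbed out.

```python
import pytest

# Hypothetical completion settings; the PR does not show the real ones.
COMPLETION_PARAMS = [
    {"model": "model-a", "temperature": 0.0},
    {"model": "model-b", "temperature": 0.7},
]


def fake_generate(row: dict, params: dict) -> str:
    """Hypothetical stand-in for a model call.

    A real rollout would send row["question"] to params["model"];
    this stub just echoes the reference answer.
    """
    return row["answer"]


def evaluate_math_row(row: dict, completion: str) -> float:
    """Standalone evaluator: exact-match score for one completion.

    It has no pytest dependency, so it can still be called directly.
    """
    return 1.0 if completion.strip() == row["answer"].strip() else 0.0


@pytest.fixture(params=COMPLETION_PARAMS, ids=lambda p: p["model"])
def completion_params(request):
    # pytest creates one instance of this fixture per COMPLETION_PARAMS
    # entry; request.param is the current entry.
    return request.param


def test_math_eval(completion_params):
    # Collected once per setting: test_math_eval[model-a], test_math_eval[model-b].
    row = {"question": "2 + 2", "answer": "4"}
    completion = fake_generate(row, completion_params)
    assert evaluate_math_row(row, completion) == 1.0
```

The scoring logic stays in an ordinary function, and pytest only handles the fan-out over settings.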

Contributor

@benjibc left a comment


lgtm

```python
return request.param

# Pointwise fixture - parametrized across BOTH completion params AND dataset rows
@pytest.fixture(params=range(len(MATH_DATASET)))
```
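A sketch of how the excerpted fixture behaves, assuming `MATH_DATASET` is a list of dicts and a second fixture supplies completion params (both are hypothetical stand-ins here, as is `score`). With `params=range(len(MATH_DATASET))`, each fixture instance receives a single int index rather than a list, so the fixture has to look the row up itself; a test requesting both fixtures is collected once per (params, row) pair.

```python
import pytest

# Hypothetical stand-ins; the real MATH_DATASET shape is not shown in the excerpt.
MATH_DATASET = [
    {"question": "2 + 2", "answer": "4"},
    {"question": "3 * 3", "answer": "9"},
]
COMPLETION_PARAMS = [{"temperature": 0.0}, {"temperature": 0.7}]


@pytest.fixture(params=COMPLETION_PARAMS)
def completion_params(request):
    return request.param


# With params=range(...), request.param is one int index per fixture
# instance, so the row must be looked up by position.
@pytest.fixture(params=range(len(MATH_DATASET)))
def row(request):
    return MATH_DATASET[request.param]


def score(example: dict, completion: str) -> float:
    # Exact-match scoring after trimming whitespace.
    return float(completion.strip() == example["answer"])


def test_pointwise(completion_params, row):
    # Collected len(COMPLETION_PARAMS) * len(MATH_DATASET) = 4 times.
    completion = row["answer"]  # placeholder for a real model call
    assert score(row, completion) == 1.0
```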
Contributor


I don't get this line; this would come back as a list of ints, right?

Collaborator Author


Yeah, this should probably be a list of row IDs.
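Following the suggestion to parametrize by row IDs, a minimal sketch assuming the dataset can be keyed by a string ID. The ID format `math-0001` and the helper `load_row` are hypothetical. Each fixture instance then receives a stable ID instead of a positional index, and that ID also appears in the test's node ID.

```python
import pytest

# Hypothetical dataset keyed by row ID; the PR does not specify the ID format.
MATH_DATASET = {
    "math-0001": {"question": "2 + 2", "answer": "4"},
    "math-0002": {"question": "3 * 3", "answer": "9"},
}
ROW_IDS = list(MATH_DATASET)


def load_row(row_id: str) -> dict:
    # Hypothetical helper: attach the ID to the row so it can be carried
    # through the rest of the eval flow.
    return {"row_id": row_id, **MATH_DATASET[row_id]}


@pytest.fixture(params=ROW_IDS, ids=ROW_IDS)
def row(request):
    # request.param is a row ID string, so it stays stable if the dataset
    # is reordered and shows up in test names, e.g. test_pointwise[math-0001].
    return load_row(request.param)


def test_pointwise(row):
    assert row["row_id"] in MATH_DATASET
```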

@dphuang2 changed the title from "v2 proposal" to "v1 proposal" on Sep 17, 2025
@dphuang2 closed this on Sep 17, 2025

2 participants