-
Notifications
You must be signed in to change notification settings - Fork 16
Add original example idx in input_metadata #346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -171,6 +171,7 @@ def update_row_with_remote_trace( | |
| row.messages = remote_row.messages | ||
| row.tools = remote_row.tools | ||
| row.input_metadata.session_data = remote_row.input_metadata.session_data | ||
| row.input_metadata.dataset_info = remote_row.input_metadata.dataset_info | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Bug: Original row index lost when copying remote dataset infoThe complete overwriting of
xzrderek marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| row.execution_metadata = remote_row.execution_metadata | ||
| return None | ||
| else: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| from eval_protocol.data_loader.dynamic_data_loader import DynamicDataLoader | ||
| from eval_protocol.models import EvaluationRow, Message, EvaluateResult | ||
| from eval_protocol.pytest import evaluation_test | ||
| from typing import List | ||
|
|
||
| def generator() -> list[EvaluationRow]: | ||
| return [EvaluationRow(messages=[Message(role="user", content="What is 2 + 2?")]) for _ in range(2)] | ||
|
|
||
| @evaluation_test( | ||
| data_loaders=DynamicDataLoader( | ||
| generators=[generator], | ||
| ), | ||
| mode="all", | ||
| ) | ||
| def test_data_loader_stable_row_id_with_same_content(rows: List[EvaluationRow]) -> List[EvaluationRow]: | ||
| """Test that the row id is stable even when the data loader is called multiple times.""" | ||
| row_ids = set() | ||
| for row in rows: | ||
| row_ids.add(row.input_metadata.row_id) | ||
| row.evaluation_result = EvaluateResult(score=0.0, reason="Dummy evaluation result") | ||
| assert len(row_ids) == 2 | ||
| return rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Row index added after preprocessing instead of before
The
data_loader_row_idxis enumerated from rows after preprocessing, but the PR aims to add the original example index. Whenpreprocess_fnfilters rows, the indices get renumbered (e.g., original rows 0, 2, 4 become indices 0, 1, 2), losing track of the original positions. To capture original indices, enumeration needs to happen before preprocessing in_process_variantand the index preserved through the preprocessing step.