
Add original example idx in input_metadata #346

Merged
mayinghan merged 6 commits into main from fix-stable-row-id-with-duplicate-content
Nov 25, 2025

Conversation

@mayinghan (Collaborator) commented Nov 25, 2025



Description

Please include a summary of the change and which issue is fixed or feature is implemented. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)
Implements # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Refactoring/Code cleanup
  • Build/CI/CD related changes
  • Other (please describe):

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Test Configuration:

  • Firmware version:
  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

  • My code follows the style guidelines of this project (ran black ., isort ., flake8 .)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Screenshots (if applicable)

If applicable, add screenshots to help showcase your changes.

Additional context

Add any other context about the PR here.


Note

Adds data_loader_row_idx to row metadata, copies dataset_info from remote traces, and adds a test ensuring stable row IDs for identical-content rows.

  • Data Loading:
    • Add data_loader_row_idx to row.input_metadata.dataset_info in eval_protocol/data_loader/models.py.
    • Iterate with enumerate in _apply_metadata to attach per-row index.
  • Tracing:
    • Propagate input_metadata.dataset_info from remote rows in eval_protocol/pytest/tracing_utils.py during rollout update.
  • Tests:
    • Add tests/data_loader/test_data_loader_stable_row_id.py to verify stable/unique row_id across repeated generation with identical content.

Written by Cursor Bugbot for commit 93ebb15. This will update automatically on new commits.

@mayinghan mayinghan requested a review from benjibc November 25, 2025 06:17
@mayinghan mayinghan marked this pull request as ready for review November 25, 2025 06:17
# Apply row counts
row.input_metadata.dataset_info["data_loader_num_rows"] = original_count
row.input_metadata.dataset_info["data_loader_num_rows_after_preprocessing"] = processed_count
row.input_metadata.dataset_info["data_loader_row_idx"] = idx
Bug: Row index added after preprocessing instead of before

The data_loader_row_idx is enumerated from rows after preprocessing, but the PR aims to add the original example index. When preprocess_fn filters rows, the indices get renumbered (e.g., original rows 0, 2, 4 become indices 0, 1, 2), losing track of the original positions. To capture original indices, enumeration needs to happen before preprocessing in _process_variant and the index preserved through the preprocessing step.

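The suggested fix can be sketched roughly as follows. Note that `load_rows` and the simplified models are illustrative assumptions, not the actual `_process_variant` code: the point is only that enumeration happens before `preprocess_fn` runs, so filtering cannot renumber the surviving rows.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical minimal models; the real ones live in
# eval_protocol/data_loader/models.py.
@dataclass
class InputMetadata:
    dataset_info: dict[str, Any] = field(default_factory=dict)

@dataclass
class Row:
    content: str
    input_metadata: InputMetadata = field(default_factory=InputMetadata)

def load_rows(raw_rows: list[Row],
              preprocess_fn: Callable[[list[Row]], list[Row]]) -> list[Row]:
    # Tag each row with its original position BEFORE preprocessing, so a
    # filtering preprocess_fn cannot renumber the rows that survive.
    for idx, row in enumerate(raw_rows):
        row.input_metadata.dataset_info["data_loader_row_idx"] = idx
    return preprocess_fn(raw_rows)

# A filter that keeps every other row still reports the original indices.
rows = load_rows([Row("a"), Row("b"), Row("c")], lambda rs: rs[::2])
print([r.input_metadata.dataset_info["data_loader_row_idx"] for r in rows])  # -> [0, 2]
```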

row.messages = remote_row.messages
row.tools = remote_row.tools
row.input_metadata.session_data = remote_row.input_metadata.session_data
row.input_metadata.dataset_info = remote_row.input_metadata.dataset_info
Bug: Original row index lost when copying remote dataset info

Overwriting row.input_metadata.dataset_info wholesale with remote_row.input_metadata.dataset_info discards the original row's data_loader_row_idx. Because filter_longest_conversation preprocessing returns a single-element list, the remote row typically carries data_loader_row_idx=0, which clobbers the index that tracked the row's position in the source dataset. Any other original dataset metadata fields are lost the same way.

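One way to address this, sketched below with a hypothetical `merge_dataset_info` helper rather than the actual tracing code, is to merge the remote dict into the local one and let locally computed keys win on conflict:

```python
from typing import Any

def merge_dataset_info(local: dict[str, Any], remote: dict[str, Any]) -> dict[str, Any]:
    # Start from the remote metadata, then overlay the local dict so that
    # locally computed keys such as data_loader_row_idx are preserved.
    merged = dict(remote)
    merged.update(local)
    return merged

local_info = {"data_loader_row_idx": 7}
remote_info = {"data_loader_row_idx": 0, "trace_source": "remote"}
print(merge_dataset_info(local_info, remote_info))
# -> {'data_loader_row_idx': 7, 'trace_source': 'remote'}
```

The same effect can be written inline as `{**remote_info, **local_info}`; either way, the row keeps its original index while still gaining the remote trace's metadata.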

@dphuang2 (Collaborator)

can you make sure to add tests for whatever behavior you expect

@dphuang2 (Collaborator) left a comment

awesome, thanks! I think there are existing tests for the non-duplicated case so if tests pass then LGTM

@mayinghan mayinghan merged commit 361369e into main Nov 25, 2025
9 checks passed
@mayinghan mayinghan deleted the fix-stable-row-id-with-duplicate-content branch November 25, 2025 09:45