Skip to content

Add data loader abstraction to evaluation_test#189

Draft
benjibc wants to merge 1 commit intomainfrom
codex/propose-new-data-loader-interface
Draft

Add data loader abstraction to evaluation_test#189
benjibc wants to merge 1 commit intomainfrom
codex/propose-new-data-loader-interface

Conversation

@benjibc
Copy link
Copy Markdown
Contributor

@benjibc benjibc commented Sep 18, 2025

Summary

  • introduce a data loader framework for evaluation tests, including an inline loader and a Langfuse adapter wrapper
  • integrate the evaluation_test decorator with loader variants, metadata enrichment, and backwards-compatible parameterization
  • export the new loader types and cover them with pytest examples

Current problems

  • data going forward probably shouldn't be parameterized, I don't think there is a lot of demand for it
  • this will simplify data loader implementation quite a bit

Testing

  • pre-commit run --all-files
  • pytest tests/pytest/test_data_loader.py

https://chatgpt.com/codex/tasks/task_e_68cb974079d883338a38bb3b9c8e6fcb

Copy link
Copy Markdown
Collaborator

@dphuang2 dphuang2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move all the data loader logic out into its own module, evaluation_test is already so big

Copy link
Copy Markdown
Collaborator

@dphuang2 dphuang2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just accept a lambda that produces Sequence[list[EvaluationRow]]

completion_params=[{"model": "no-op"}],
rollout_processor=NoOpRolloutProcessor(),
max_dataset_rows=1,
preprocess_fn=_preprocess_rows,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't preprocessing happen in dataloader

Comment on lines +273 to +279
combinations: list[CombinationTuple] = []
if has_data_loader_variants:
pytest_parametrize_args = _build_data_loader_parametrize_args(
loader_variants,
completion_params,
evaluation_test_kwargs,
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be its own conditional branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants