Describe the bug
The lextreme benchmark failed with a KeyError or configuration errors because the evaluation_splits defined in the configuration (["validation", "test"]) did not match the splits actually available in the dataset for several subsets.
To Reproduce
from lighteval.pipeline import Pipeline

# pipeline_params, evaluation_tracker, and model_config are set up beforehand
task = "lextreme:multi_eurlex_level_1|5"
pipeline = Pipeline(
    tasks=task,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)
pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()
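Which subsets are affected can be checked up front by comparing each subset's actual Hub splits (e.g. via datasets.get_dataset_split_names) against the configured evaluation_splits. A minimal sketch of that check; the repo id below is a placeholder, not confirmed from this report, so the Hub lookup is kept behind the main guard:

```python
def splits_missing_from(available, configured=("validation", "test")):
    """Return the configured evaluation splits absent from a subset's actual splits."""
    present = set(available)
    return [s for s in configured if s not in present]


if __name__ == "__main__":
    # Hypothetical Hub lookup; "<lextreme-repo-id>" is a placeholder, not a real id.
    from datasets import get_dataset_config_names, get_dataset_split_names

    repo = "<lextreme-repo-id>"
    for config in get_dataset_config_names(repo):
        missing = splits_missing_from(get_dataset_split_names(repo, config))
        if missing:
            print(f"{config}: missing {missing}")
```

For example, a subset that ships only train and test yields `["validation"]` as the missing list, which matches the KeyError below.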
141 self._init_random_seeds()
--> 142 self._init_tasks_and_requests(tasks=tasks)
144 self.model_config = model_config
145 self.accelerator, self.parallel_context = self._init_parallelism_manager()
...
88 available_suggested_splits = [
89 split for split in (Split.TRAIN, Split.TEST, Split.VALIDATION) if split in self
90 ]
KeyError: 'validation'
Expected behavior
The configuration should only reference splits that are actually available on the Hugging Face Hub for each subset.
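One way to get that behavior is to resolve the configured splits against the splits a subset actually has, and fail with a descriptive error rather than a bare KeyError when none match. This is a hedged sketch, not lighteval's actual implementation; resolve_evaluation_splits is a hypothetical helper:

```python
def resolve_evaluation_splits(configured, available):
    """Keep only configured splits that exist in the dataset.

    Raises a descriptive ValueError instead of a bare KeyError when no
    configured split is available.
    """
    resolved = [s for s in configured if s in available]
    if not resolved:
        raise ValueError(
            f"None of the configured splits {list(configured)} exist; "
            f"available splits: {sorted(available)}"
        )
    return resolved


# A subset shipping only train/test falls back to evaluating on test alone:
print(resolve_evaluation_splits(["validation", "test"], {"train", "test"}))  # ['test']
```

A fix along these lines keeps a single configuration for all subsets while tolerating the ones that lack a validation split.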
Version info
- OS: mac
- Lighteval version: main (local development)