
[BUG] KeyError: 'default' #1163

@2niuhe

Description
(eval_venv) root@b2c98f779d6b:~/workshop# lighteval endpoint litellm  ./qwen3_nothink.yaml  'bigbench:tracking_shuffled_objects' --max-samples 5
[2026-01-31 17:38:00,695] [ WARNING]: --max_samples WAS SET. THESE NUMBERS ARE ONLY PARTIAL AND SHOULD NOT BE USED FOR COMPARISON UNLESS YOU KNOW WHAT YOU ARE DOING. (pipeline.py:132)
[2026-01-31 17:38:00,695] [    INFO]: --- INIT SEEDS --- (pipeline.py:254)
[2026-01-31 17:38:00,695] [    INFO]: --- LOADING TASKS --- (pipeline.py:211)
[2026-01-31 17:38:00,792] [ WARNING]: /root/workshop/eval_venv/lib/python3.12/site-packages/syllapy/data_loader.py:3: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
 (warnings.py:110)
[2026-01-31 17:38:00,968] [    INFO]: Loaded 646 task configs in 0.3 seconds (registry.py:379)
╭──────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/main_endpoint.py:304 in litellm    │
│                                                                                                    │
│   301 │   │   remove_reasoning_tags=remove_reasoning_tags,                                         │
│   302 │   │   reasoning_tags=reasoning_tags,                                                       │
│   303 │   )                                                                                        │
│ ❱ 304 │   pipeline = Pipeline(                                                                     │
│   305 │   │   tasks=tasks,                                                                         │
│   306 │   │   pipeline_parameters=pipeline_params,                                                 │
│   307 │   │   evaluation_tracker=evaluation_tracker,                                               │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:142 in __init__        │
│                                                                                                    │
│   139 │   │                                                                                        │
│   140 │   │   # We init tasks first to fail fast if one is badly defined                           │
│   141 │   │   self._init_random_seeds()                                                            │
│ ❱ 142 │   │   self._init_tasks_and_requests(tasks=tasks)                                           │
│   143 │   │                                                                                        │
│   144 │   │   self.model_config = model_config                                                     │
│   145 │   │   self.accelerator, self.parallel_context = self._init_parallelism_manager()           │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:224 in                 │
│ _init_tasks_and_requests                                                                           │
│                                                                                                    │
│   221 │   │   self.tasks_dict: dict[str, LightevalTask] = self.registry.load_tasks()               │
│   222 │   │   LightevalTask.load_datasets(self.tasks_dict, self.pipeline_parameters.dataset_lo     │
│   223 │   │   self.documents_dict = {                                                              │
│ ❱ 224 │   │   │   task.full_name: task.get_docs(self.pipeline_parameters.max_samples) for _, t     │
│   225 │   │   }                                                                                    │
│   226 │   │                                                                                        │
│   227 │   │   self.sampling_docs = collections.defaultdict(list)                                   │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:378 in     │
│ get_docs                                                                                           │
│                                                                                                    │
│   375 │   │   Raises:                                                                              │
│   376 │   │   │   ValueError: If no documents are available for evaluation.                        │
│   377 │   │   """
│ ❱ 378 │   │   eval_docs = self.eval_docs()                                                         │
│   379 │   │                                                                                        │
│   380 │   │   if len(eval_docs) == 0:                                                              │
│   381 │   │   │   raise ValueError(f"Task {self.name} has no documents to evaluate skipping.")     │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:355 in     │
│ eval_docs                                                                                          │
│                                                                                                    │
│   352 │   │   │   list[Doc]: Evaluation documents.                                                 │
│   353 │   │   """                                                                                  │
│   354 │   │   if self._docs is None:                                                               │
│ ❱ 355 │   │   │   self._docs = self._get_docs_from_split(self.evaluation_split)                    │
│   356 │   │   │   if self.must_remove_duplicate_docs:                                              │
│   357 │   │   │   │   self._docs = self.remove_duplicate_docs(self._docs)                          │
│   358 │   │   return self._docs                                                                    │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/lighteval_task.py:298 in     │
│ _get_docs_from_split                                                                               │
│                                                                                                    │
│   295 │   │                                                                                        │
│   296 │   │   docs = []                                                                            │
│   297 │   │   for split in splits:                                                                 │
│ ❱ 298 │   │   │   for ix, item in enumerate(self.dataset[split]):                                  │
│   299 │   │   │   │   # Some tasks formatting is applied differently when the document is used     │
│   300 │   │   │   │   # vs when it's used for the actual prompt. That's why we store whether w     │
│   301 │   │   │   │   # doc for a fewshot sample (few_shots=True) or not, which then leads to      │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/datasets/dataset_dict.py:86 in __getitem__   │
│                                                                                                    │
│     83 │                                                                                           │
│     84 │   def __getitem__(self, k) -> Dataset:                                                    │
│     85 │   │   if isinstance(k, (str, NamedSplit)) or len(self) == 0:                              │
│ ❱   86 │   │   │   return super().__getitem__(k)                                                   │
│     87 │   │   else:                                                                               │
│     88 │   │   │   available_suggested_splits = [                                                  │
│     89 │   │   │   │   split for split in (Split.TRAIN, Split.TEST, Split.VALIDATION) if split     │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'default'

Expected behavior

The `bigbench:tracking_shuffled_objects` task should load its dataset and run the evaluation, instead of raising `KeyError: 'default'` while fetching documents.

Version info

Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.
