The dataset is available on HuggingFace: Salesforce/LiveResearchBench.
LiveResearchBench contains 100 benchmark questions with checklists for evaluating reports generated by deep research agents across different criteria.

Subsets:
- `question_with_checklist`: Full dataset with questions and per-question checklists
- `question_only`: Questions without checklists
Remarks: To avoid contamination and overfitting to the benchmark, the HuggingFace version contains 80 questions. If you need access to the remaining 20 questions, please contact us at 📧 deep.research.bench@gmail.com
The default static mode loads questions and checklists with dates already filled in (e.g., 2025 instead of {{current_year}}):
```python
from liveresearchbench.common.io_utils import load_liveresearchbench_dataset

# Load static version
benchmark_data = load_liveresearchbench_dataset(use_realtime=False)
```

Example:
- Question: "What is the size, growth rate, and segmentation of the U.S. electric vehicle market in 2025?"
For dynamic evaluation with current dates, use realtime mode:
```python
# Load realtime version (replaces {{current_year}} etc.)
benchmark_data = load_liveresearchbench_dataset(use_realtime=True)
```

The following placeholders will be replaced based on the current date:
- `{{current_year}}` → 2025 (current year)
- `{{last_year}}` → 2024 (previous year)
- `{{current_date}}` or `{{date}}` → Nov 12, 2025 (formatted date)
Example:
- Question: "What is the size, growth rate, and segmentation of the U.S. electric vehicle market in 2025?" (automatically updated each year)
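The placeholder substitution above can be sketched with a small helper like the following. This is a hypothetical illustration of the replacement rules, not the library's actual `io_utils` implementation:

```python
from datetime import date
from typing import Optional

def fill_placeholders(text: str, today: Optional[date] = None) -> str:
    """Replace date placeholders such as {{current_year}} with current values.

    Sketch only; the real load_liveresearchbench_dataset(use_realtime=True)
    path may implement this differently.
    """
    today = today or date.today()
    replacements = {
        "{{current_year}}": str(today.year),
        "{{last_year}}": str(today.year - 1),
        "{{current_date}}": today.strftime("%b %d, %Y"),
        "{{date}}": today.strftime("%b %d, %Y"),
    }
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text
```

For example, `fill_placeholders("EV market in {{current_year}}?", date(2025, 11, 12))` yields `"EV market in 2025?"`.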
```python
from liveresearchbench.common.io_utils import (
    load_liveresearchbench_dataset,
    get_question_for_qid,
    get_checklists_for_qid,
)

# Load dataset
benchmark_data = load_liveresearchbench_dataset()

# Get question for a specific query ID
qid = "market6VWmPyxptfK47civ"
question = get_question_for_qid(benchmark_data, qid)

# Get checklist items for a specific query ID
checklists = get_checklists_for_qid(benchmark_data, qid)
print(f"Found {len(checklists)} checklist items")
```

For each entry in the dataset:
```python
{
    'qid': 'market6VWmPyxptfK47civ',  # Unique query identifier
    'question': 'What is the size, growth rate...',  # Research question
    'checklists': [  # List of checklist items for coverage evaluation
        'Does the report provide data for the U.S. electric vehicle market...',
        'Does the report discuss the size, growth rate...',
        # ... more items
    ]
}
```

To cache the dataset locally:
```python
from datasets import load_dataset

dataset = load_dataset("Salesforce/LiveResearchBench", "question_with_checklist", split="test")
print(f"Cached {len(dataset)} entries")
```

The dataset will be cached at `~/.cache/huggingface/datasets/`.
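Given the entry schema shown above, a small sanity check on downloaded entries can be sketched as follows. `validate_entry` is a hypothetical helper, not part of the library:

```python
def validate_entry(entry: dict) -> bool:
    """Return True if an entry matches the expected LiveResearchBench schema:
    a string 'qid', a string 'question', and a list of string 'checklists'.
    Hypothetical helper for illustration only.
    """
    return (
        isinstance(entry.get("qid"), str)
        and isinstance(entry.get("question"), str)
        and isinstance(entry.get("checklists"), list)
        and all(isinstance(item, str) for item in entry["checklists"])
    )
```

A check like this catches, for example, entries from the `question_only` subset, which omit the `checklists` field.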
The test script automatically loads the dataset:
```python
# In tests/test_real_grading.py
benchmark_data = load_liveresearchbench_dataset(use_realtime=True)

# Questions are fetched per report
for report in reports:
    query_id = report['query_id']
    question = get_question_for_qid(benchmark_data, query_id)
    checklists = get_checklists_for_qid(benchmark_data, query_id)
    # Use for grading...
```

If you find this dataset helpful, please consider citing:
```bibtex
@article{sfr2025liveresearchbench,
  title={LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild},
  author={Jiayu Wang and Yifei Ming and Riya Dulepet and Qinglin Chen and Austin Xu and Zixuan Ke and Frederic Sala and Aws Albarghouthi and Caiming Xiong and Shafiq Joty},
  year={2025},
  url={https://arxiv.org/abs/2510.14240}
}
```