
LiveResearchBench Dataset

Overview

The dataset is available on HuggingFace: Salesforce/LiveResearchBench.

Dataset Structure

LiveResearchBench contains 100 benchmark questions, each paired with a checklist, for evaluating reports generated by deep research agents across multiple criteria:

  • Subsets:

    • question_with_checklist: Full dataset with questions and per-question checklists
    • question_only: Questions without checklists

    Remarks: To avoid contamination and overfitting to the benchmark, the HuggingFace version contains 80 of the 100 questions. If you need access to the remaining 20 questions, please contact us at 📧 deep.research.bench@gmail.com

Loading the Dataset

Default: Static Mode (No Placeholders)

The default static mode loads questions and checklists with dates already filled in (e.g., 2025 instead of {{current_year}}):

from liveresearchbench.common.io_utils import load_liveresearchbench_dataset

# Load static version 
benchmark_data = load_liveresearchbench_dataset(use_realtime=False)

Example:

  • Question: "What is the size, growth rate, and segmentation of the U.S. electric vehicle market in 2025?"

Realtime Mode

For dynamic evaluation with current dates, use realtime mode:

# Load realtime version (replaces {{current_year}} etc.)
benchmark_data = load_liveresearchbench_dataset(use_realtime=True)

The following placeholders are replaced with values derived from the current date:

  • {{current_year}} → 2025 (current year)
  • {{last_year}} → 2024 (previous year)
  • {{current_date}} or {{date}} → Nov 12, 2025 (formatted date)

Example:

  • Question: "What is the size, growth rate, and segmentation of the U.S. electric vehicle market in 2025?" (automatically updated each year)

Accessing Questions and Checklists

from liveresearchbench.common.io_utils import (
    load_liveresearchbench_dataset,
    get_question_for_qid,
    get_checklists_for_qid
)

# Load dataset
benchmark_data = load_liveresearchbench_dataset()

# Get question for a specific query ID
qid = "market6VWmPyxptfK47civ"
question = get_question_for_qid(benchmark_data, qid)

# Get checklist items for a specific query ID
checklists = get_checklists_for_qid(benchmark_data, qid)
print(f"Found {len(checklists)} checklist items")

Dataset Fields

For each entry in the dataset:

{
    'qid': 'market6VWmPyxptfK47civ',  # Unique query identifier
    'question': 'What is the size, growth rate...',  # Research question
    'checklists': [  # List of checklist items for coverage evaluation
        'Does the report provide data for the U.S. electric vehicle market...',
        'Does the report discuss the size, growth rate...',
        # ... more items
    ]
}

Downloading for Offline Use

To cache the dataset locally:

from datasets import load_dataset
dataset = load_dataset("Salesforce/LiveResearchBench", "question_with_checklist", split="test")
print(f"Cached {len(dataset)} entries")

The dataset will be cached at: ~/.cache/huggingface/datasets/

Usage in Tests

The test script automatically loads the dataset:

# In tests/test_real_grading.py
benchmark_data = load_liveresearchbench_dataset(use_realtime=True)

# Questions are fetched per report
for report in reports:
    query_id = report['query_id']
    question = get_question_for_qid(benchmark_data, query_id)
    checklists = get_checklists_for_qid(benchmark_data, query_id)
    
    # Use for grading...

Citation

If you find this dataset helpful, please consider citing:

@article{sfr2025liveresearchbench,
  title={LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild},
  author={Jiayu Wang and Yifei Ming and Riya Dulepet and Qinglin Chen and Austin Xu and Zixuan Ke and Frederic Sala and Aws Albarghouthi and Caiming Xiong and Shafiq Joty},
  year={2025},
  url={https://arxiv.org/abs/2510.14240}
}