The dataset is available on HuggingFace: Salesforce/LiveResearchBench.
LiveResearchBench contains 100 benchmark questions with checklists for evaluating reports generated by deep research agents across different criteria.

Subsets:
- `question_with_checklist`: Full dataset with questions and per-question checklists
- `question_only`: Questions without checklists
Remarks: To avoid contamination and overfitting to the benchmark, the HuggingFace version contains 80 questions. If you need access to the remaining 20 questions, please contact us at 📧 deep.research.bench@gmail.com
The default static mode loads questions and checklists with dates already filled in (e.g., 2025 instead of {{current_year}}):
```python
from liveresearchbench.common.io_utils import load_liveresearchbench_dataset

# Load static version
benchmark_data = load_liveresearchbench_dataset(use_realtime=False)
```

Example:
- Question: "What is the size, growth rate, and segmentation of the U.S. electric vehicle market in 2025?"
For dynamic evaluation with current dates, use realtime mode:
```python
# Load realtime version (replaces {{current_year}} etc.)
benchmark_data = load_liveresearchbench_dataset(use_realtime=True)
```

The following placeholders will be replaced based on the current date:
- `{{current_year}}` → 2025 (current year)
- `{{last_year}}` → 2024 (previous year)
- `{{current_date}}` or `{{date}}` → Nov 12, 2025 (formatted date)
Example:
- Question: "What is the size, growth rate, and segmentation of the U.S. electric vehicle market in 2025?" (automatically updated each year)
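The placeholder substitution above can be sketched with a small helper like the following. This is a hypothetical illustration of the replacement rules, not the library's actual `io_utils` implementation:

```python
from datetime import date
from typing import Optional

def fill_placeholders(text: str, today: Optional[date] = None) -> str:
    """Replace date placeholders such as {{current_year}} with current values.

    Sketch only; the real load_liveresearchbench_dataset(use_realtime=True)
    path may implement this differently.
    """
    today = today or date.today()
    replacements = {
        "{{current_year}}": str(today.year),
        "{{last_year}}": str(today.year - 1),
        "{{current_date}}": today.strftime("%b %d, %Y"),
        "{{date}}": today.strftime("%b %d, %Y"),
    }
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text
```

For example, `fill_placeholders("EV market in {{current_year}}?", date(2025, 11, 12))` yields `"EV market in 2025?"`.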
```python
from liveresearchbench.common.io_utils import (
    load_liveresearchbench_dataset,
    get_question_for_qid,
    get_checklists_for_qid,
)

# Load dataset
benchmark_data = load_liveresearchbench_dataset()

# Get question for a specific query ID
qid = "market6VWmPyxptfK47civ"
question = get_question_for_qid(benchmark_data, qid)

# Get checklist items for a specific query ID
checklists = get_checklists_for_qid(benchmark_data, qid)
print(f"Found {len(checklists)} checklist items")
```

For each entry in the dataset:
```python
{
    'qid': 'market6VWmPyxptfK47civ',  # Unique query identifier
    'question': 'What is the size, growth rate...',  # Research question
    'checklists': [  # List of checklist items for coverage evaluation
        'Does the report provide data for the U.S. electric vehicle market...',
        'Does the report discuss the size, growth rate...',
        # ... more items
    ]
}
```

To cache the dataset locally:
```python
from datasets import load_dataset

dataset = load_dataset("Salesforce/LiveResearchBench", "question_with_checklist", split="test")
print(f"Cached {len(dataset)} entries")
```

The dataset will be cached at `~/.cache/huggingface/datasets/`.
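Given the entry schema shown above, a small sanity check on downloaded entries can be sketched as follows. `validate_entry` is a hypothetical helper, not part of the library:

```python
def validate_entry(entry: dict) -> bool:
    """Return True if an entry matches the expected LiveResearchBench schema:
    a string 'qid', a string 'question', and a list of string 'checklists'.
    Hypothetical helper for illustration only.
    """
    return (
        isinstance(entry.get("qid"), str)
        and isinstance(entry.get("question"), str)
        and isinstance(entry.get("checklists"), list)
        and all(isinstance(item, str) for item in entry["checklists"])
    )
```

A check like this catches, for example, entries from the `question_only` subset, which omit the `checklists` field.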
The test script automatically loads the dataset:
```python
# In tests/test_real_grading.py
benchmark_data = load_liveresearchbench_dataset(use_realtime=True)

# Questions are fetched per report
for report in reports:
    query_id = report['query_id']
    question = get_question_for_qid(benchmark_data, query_id)
    checklists = get_checklists_for_qid(benchmark_data, query_id)
    # Use for grading...
```

If you find this dataset helpful, please consider citing:
```bibtex
@article{sfr2025liveresearchbench,
  title={LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild},
  author={Jiayu Wang and Yifei Ming and Riya Dulepet and Qinglin Chen and Austin Xu and Zixuan Ke and Frederic Sala and Aws Albarghouthi and Caiming Xiong and Shafiq Joty},
  year={2025},
  url={https://arxiv.org/abs/2510.14240}
}
```