HotpotQA Dataset

Official Sources:

What you get:

  • 113k Wikipedia-based question-answer pairs
  • Multi-hop reasoning questions
  • Supporting facts at sentence level
  • CC BY-SA 4.0 License
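In the raw JSON release from the official repository, each supporting fact is a `[title, sentence_index]` pair that points into the example's `context`, which is a list of `[title, [sentence, ...]]` paragraphs. The sketch below resolves those pairs back to sentence text. It assumes that raw layout, and the toy example is made up for illustration rather than taken from HotpotQA.

```python
from typing import Any


def supporting_sentences(example: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (title, sentence) pairs for each supporting fact.

    Assumes the raw HotpotQA JSON layout:
      "context":          [[title, [sentence, ...]], ...]
      "supporting_facts": [[title, sentence_index], ...]
    """
    # Map each paragraph title to its list of sentences.
    paragraphs = {title: sentences for title, sentences in example["context"]}
    result = []
    for title, sent_id in example["supporting_facts"]:
        sentences = paragraphs.get(title, [])
        # Defensive: skip facts whose title or index is not in the context.
        if 0 <= sent_id < len(sentences):
            result.append((title, sentences[sent_id]))
    return result


# Toy example (made up, not a real HotpotQA record).
example = {
    "question": "What is the capital of the country where the Eiffel Tower stands?",
    "answer": "Paris",
    "supporting_facts": [["Eiffel Tower", 0], ["France", 1]],
    "context": [
        ["Eiffel Tower", ["The Eiffel Tower is in France.", "It opened in 1889."]],
        ["France", ["France is in Europe.", "Its capital is Paris."]],
    ],
}

for title, sentence in supporting_sentences(example):
    print(f"{title}: {sentence}")
```

The two supporting sentences come from different paragraphs, which is what makes the question multi-hop: neither paragraph alone answers it.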

Wikipedia Dataset

For RAG Research:

Commands to download:

For HotpotQA

```bash
git clone https://github.com/hotpotqa/hotpot.git
cd hotpot
```

Then follow the repository's download script to fetch the data files.
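Once the raw JSON files are on disk, they can be inspected with the standard library alone. This minimal sketch tallies questions by their `type` field. It assumes the file is a JSON array of examples where `type` is `"bridge"` or `"comparison"`; the function name is hypothetical, and the path you pass is whichever file the download produced.

```python
import json
from collections import Counter


def question_type_counts(path: str) -> Counter:
    """Count HotpotQA questions per "type" in a raw JSON file.

    Assumes the file holds a JSON array of example objects; examples
    without a "type" field are counted under "unknown".
    """
    with open(path, encoding="utf-8") as f:
        examples = json.load(f)
    return Counter(ex.get("type", "unknown") for ex in examples)
```

For example, `question_type_counts("path/to/downloaded_file.json")` returns a `Counter` such as `Counter({"bridge": ..., "comparison": ...})`. Note that `json.load` reads the whole file into memory, which is fine for the dev sets but heavier for the training file.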

Or use Python with the Hugging Face datasets library:

```bash
pip install datasets
```

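```python
from datasets import load_dataset

# HotpotQA ("distractor" setting; "fullwiki" is the other configuration)
hotpot = load_dataset("hotpotqa/hotpot_qa", "distractor")

# Wikipedia RAG mini: a configuration must be chosen,
# "text-corpus" (passages) or "question-answer" (evaluation questions)
wiki_rag = load_dataset("rag-datasets/rag-mini-wikipedia", "text-corpus")

# Full Wikipedia (English, 20220301 dump); this is a very large download.
# The legacy "wikipedia" loader is script-based and may not work with recent
# datasets releases; "wikimedia/wikipedia" (e.g. config "20231101.en") is a
# script-free alternative with a newer snapshot.
wikipedia = load_dataset("wikipedia", "20220301.en")
```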
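For RAG experiments, HotpotQA's context paragraphs can double as a small retrieval corpus. The sketch below flattens records into one passage per paragraph and de-duplicates by title. It assumes the Hugging Face `hotpot_qa` layout, where `context` is a dict of parallel lists (`{"title": [...], "sentences": [[...], ...]}`) rather than the raw JSON pairs. The helper names are hypothetical, and de-duplicating by title is a simplification that keeps the first paragraph seen for each title.

```python
from typing import Any, Iterable


def context_to_passages(record: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten one HotpotQA record into passages, one per context paragraph."""
    ctx = record["context"]
    passages = []
    for title, sentences in zip(ctx["title"], ctx["sentences"]):
        # Normalize spacing between sentences, then join into one paragraph.
        text = " ".join(s.strip() for s in sentences)
        passages.append({"title": title, "text": text})
    return passages


def build_corpus(records: Iterable[dict[str, Any]]) -> list[dict[str, str]]:
    """Collect passages from many records, keeping the first one per title."""
    seen: set[str] = set()
    corpus = []
    for record in records:
        for passage in context_to_passages(record):
            if passage["title"] not in seen:
                seen.add(passage["title"])
                corpus.append(passage)
    return corpus


# Toy records in the assumed Hugging Face layout (made up for illustration).
records = [
    {"context": {"title": ["A", "B"], "sentences": [["A1.", " A2."], ["B1."]]}},
    {"context": {"title": ["B", "C"], "sentences": [["B1."], ["C1.", "C2."]]}},
]
print(build_corpus(records))
```

With the dataset loaded as above, something like `build_corpus(hotpot["validation"])` would apply the same flattening to real records, since iterating a `Dataset` yields one dict per example.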