Datasets

1. ARAGOG

This dataset is based on the paper Advanced Retrieval Augmented Generation Output Grading (ARAGOG). It is a collection of arXiv papers on Transformers and Large Language Models, all in PDF format.

The dataset contains:

13 PDF papers.
107 questions and answers generated with the assistance of GPT-4, and validated/corrected by humans.

The following metrics can be used:

2. SQuAD 1.1

The SQuAD 1.1 dataset is a collection of questions and answers drawn from Wikipedia articles. It is typically used for training and evaluating models on extractive question-answering tasks. You can find more about this dataset in the paper SQuAD: 100,000+ Questions for Machine Comprehension of Text and on the official website: https://rajpurkar.github.io/SQuAD-explorer/
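
For orientation, the published SQuAD 1.1 JSON files nest questions under paragraphs, and paragraphs under articles. The sketch below flattens that layout into one record per question. It is only an illustration: `QAExample` and `iter_squad_examples` are hypothetical names, and the sample is hand-written rather than taken from the dataset.

```python
from typing import Iterator, NamedTuple


class QAExample(NamedTuple):
    """One flattened question with its context and reference answers."""
    question_id: str
    question: str
    context: str
    answers: list[str]


def iter_squad_examples(squad: dict) -> Iterator[QAExample]:
    """Yield one example per question from a SQuAD 1.1-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield QAExample(
                    question_id=qa["id"],
                    question=qa["question"],
                    context=context,
                    # Each answer also carries "answer_start", the character
                    # offset of the answer span inside the context.
                    answers=[a["text"] for a in qa["answers"]],
                )


# Hand-written sample in the SQuAD 1.1 layout (not real dataset content).
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Sky",
            "paragraphs": [
                {
                    "context": "The sky is blue.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What color is the sky?",
                            "answers": [{"text": "blue", "answer_start": 11}],
                        }
                    ],
                }
            ],
        }
    ],
}

examples = list(iter_squad_examples(sample))
```

With a real file, the same function could be applied to the result of `json.load` on the downloaded train or dev JSON.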

The dataset contains:

It contains human annotations suitable for the following metrics: