This repository contains examples of how to use Haystack to build different RAG architectures and evaluate their performance on different datasets.
The ARAGOG dataset is based on the paper Advanced Retrieval Augmented Generation Output Grading (ARAGOG). It is a collection of arXiv papers, all in PDF format, covering Transformers and Large Language Models.
The dataset contains:
- 13 PDF papers
- 107 questions and answers generated with the assistance of GPT-4, and validated/corrected by humans.
It has human annotations for the following metrics:
Check the RAG over ARAGOG dataset notebook for an example.
The SQuAD dataset is a collection of questions and answers from Wikipedia articles. This dataset is typically used for training and evaluating models for extractive question-answering tasks.
The dataset contains:
- 490 Wikipedia articles in text format
- 98k questions whose answers are spans in the articles
It contains human annotations suitable for the following metrics:
Check the RAG over SQuAD notebook for an example.
Check the Extractive QA over SQuAD notebook for an example.
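
To make "answers are spans in the articles" concrete, here is a minimal sketch of a SQuAD-style record and a check that each answer can be recovered from the context by character offset. The record below is made up for illustration and is not taken from the dataset. The field layout (`context`, `question`, `answers` with parallel `text` and `answer_start` lists) is assumed to follow the commonly distributed SQuAD format, and `extract_spans` and `is_extractive` are hypothetical helper names, not part of Haystack.

```python
# Illustrative record only: the text is invented, and the field layout is an
# assumption based on the commonly distributed SQuAD format.
record = {
    "context": "The Eiffel Tower is located in Paris.",
    "question": "Where is the Eiffel Tower located?",
    "answers": {"text": ["Paris"], "answer_start": [31]},
}


def extract_spans(rec):
    """Slice each answer out of the context using its character offset."""
    context = rec["context"]
    texts = rec["answers"]["text"]
    starts = rec["answers"]["answer_start"]
    return [context[start:start + len(text)] for text, start in zip(texts, starts)]


def is_extractive(rec):
    """Return True if every annotated answer matches its span in the context."""
    return extract_spans(rec) == rec["answers"]["text"]


if __name__ == "__main__":
    print(extract_spans(record))
    print(is_extractive(record))
```

A check like this can be useful before evaluation: an extractive QA model predicts a span of the context, so records whose offsets do not line up with the answer text would distort span-based scores.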