haystack-evaluation

This repository contains examples of how to use Haystack to build different RAG architectures and evaluate their performance over different datasets.
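As a rough sketch of the kind of pipeline these examples build and evaluate, here is a minimal BM25-based RAG pipeline assuming Haystack 2.x; the document, prompt template, and question are placeholders, and OpenAIGenerator expects an OPENAI_API_KEY environment variable.

```python
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a toy document; the notebooks index the dataset PDFs/articles instead.
document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="Transformers rely on self-attention.")])

template = """Answer the question using only the context below.
Context:
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
rag.add_component("prompt_builder", PromptBuilder(template=template))
rag.add_component("generator", OpenAIGenerator())  # reads OPENAI_API_KEY
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "generator.prompt")

question = "What do Transformers rely on?"
result = rag.run({"retriever": {"query": question},
                  "prompt_builder": {"question": question}})
print(result["generator"]["replies"][0])
```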


Evaluations

ARAGOG

This dataset is based on the paper Advanced Retrieval Augmented Generation Output Grading (ARAGOG). It's a collection of arXiv papers on Transformers and Large Language Models, all in PDF format.

The dataset contains:

  • 13 PDF papers
  • 107 questions and answers generated with the assistance of GPT-4 and validated/corrected by humans

The human-validated answers make the dataset suitable for metrics such as semantic answer similarity, context relevance, and faithfulness.
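As a sketch of how these annotations can be scored, the snippet below runs two Haystack 2.x evaluator components: SASEvaluator compares predicted answers to the human-validated ground truth with an embedding model, and FaithfulnessEvaluator uses an LLM (OpenAI by default, so OPENAI_API_KEY must be set) to check that answers are grounded in the retrieved contexts. The questions, contexts, and answers are illustrative stand-ins for the ARAGOG data.

```python
from haystack.components.evaluators import FaithfulnessEvaluator, SASEvaluator

# Stand-in data; in the notebooks these come from the ARAGOG questions,
# the retrieved contexts, and the generated answers.
questions = ["What mechanism do Transformers rely on?"]
contexts = [["Transformers rely on self-attention to weigh input tokens."]]
predicted_answers = ["They rely on self-attention."]
ground_truth_answers = ["Transformers rely on the self-attention mechanism."]

# Semantic Answer Similarity: embedding-based comparison with ground truth.
sas = SASEvaluator()
sas.warm_up()  # loads the sentence-transformers model
sas_result = sas.run(ground_truth_answers=ground_truth_answers,
                     predicted_answers=predicted_answers)
print(sas_result["score"])

# Faithfulness: LLM-judged groundedness of the answer in the contexts.
faithfulness = FaithfulnessEvaluator()
faith_result = faithfulness.run(questions=questions, contexts=contexts,
                                predicted_answers=predicted_answers)
print(faith_result["score"])
```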

Check the RAG over ARAGOG dataset notebook for an example.


SQuAD dataset

The SQuAD dataset (Stanford Question Answering Dataset) is a collection of questions and answers based on Wikipedia articles, typically used for training and evaluating extractive question-answering models.

The dataset contains:

  • 490 Wikipedia articles in text format
  • 98k questions whose answers are spans in the articles

It contains human annotations suitable for metrics such as answer exact match and semantic answer similarity.

Check the RAG over SQuAD notebook and the Extractive QA over SQuAD notebook for examples.
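Since SQuAD answers are exact spans from the articles, exact match is a natural metric; below is a minimal sketch using Haystack 2.x's AnswerExactMatchEvaluator, with illustrative strings standing in for real predictions.

```python
from haystack.components.evaluators import AnswerExactMatchEvaluator

evaluator = AnswerExactMatchEvaluator()
result = evaluator.run(
    ground_truth_answers=["Denver Broncos", "Santa Clara"],
    predicted_answers=["Denver Broncos", "Levi's Stadium"],
)
print(result["individual_scores"])  # [1, 0]: per-question exact-match flags
print(result["score"])              # 0.5: fraction of exact matches
```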
