Commit 993514a (1 parent: b0dea0b). Showing 174 changed files with 9,279 additions and 8 deletions.
@@ -0,0 +1,47 @@
# Evaluations

| Dataset and Evaluation | Colab |
|-------------------------|-------|
| RAG over ARAGOG dataset | <a href="https://colab.research.google.com/github/deepset-ai/haystack-evaluation/blob/main/evaluations/evaluation_aragog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |

## ARAGOG

This dataset is based on the paper [Advanced Retrieval Augmented Generation Output Grading (ARAGOG)](https://arxiv.org/pdf/2404.01037).
It's a collection of arXiv papers on Transformers and Large Language Models, all in PDF format.

The dataset contains:
- 13 PDF papers
- 107 questions and answers generated with the assistance of GPT-4, then validated and corrected by humans.

It has human annotations for the following metrics:
- [ContextRelevance](https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator)
- [Faithfulness](https://docs.haystack.deepset.ai/docs/faithfulnessevaluator)
- [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator)

Check the [RAG over ARAGOG dataset notebook](aragog_evaluation.ipynb) for an example.

---

## SQuAD dataset

The SQuAD dataset is a collection of questions and answers from Wikipedia articles.
This dataset is typically used for training and evaluating models for extractive question-answering tasks.

The dataset contains:
- 490 Wikipedia articles in text format
- 98k questions whose answers are spans in the articles

It contains human annotations suitable for the following metrics:
- [Answer Exact Match](https://docs.haystack.deepset.ai/docs/answerexactmatchevaluator)
- [DocumentMRR](https://docs.haystack.deepset.ai/docs/documentmrrevaluator)
- [DocumentMAP](https://docs.haystack.deepset.ai/docs/documentmapevaluator)
- [DocumentRecall](https://docs.haystack.deepset.ai/docs/documentrecallevaluator)
- [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator)
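
To make the retrieval and answer metrics listed above more concrete, here is a minimal pure-Python sketch of how exact match, reciprocal rank, average precision, and recall can be computed for a single query. It is an illustration only and not the Haystack evaluators' implementation: the function names and the document IDs are hypothetical, and the average precision here divides by the total number of relevant documents, which is one common convention and may differ from what a given library does.

```python
def exact_match(predicted: str, ground_truth: str) -> float:
    """Return 1.0 when the predicted answer is character-for-character equal to the ground truth."""
    return 1.0 if predicted == ground_truth else 0.0


def reciprocal_rank(retrieved: list, relevant: set) -> float:
    """Return 1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved: list, relevant: set) -> float:
    """Average the precision at each rank where a relevant document appears.

    Assumption: the sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def recall(retrieved: list, relevant: set) -> float:
    """Return the fraction of relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def mean(values: list) -> float:
    """Average per-query scores, e.g. reciprocal ranks into MRR or average precisions into MAP."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical ranked retrieval result and ground-truth relevant documents for one query.
retrieved_ids = ["d3", "d1", "d7"]
relevant_ids = {"d1", "d7"}

print(reciprocal_rank(retrieved_ids, relevant_ids))    # first relevant hit is at rank 2
print(average_precision(retrieved_ids, relevant_ids))  # (1/2 + 2/3) / 2
print(recall(retrieved_ids, relevant_ids))             # both relevant documents were retrieved
```

Averaging the per-query values with `mean` gives the dataset-level MRR, MAP, and recall figures.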

Check the [RAG over SQuAD notebook](squad_rag_evaluation.ipynb) for a RAG example and the [Extractive QA over SQuAD notebook](squad_extractive_qa_evaluation.ipynb) for an extractive QA example.
File renamed without changes.
Added result files (large diffs are not rendered):
- `evaluations/results/aragog_results/detailed_all-MiniLM-L6-v2__top_k:1__chunk_size:128.csv`: 113 additions, 0 deletions
- `evaluations/results/aragog_results/detailed_all-MiniLM-L6-v2__top_k:1__chunk_size:256.csv`: 118 additions, 0 deletions
- `evaluations/results/aragog_results/detailed_all-MiniLM-L6-v2__top_k:1__chunk_size:64.csv`: 108 additions, 0 deletions
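
The result file names above appear to encode the run parameters, separated by double underscores. The sketch below shows one way to recover them from a path. It assumes that the segment after `detailed_` is the embedding model name and that every remaining segment is a `key:integer` pair; the function name `parse_result_name` and the dictionary keys `kind` and `embedding_model` are hypothetical and not part of the repository.

```python
from pathlib import PurePosixPath


def parse_result_name(path: str) -> dict:
    """Split a results file name into its run parameters.

    Assumed layout (inferred from the file names only):
    <kind>_<embedding model>__<key>:<int>__<key>:<int>.csv
    """
    stem = PurePosixPath(path).stem           # drop the directories and the .csv suffix
    head, *params = stem.split("__")          # the first segment carries kind and model
    kind, model = head.split("_", 1)          # e.g. "detailed", "all-MiniLM-L6-v2"
    parsed = {"kind": kind, "embedding_model": model}
    for param in params:
        key, value = param.split(":", 1)      # e.g. "top_k", "1"
        parsed[key] = int(value)
    return parsed


example = "evaluations/results/aragog_results/detailed_all-MiniLM-L6-v2__top_k:1__chunk_size:128.csv"
print(parse_result_name(example))
```

Parsing the names this way makes it straightforward to group the CSV files by `top_k` or `chunk_size` when comparing runs.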