Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into sentence-window-retrieval
Showing 119 changed files with 3,853 additions and 258 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -162,3 +162,4 @@ cython_debug/ | |
# MacOS | ||
.DS_Store | ||
*/.DS_Store | ||
**/.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,12 @@ | ||
# haystack-evaluation | ||
|
||
This repository contains examples on how to use Haystack to build different RAG architectures and evaluate their performance over different datasets. | ||
This repository contains examples on how to use Haystack to evaluate systems build with Haystack for different tasks | ||
and datasets. | ||
|
||
This repository is structured as: | ||
|
||
- [Evaluations](evaluations/README.md) | ||
|
||
- [Techniques/Architectures](evaluations/architectures/README.md) | ||
|
||
- [RAG Techniques/Architectures](evaluations/architectures/README.md) | ||
- [Datasets](datasets/README.md) |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,9 @@ | ||
# Evaluations | ||
|
||
Name | Dataset | Evaluation Metrics | Colab | | ||
--------------------------------------------------------------------------|---------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
[RAG Evaluation](evaluation_aragog.py) | ARAGOG | [ContextRelevance](https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator) , [Faithfulness](https://docs.haystack.deepset.ai/docs/faithfulnessevaluator), [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator) | <a href="https://colab.research.google.com/github/deepset-ai/haystack-evaluation/blob/main/evaluations/evaluation_aragog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | | ||
[RAG Evaluation](evaluation_squad_rag.py) | SQuAD | [ContextRelevance](https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator) , [Faithfulness](https://docs.haystack.deepset.ai/docs/faithfulnessevaluator), [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator) | ToDo | | ||
[Extractive QA Evaluation](evaluation_squad_extractive_qa.py) | SQuAD | [Answer Exact Match](https://docs.haystack.deepset.ai/docs/answerexactmatchevaluator), [DocumentMRR](https://docs.haystack.deepset.ai/docs/documentmrrevaluator), [DocumentMAP](https://docs.haystack.deepset.ai/docs/documentmapevaluator), [DocumentRecall](https://docs.haystack.deepset.ai/docs/documentrecallevaluator), [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator) | ToDo | | ||
Here we provide full examples on how to use Haystack to evaluate systems build also with Haystack for different tasks and datasets. | ||
|
||
Name | Dataset | Evaluation Metrics | Colab | | ||
----------------------------------------------------------------------------------|---------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
[RAG with parameter search](evaluation_aragog.py) | ARAGOG | [ContextRelevance](https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator) , [Faithfulness](https://docs.haystack.deepset.ai/docs/faithfulnessevaluator), [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator) | <a href="https://colab.research.google.com/github/deepset-ai/haystack-evaluation/blob/main/evaluations/evaluation_aragog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | | ||
[Baseline RAG vs HyDE using Harness](evaluation_aragog_harness.py) | ARAGOG | [ContextRelevance](https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator) , [Faithfulness](https://docs.haystack.deepset.ai/docs/faithfulnessevaluator), [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator) | - | | ||
[Extractive QA with parameter search](evaluation_squad_extractive_qa.py) | SQuAD | [Answer Exact Match](https://docs.haystack.deepset.ai/docs/answerexactmatchevaluator), [DocumentMRR](https://docs.haystack.deepset.ai/docs/documentmrrevaluator), [DocumentMAP](https://docs.haystack.deepset.ai/docs/documentmapevaluator), [DocumentRecall](https://docs.haystack.deepset.ai/docs/documentrecallevaluator), [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator) | - | |