Related to #250.
We introduce `DeepEvalEvaluator`, a component that uses the DeepEval LLM evaluation framework to calculate evaluation metrics for RAG pipelines (among others). Refer to deepset-ai/haystack#6784 for an overview of the API design.

This PR introduces the following user-facing classes:

- `DeepEvalMetric` - An enumeration that lists the supported DeepEval metrics. Currently, only metrics related to RAG pipelines are supported.
- `DeepEvalEvaluator` - The pipeline component that interfaces with the evaluation framework. It accepts a single metric and its optional parameters. The inputs to the pipeline are dynamically configured depending on the metric. This is done with the help of a metric descriptor table that contains metadata concerning input/output conversion formats, expected inputs/outputs, etc.

The output of the component is a nested list of metric results. Each input can have one or more results, depending on the metric. Each result is a dictionary containing the following keys and values:

- `name` - The name of the metric.
- `score` - The score of the metric.
- `explanation` - An optional explanation of the score.
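
For reviewers who want a mental model of the two ideas above, here is a minimal, self-contained sketch. It does not import or reproduce the code in this PR: `ExampleMetric`, `MetricDescriptor`, `DESCRIPTORS`, `validate_inputs`, `mean_score_per_input`, and the input names are all hypothetical, chosen only to illustrate how a descriptor table could drive per-metric input checking and how the nested result list might be consumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class ExampleMetric(Enum):
    # Hypothetical metric names, for illustration only.
    ANSWER_RELEVANCY = "answer_relevancy"
    FAITHFULNESS = "faithfulness"


@dataclass(frozen=True)
class MetricDescriptor:
    # Assumed metadata: the input names a metric expects and their types.
    input_parameters: Dict[str, type]


# Hypothetical descriptor table: one entry per supported metric.
DESCRIPTORS: Dict[ExampleMetric, MetricDescriptor] = {
    ExampleMetric.ANSWER_RELEVANCY: MetricDescriptor(
        {"questions": list, "responses": list}
    ),
    ExampleMetric.FAITHFULNESS: MetricDescriptor(
        {"questions": list, "contexts": list, "responses": list}
    ),
}


def validate_inputs(metric: ExampleMetric, **inputs: Any) -> None:
    """Check that the inputs match what the metric's descriptor expects."""
    expected = DESCRIPTORS[metric].input_parameters
    missing = sorted(set(expected) - set(inputs))
    unexpected = sorted(set(inputs) - set(expected))
    if missing or unexpected:
        raise ValueError(
            f"{metric.value}: missing inputs {missing}, unexpected inputs {unexpected}"
        )
    for name, value in inputs.items():
        if not isinstance(value, expected[name]):
            raise ValueError(f"{metric.value}: input '{name}' has the wrong type")
    # All inputs must describe the same number of samples.
    if len({len(value) for value in inputs.values()}) > 1:
        raise ValueError(f"{metric.value}: inputs have mismatched lengths")


def mean_score_per_input(results: List[List[Dict[str, Any]]]) -> List[float]:
    """Average the scores of each input's results (one or more per input)."""
    return [
        sum(result["score"] for result in per_input) / len(per_input)
        for per_input in results
    ]


# Nested result list in the shape described above: outer list = inputs,
# inner list = results for that input, each with name/score/explanation.
example_results = [
    [
        {"name": "faithfulness", "score": 0.25, "explanation": None},
        {"name": "faithfulness", "score": 0.75, "explanation": "Partially grounded."},
    ],
    [
        {"name": "faithfulness", "score": 1.0, "explanation": None},
    ],
]

validate_inputs(
    ExampleMetric.ANSWER_RELEVANCY,
    questions=["What is Haystack?"],
    responses=["An LLM framework."],
)
print(mean_score_per_input(example_results))
```

Keeping the expected inputs in a table rather than in per-metric branches means that supporting another metric would, under these assumptions, only require a new table entry.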