feat: Implement UpTrainEvaluator #272

Merged
merged 5 commits into from
Jan 26, 2024



@shadeMe shadeMe commented Jan 25, 2024

Related to #248.

We introduce UpTrainEvaluator, a component that uses the UpTrain LLM evaluation framework to calculate evaluation metrics for RAG pipelines (among others). Refer to deepset-ai/haystack#6784 for an overview of the API design.

This PR introduces the following user-facing classes:

  • UpTrainMetric - An enumeration that lists the supported UpTrain metrics.
  • UpTrainEvaluator - The pipeline component that interfaces with the evaluation framework. It accepts a single metric and its optional parameters, plus extra optional parameters to configure the API client. The inputs to the component are configured dynamically depending on the metric. This is done with the help of a metric descriptor table that contains metadata about input/output conversion formats, expected inputs/outputs, etc.
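
The descriptor-table idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchMetric`, `MetricDescriptor`, `DESCRIPTORS`, `prepare_inputs`, and the input names are all hypothetical, chosen only to show how a per-metric table can drive input validation and conversion.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List


class SketchMetric(Enum):
    # Hypothetical metric names, for illustration only.
    CONTEXT_RELEVANCE = "context_relevance"
    RESPONSE_MATCHING = "response_matching"


def _zip_rows(**columns: List[Any]) -> List[Dict[str, Any]]:
    # Convert column-wise inputs into one dict per example.
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


@dataclass(frozen=True)
class MetricDescriptor:
    # Names and types of the inputs this metric expects.
    input_parameters: Dict[str, type]
    # Converts the validated inputs into the format the backend consumes.
    input_converter: Callable[..., List[Dict[str, Any]]]


# The table: one descriptor per metric (assumed input names).
DESCRIPTORS: Dict[SketchMetric, MetricDescriptor] = {
    SketchMetric.CONTEXT_RELEVANCE: MetricDescriptor(
        input_parameters={"questions": list, "contexts": list},
        input_converter=_zip_rows,
    ),
    SketchMetric.RESPONSE_MATCHING: MetricDescriptor(
        input_parameters={"responses": list, "ground_truths": list},
        input_converter=_zip_rows,
    ),
}


def prepare_inputs(metric: SketchMetric, **inputs: Any) -> List[Dict[str, Any]]:
    # Look up the descriptor and check that exactly the expected inputs were given.
    descriptor = DESCRIPTORS[metric]
    expected = set(descriptor.input_parameters)
    if set(inputs) != expected:
        raise ValueError(f"Expected inputs {sorted(expected)}, got {sorted(inputs)}")
    for name, value in inputs.items():
        if not isinstance(value, descriptor.input_parameters[name]):
            raise ValueError(f"Input '{name}' has an unexpected type")
    return descriptor.input_converter(**inputs)
```

With a table like this, supporting another metric amounts to adding one more entry rather than branching inside the component.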

The output of the component is a nested list of metric results. Each input can have one or more results, depending on the metric. Each result is a dictionary containing the following keys and values:

  • name - The name of the metric.
  • score - The score of the metric.
  • explanation - An optional explanation of the score.
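
As a rough illustration of that shape, the sketch below builds a made-up nested result list and averages the scores per metric name. The metric name, scores, and explanations are invented, `mean_score_per_metric` is a hypothetical helper, and representing a missing explanation as `None` is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, List

MetricResult = Dict[str, Any]

# Outer list: one entry per input. Inner list: one or more results for that input.
# All values here are invented for illustration.
example_output: List[List[MetricResult]] = [
    [{"name": "example_metric", "score": 0.5, "explanation": "Partially relevant."}],
    [{"name": "example_metric", "score": 1.0, "explanation": None}],
]


def mean_score_per_metric(results: List[List[MetricResult]]) -> Dict[str, float]:
    # Collect every score under its metric name, then average each group.
    scores: Dict[str, List[float]] = defaultdict(list)
    for per_input in results:
        for result in per_input:
            scores[result["name"]].append(result["score"])
    return {name: sum(values) / len(values) for name, values in scores.items()}
```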

@shadeMe shadeMe added the "new integration" (Discuss the creation of a new integration in Core) and "integration: uptrain" labels Jan 25, 2024
@shadeMe shadeMe linked an issue Jan 25, 2024 that may be closed by this pull request
10 tasks
@shadeMe shadeMe marked this pull request as ready for review January 25, 2024 16:25
@shadeMe shadeMe requested a review from a team as a code owner January 25, 2024 16:25
@shadeMe shadeMe requested review from vblagoje and julian-risch and removed request for a team and vblagoje January 25, 2024 16:25

@julian-risch julian-risch left a comment

This PR is in really great shape! I have left comments on the points we briefly discussed.
Could you please also add an example of how the new component can be used in a pipeline? Here is an example of another integration's example: https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/chroma/example/example.py


@julian-risch julian-risch left a comment

You can add an entry about UpTrain also to the inventory in the readme as part of this PR. https://github.com/deepset-ai/haystack-core-integrations/tree/main?tab=readme-ov-file#inventory

Update project structure to use the `haystack_integrations` namespace

@julian-risch julian-risch left a comment

LGTM! 👍 The example is really helpful. The topic of list processing we can postpone and for the output format, let's see whether there are any use cases that would benefit from having separate edges instead of one dict. We talked about the pipeline visualization that unfortunately hides the contents of the dict from the user in its current implementation. Let's get this merged fast and collect feedback from users! Thanks for the fruitful discussions and great job! 🙂

@julian-risch

And let's see whether somebody from UpTrain can help with the integration test for the response matching metric, which fails with a 500 Internal Server Error.

@shadeMe shadeMe merged commit 4ddcd5e into deepset-ai:main Jan 26, 2024
9 checks passed
@shadeMe shadeMe deleted the feat/uptrain-evaluator branch January 26, 2024 14:15
Labels
new integration Discuss the creation of a new integration in Core topic:CI
3 participants