-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: explicitly tell ContextRelevanceEvaluator
that each statement should be scored
#7904
fix: explicitly tell ContextRelevanceEvaluator
that each statement should be scored
#7904
Conversation
Pull Request Test Coverage Report for Build 9600236879Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from a little doubt, it looks like a good improvement.
I would like Julian to take a look at it as well.
questions = ["Who created the Python language?"] | ||
contexts = [ | ||
[ | ||
"Python, created by Guido van Rossum in the late 1980s, is a high-level general-purpose programming language. Its design philosophy emphasizes code readability, and its language constructs aim to help programmers write clear, logical code for both small and large-scale software projects.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many statements can be extracted from this sentence? 2 or 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Java is a high...
context also should have 2 statements + the Scale is a..
gives 5 statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted this, so that it's more deterministic - the LLM can say it's 4 or 5
Pull Request Test Coverage Report for Build 9600451557Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9600928697Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
Pull Request Test Coverage Report for Build 9612276285Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
@julian-risch can you have look as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite good to me already. Just remove the progress_bar=False
from the init in the tests please. Let's test the default init which is more likely to be used by users.
In addition please run the end to end test for evaluation. It contains the ContextRelevanceEvaluator: https://github.com/deepset-ai/haystack/blob/main/e2e/pipelines/test_evaluation_pipeline.py
Once that's done, I'll approve this PR. 🚀
Pull Request Test Coverage Report for Build 9664305794Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
Related Issues
ContextRelevanceEvaluator
: statements extraction can be made more robust #7803Proposed Changes:
How did you test it?
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.