fix: explicitly tell ContextRelevanceEvaluator that each statement should be scored #7904

Merged
davidsbatista merged 6 commits into main from CntextRelevancePrompt-evaluate-all-statements on Jun 25, 2024

davidsbatista
Contributor

Related Issues

Proposed Changes:

  • added a few more examples containing multiple non-relevant statements
  • added one more line to the prompt mentioning that each statement should be scored
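To illustrate what "each statement should be scored" implies, here is a minimal sketch. It assumes the LLM returns two parallel lists, `statements` and `statement_scores`, and that the overall score is the mean of the per-statement scores. The helper name `context_relevance_score` and the example output are hypothetical and are not taken from this PR.

```python
def context_relevance_score(statements, statement_scores):
    """Average the per-statement scores; every statement must have a score."""
    if len(statements) != len(statement_scores):
        raise ValueError(
            f"Expected one score per statement, got {len(statements)} statements "
            f"and {len(statement_scores)} scores."
        )
    if not statement_scores:
        return 0.0
    return sum(statement_scores) / len(statement_scores)


# Hypothetical LLM output for the question "Who created the Python language?":
# one relevant statement followed by two non-relevant ones.
llm_output = {
    "statements": [
        "Python was created by Guido van Rossum in the late 1980s.",
        "Python's design philosophy emphasizes code readability.",
        "Python's language constructs aim to help programmers write clear code.",
    ],
    "statement_scores": [1, 0, 0],
}

score = context_relevance_score(
    llm_output["statements"], llm_output["statement_scores"]
)
```

If the LLM skipped a non-relevant statement instead of scoring it 0, the lists would differ in length and the helper would raise instead of silently inflating the score.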

How did you test it?

  • ran the unit tests and the live test, and added one more live test

Checklist

@davidsbatista davidsbatista requested a review from a team as a code owner June 20, 2024 15:28
@davidsbatista davidsbatista requested review from vblagoje and julian-risch and removed request for a team June 20, 2024 15:28
@davidsbatista davidsbatista requested review from anakin87 and removed request for vblagoje June 20, 2024 15:29
@davidsbatista davidsbatista requested a review from a team as a code owner June 20, 2024 15:30
@davidsbatista davidsbatista requested review from dfokina and removed request for a team June 20, 2024 15:30
@coveralls
Collaborator

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9600236879

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 90.171%

Totals Coverage Status
Change from base Build 9596818530: 0.0%
Covered Lines: 7018
Relevant Lines: 7783

💛 - Coveralls

Member

@anakin87 anakin87 left a comment

Apart from a little doubt, it looks like a good improvement.

I would like Julian to take a look at it as well.

questions = ["Who created the Python language?"]
contexts = [
    [
        "Python, created by Guido van Rossum in the late 1980s, is a high-level general-purpose programming language. Its design philosophy emphasizes code readability, and its language constructs aim to help programmers write clear, logical code for both small and large-scale software projects.",
Member

How many statements can be extracted from this sentence? 2 or 3?

Contributor Author

2

Contributor Author

The "Java is a high..." context should also have 2 statements; adding the "Scale is a.." context gives 5 statements.

Contributor Author

I've reverted this, so that it's more deterministic - the LLM can say it's 4 or 5
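An alternative way to keep a live test stable when the LLM may return 4 or 5 statements for the same contexts is to assert on invariants of the result rather than on an exact statement count. This is a minimal sketch and not what this PR does. It assumes a result dictionary with the keys `statements`, `statement_scores`, and `score`; the helper name `is_consistent_result` is hypothetical.

```python
def is_consistent_result(result, tolerance=1e-9):
    """Check invariants that hold regardless of how many statements were extracted."""
    statements = result["statements"]
    scores = result["statement_scores"]

    # Every extracted statement must have exactly one score.
    if len(statements) != len(scores):
        return False
    # Scores are assumed to be binary: 1 for relevant, 0 for not relevant.
    if any(s not in (0, 1) for s in scores):
        return False
    # The overall score is assumed to be the mean of the statement scores.
    expected = sum(scores) / len(scores) if scores else 0.0
    return abs(result["score"] - expected) <= tolerance


# Two hypothetical runs that split the same contexts differently.
run_with_4 = {"statements": ["a", "b", "c", "d"], "statement_scores": [1, 0, 0, 0], "score": 0.25}
run_with_5 = {"statements": ["a", "b", "c", "d", "e"], "statement_scores": [1, 0, 0, 0, 0], "score": 0.2}
# A run in which one statement was left unscored.
run_unscored = {"statements": ["a", "b", "c"], "statement_scores": [1, 0], "score": 0.5}
```

Both the 4-statement and the 5-statement run pass the check, while the run with an unscored statement fails it.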

@coveralls
Collaborator

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9600451557

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 90.171%

Totals Coverage Status
Change from base Build 9596818530: 0.0%
Covered Lines: 7018
Relevant Lines: 7783

💛 - Coveralls

@coveralls
Collaborator

coveralls commented Jun 20, 2024

Pull Request Test Coverage Report for Build 9600928697

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 90.171%

Totals Coverage Status
Change from base Build 9596818530: 0.0%
Covered Lines: 7018
Relevant Lines: 7783

💛 - Coveralls

@github-actions github-actions bot added the type:documentation Improvements on the docs label Jun 21, 2024
@coveralls
Collaborator

coveralls commented Jun 21, 2024

Pull Request Test Coverage Report for Build 9612276285

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 73 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.2%) to 89.977%

Files with Coverage Reduction                  New Missed Lines    %
document_stores/in_memory/document_store.py    1                   97.87%
core/component/component.py                    2                   98.28%
utils/hf.py                                    20                  83.89%
components/validators/json_schema.py           50                  0.0%
Totals Coverage Status
Change from base Build 9596818530: -0.2%
Covered Lines: 6715
Relevant Lines: 7463

💛 - Coveralls

@davidsbatista
Contributor Author

@julian-risch can you have a look as well?

Member

@julian-risch julian-risch left a comment

Looks quite good to me already. Just remove the progress_bar=False from the init in the tests please. Let's test the default init which is more likely to be used by users.
In addition please run the end to end test for evaluation. It contains the ContextRelevanceEvaluator: https://github.com/deepset-ai/haystack/blob/main/e2e/pipelines/test_evaluation_pipeline.py
Once that's done, I'll approve this PR. 🚀

@coveralls
Collaborator

coveralls commented Jun 25, 2024

Pull Request Test Coverage Report for Build 9664305794

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 97 unchanged lines in 8 files lost coverage.
  • Overall coverage decreased (-0.2%) to 89.968%

Files with Coverage Reduction                  New Missed Lines    %
components/evaluators/document_map.py          1                   96.15%
components/evaluators/document_mrr.py          1                   95.45%
core/component/component.py                    2                   98.28%
components/converters/pypdf.py                 6                   90.0%
components/evaluators/sas_evaluator.py         8                   63.33%
document_stores/in_memory/document_store.py    9                   97.02%
utils/hf.py                                    20                  83.89%
components/validators/json_schema.py           50                  0.0%
Totals Coverage Status
Change from base Build 9596818530: -0.2%
Covered Lines: 6717
Relevant Lines: 7466

💛 - Coveralls

@julian-risch julian-risch self-requested a review June 25, 2024 14:33
Member

@julian-risch julian-risch left a comment

LGTM! 👍

@davidsbatista davidsbatista merged commit 8b9eddc into main Jun 25, 2024
17 checks passed
@davidsbatista davidsbatista deleted the CntextRelevancePrompt-evaluate-all-statements branch June 25, 2024 14:59
Successfully merging this pull request may close these issues.

ContextRelevanceEvaluator: statements extraction can be made more robust
4 participants