fix: explicitly tell `ContextRelevanceEvaluator` that each statement should be scored #7904

davidsbatista · 2024-06-20T15:28:49Z

Related Issues

Fixes ContextRelevanceEvaluator: statements extraction can be made more robust #7803

Proposed Changes:

Added a few more examples containing multiple non-relevant statements
Added one more line to the prompt stating that each statement should be scored
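The prompt line targets LLM outputs in which some extracted statements come back without a score. Below is a minimal sketch of a consistency check on one parsed result. It assumes the output is a dict with parallel `statements` and `statement_scores` lists holding one 0/1 score per statement; those key names are an assumption based on the evaluator's examples, and the helper `check_all_statements_scored` is hypothetical, not part of Haystack.

```python
from typing import Any, Dict, List


def check_all_statements_scored(result: Dict[str, Any]) -> List[str]:
    """Return the problems found in one parsed LLM result (empty list if none).

    Hypothetical helper: assumes parallel `statements` and `statement_scores`
    lists, with exactly one 0/1 score per extracted statement.
    """
    problems: List[str] = []
    statements = result.get("statements", [])
    scores = result.get("statement_scores", [])
    # Every extracted statement should have received a score.
    if len(statements) != len(scores):
        problems.append(f"got {len(statements)} statements but {len(scores)} statement_scores")
    # Each score is expected to be binary.
    for i, score in enumerate(scores):
        if score not in (0, 1):
            problems.append(f"score at index {i} is {score!r}, expected 0 or 1")
    return problems


# An output where the LLM scored only the first of three statements.
partial = {
    "statements": [
        "Python was created by Guido van Rossum.",
        "Python was created in the late 1980s.",
        "Python is a high-level general-purpose programming language.",
    ],
    "statement_scores": [1],
}
print(check_all_statements_scored(partial))  # ['got 3 statements but 1 statement_scores']
```

A check like this only detects the failure mode; the change in this PR instead tries to prevent it at the prompt level.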

How did you test it?

Ran the unit tests and the live tests; also added one more live test

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issues

coveralls · 2024-06-20T15:37:51Z

Pull Request Test Coverage Report for Build 9600236879

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 90.171%

Totals
Change from base Build 9596818530:	0.0%
Covered Lines:	7018
Relevant Lines:	7783

💛 - Coveralls

anakin87

Apart from a little doubt, it looks like a good improvement.

I would like Julian to take a look at it as well.

anakin87 · 2024-06-20T15:36:52Z

test/components/evaluators/test_context_relevance_evaluator.py

+ questions = ["Who created the Python language?"]
+ contexts = [
+ [
+ "Python, created by Guido van Rossum in the late 1980s, is a high-level general-purpose programming language. Its design philosophy emphasizes code readability, and its language constructs aim to help programmers write clear, logical code for both small and large-scale software projects.",


How many statements can be extracted from this sentence? 2 or 3?

The "Java is a high..." context should also have 2 statements, and adding the "Scala is a..." context gives 5 statements.

I've reverted this so that the test is more deterministic; otherwise the LLM can say it's 4 or 5.
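Why the statement count matters for a test assertion: if the per-context score is the mean of the 0/1 statement scores (an assumption here about how the evaluator aggregates), the same single relevant statement produces a different score depending on whether the LLM extracts 4 or 5 statements. A minimal sketch; `context_score` is a hypothetical helper.

```python
from typing import List


def context_score(statement_scores: List[int]) -> float:
    """Average 0/1 statement scores into one context score (0.0 if empty).

    Assumed mean aggregation, for illustration only.
    """
    if not statement_scores:
        return 0.0
    return sum(statement_scores) / len(statement_scores)


# One relevant statement, with the LLM splitting the text into 4 vs. 5 statements.
four = [1, 0, 0, 0]
five = [1, 0, 0, 0, 0]
print(context_score(four))  # 0.25
print(context_score(five))  # 0.2
```

Under this assumption, a test asserting an exact score is only stable if the number of extracted statements is stable.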

coveralls · 2024-06-20T15:51:56Z

Pull Request Test Coverage Report for Build 9600451557

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 90.171%

Totals
Change from base Build 9596818530:	0.0%
Covered Lines:	7018
Relevant Lines:	7783

coveralls · 2024-06-20T16:29:44Z

Pull Request Test Coverage Report for Build 9600928697

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 90.171%

Totals
Change from base Build 9596818530:	0.0%
Covered Lines:	7018
Relevant Lines:	7783

coveralls · 2024-06-21T10:42:11Z

Pull Request Test Coverage Report for Build 9612276285

Warning: This coverage report may be inaccurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
73 unchanged lines in 4 files lost coverage.
Overall coverage decreased (-0.2%) to 89.977%

Files with Coverage Reduction	New Missed Lines	%
document_stores/in_memory/document_store.py	1	97.87%
core/component/component.py	2	98.28%
utils/hf.py	20	83.89%
components/validators/json_schema.py	50	0.0%

Totals
Change from base Build 9596818530:	-0.2%
Covered Lines:	6715
Relevant Lines:	7463

davidsbatista · 2024-06-25T08:20:00Z

@julian-risch can you have a look as well?

julian-risch

Looks quite good to me already. Just remove the `progress_bar=False` from the init in the tests, please. Let's test the default init, which is more likely to be used by users.
In addition, please run the end-to-end evaluation test. It contains the ContextRelevanceEvaluator: https://github.com/deepset-ai/haystack/blob/main/e2e/pipelines/test_evaluation_pipeline.py
Once that's done, I'll approve this PR. 🚀

coveralls · 2024-06-25T14:32:56Z

Pull Request Test Coverage Report for Build 9664305794

Warning: This coverage report may be inaccurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
97 unchanged lines in 8 files lost coverage.
Overall coverage decreased (-0.2%) to 89.968%

Files with Coverage Reduction	New Missed Lines	%
components/evaluators/document_map.py	1	96.15%
components/evaluators/document_mrr.py	1	95.45%
core/component/component.py	2	98.28%
components/converters/pypdf.py	6	90.0%
components/evaluators/sas_evaluator.py	8	63.33%
document_stores/in_memory/document_store.py	9	97.02%
utils/hf.py	20	83.89%
components/validators/json_schema.py	50	0.0%

Totals
Change from base Build 9596818530:	-0.2%
Covered Lines:	6717
Relevant Lines:	7466

julian-risch

LGTM! 👍

initial import

c9d2608

davidsbatista requested a review from a team as a code owner June 20, 2024 15:28

davidsbatista requested review from vblagoje and julian-risch and removed request for a team June 20, 2024 15:28

github-actions bot added the topic:tests label Jun 20, 2024

davidsbatista requested review from anakin87 and removed request for vblagoje June 20, 2024 15:29

adding release notes

05a8e7a

davidsbatista requested a review from a team as a code owner June 20, 2024 15:30

davidsbatista requested review from dfokina and removed request for a team June 20, 2024 15:30

anakin87 reviewed Jun 20, 2024

adding pytest decorator for live test

6135a13

make examples more readable

2568e23

updating tests

49c551e

github-actions bot added the type:documentation Improvements on the docs label Jun 21, 2024

davidsbatista added the topic:eval label Jun 25, 2024

reverting progress_bar = False

e59bbb8

julian-risch requested changes Jun 25, 2024

julian-risch self-requested a review June 25, 2024 14:33

julian-risch approved these changes Jun 25, 2024

davidsbatista merged commit 8b9eddc into main Jun 25, 2024
17 checks passed

davidsbatista deleted the CntextRelevancePrompt-evaluate-all-statements branch June 25, 2024 14:59
