Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Dont filter negative scores when using BM25Okapi and scale_score=False #6889

Merged
merged 5 commits into from
Feb 6, 2024

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Feb 1, 2024

Related Issues

Proposed Changes:

Add an exception in the InMemoryDocumentStore.bm25_retrieval() to not filter out negative values if the Okapi algorithm is used with scale_score=False, as such values are meaningful.

This PR also changes the default algorithm from BM25Okapi to BM25L to prevent the exception form being the default state of this Retriever. A few tests were adjusted as a consequence.

How did you test it?

Local tests

Notes for the reviewer

Checklist

@github-actions github-actions bot added topic:tests 2.x Related to Haystack v2.0 labels Feb 1, 2024
@ZanSara ZanSara marked this pull request as ready for review February 1, 2024 17:03
@ZanSara ZanSara requested a review from a team as a code owner February 1, 2024 17:03
@ZanSara ZanSara requested review from silvanocerza and removed request for a team February 1, 2024 17:03
@ZanSara ZanSara added the ignore-for-release-notes PRs with this flag won't be included in the release notes. label Feb 1, 2024
@ZanSara ZanSara marked this pull request as draft February 1, 2024 17:13
@coveralls
Copy link
Collaborator

coveralls commented Feb 2, 2024

Pull Request Test Coverage Report for Build 7797498811

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 252 unchanged lines in 30 files lost coverage.
  • Overall coverage decreased (-0.7%) to 88.291%

Files with Coverage Reduction New Missed Lines %
utils/auth.py 1 93.88%
components/websearch/searchapi.py 2 96.36%
components/builders/dynamic_chat_prompt_builder.py 3 93.02%
components/embedders/hugging_face_tei_document_embedder.py 3 95.24%
components/embedders/hugging_face_tei_text_embedder.py 3 92.86%
components/generators/hugging_face_local.py 3 96.0%
components/generators/hugging_face_tgi.py 3 95.4%
core/pipeline/descriptions.py 3 54.55%
components/embedders/sentence_transformers_document_embedder.py 4 90.7%
components/embedders/sentence_transformers_text_embedder.py 4 88.57%
Totals Coverage Status
Change from base Build 7743895092: -0.7%
Covered Lines: 4728
Relevant Lines: 5355

💛 - Coveralls

@ZanSara ZanSara marked this pull request as ready for review February 2, 2024 10:57
@github-actions github-actions bot added the type:documentation Improvements on the docs label Feb 6, 2024
Copy link
Contributor

@silvanocerza silvanocerza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to merge. 👍

@ZanSara ZanSara merged commit 1182c08 into main Feb 6, 2024
20 checks passed
@ZanSara ZanSara deleted the dont-filter-unscaled-bm25-scores branch February 6, 2024 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to Haystack v2.0 ignore-for-release-notes PRs with this flag won't be included in the release notes. topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BM25Retriever scores <0 filtered out even with algorithms that allow it
3 participants