Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: warning if doc store similarity function is incompatible with Sentence Transformers model #3455

Merged
merged 2 commits into from
Oct 25, 2022
Merged

fix: warning if doc store similarity function is incompatible with Sentence Transformers model #3455

merged 2 commits into from
Oct 25, 2022

Conversation

anakin87
Copy link
Member

Related Issues

Proposed Changes:

Check that Document store uses a similarity function compatible with the Sentence Transformers embedding model
(we can deduce it from the model name).
If the similarity function is incompatible, show a warning.

How did you test it?

Some manual tests

Notes for the reviewer

The topic of the Sentence Transformers nomenclature is a bit tricky.
So, if there is "-cos-" in the model name, we can undoubtedly say that the model is compatible with the cosine similarity, but it might also be compatible with the dot_product.
The solution I propose seems to me to be a good compromise, while the current warning is misleading.

Checklist

@anakin87 anakin87 requested a review from a team as a code owner October 23, 2022 21:16
@anakin87 anakin87 requested review from masci and removed request for a team October 23, 2022 21:16
@ZanSara ZanSara added type:feature New feature or request topic:modeling type:refactor Not necessarily visible to the users topic:retriever labels Oct 24, 2022
@ZanSara
Copy link
Contributor

ZanSara commented Oct 24, 2022

In general this change is fine, but I wonder about the value of this check. If there's no established nomenclature for recommended similarity metric in the model name, isn't this system prone to false positives and similar spurious warnings? I'd rather have no warning at all than one that risks misleading the users. @tholor what do you think?

@ZanSara ZanSara requested review from ZanSara and removed request for masci October 24, 2022 13:52
Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @anakin87 !

I think you made a pragmatic choice here of only raising the warning when the similarity metric is clearly mentioned in the model name.

@ZanSara I really like that you think about user value here. I would argue though that with the current implementation we won't see many false positives. The models that state -cos- or -dot- are pretty clearly using those similarity functions. For all other models (probably the majority) we won't raise any warning.
Raising the warning in the few cases is still pretty valuable as it was a big source of errors from users in the past and you won't really notice this as a user as it's only failing silently (= lower performance).

@ZanSara
Copy link
Contributor

ZanSara commented Oct 25, 2022

@tholor sounds good, thank you for the input 😊 Merging!

@ZanSara ZanSara merged commit a2d459d into deepset-ai:main Oct 25, 2022
@ZanSara ZanSara removed their request for review October 25, 2022 15:01
@anakin87 anakin87 deleted the better_sentencetransformers_similarity_warning branch October 25, 2022 15:05
sjrl pushed a commit that referenced this pull request Oct 25, 2022
…ntence Transformers model (#3455)

* check_docstore_similarity_function

* remove import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:modeling topic:retriever type:feature New feature or request type:refactor Not necessarily visible to the users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update warning for dot_product when using sentence transformers
3 participants