Clarify difference between `dot_product` and `cosine` similarities #91260

jtibshirani · 2022-11-02T23:07:31Z

Approximate kNN search supports two similarities that behave almost identically:

- `cosine` accepts vectors of any magnitude and computes the cosine similarity between them
- `dot_product` requires vectors to have magnitude 1, and computes their dot product, which for unit vectors equals the cosine similarity
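To make the equivalence concrete, here is a minimal sketch in plain Python (not Elasticsearch or Lucene code). The helper names `dot`, `cosine`, and `normalize` are made up for illustration. It shows that once both vectors are normalized to magnitude 1, a plain dot product gives the same value as the cosine similarity of the original vectors, without any magnitude computation at comparison time.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by both magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale a non-zero vector to magnitude 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [1.0, 2.0]

# Cosine similarity on the raw vectors (magnitudes computed per comparison).
cos_ab = cosine(a, b)

# Dot product on pre-normalized vectors (no magnitudes needed at compare time).
dot_unit = dot(normalize(a), normalize(b))

print(math.isclose(cos_ab, dot_unit))
```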

Our recommendation is to use dot_product if possible, since it avoids computing the vector magnitudes (they're always 1), making search significantly faster. It's a bit confusing to have two similarities for the same use case -- users often just choose cosine and get suboptimal performance.

Maybe we could update cosine to compute and store the vector magnitudes while indexing. We could also compute the query magnitude once per search. Then, we could just reuse the magnitudes during the similarity computation. We do this for non-indexed dense_vector fields and found it really improved performance (#46294). This would require changes to how Lucene indexes vectors, described here: apache/lucene#11228.
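A rough sketch of that idea, in plain Python rather than Lucene: the `CosineIndex` class and its methods are hypothetical, and the search is brute force rather than approximate, so this only illustrates where the magnitudes get computed. Each document's magnitude is computed once in `add`, the query magnitude once per `search`, and the per-document comparison then needs only a dot product and a division. It assumes non-zero vectors.

```python
import math

class CosineIndex:
    """Hypothetical toy index; not how Lucene or Elasticsearch stores vectors."""

    def __init__(self):
        self.vectors = []
        self.magnitudes = []  # one stored magnitude per indexed vector

    def add(self, vector):
        # Compute and store the magnitude once, at index time.
        self.vectors.append(list(vector))
        self.magnitudes.append(math.sqrt(sum(x * x for x in vector)))

    def search(self, query, k):
        # Compute the query magnitude once per search.
        q_mag = math.sqrt(sum(x * x for x in query))
        scored = []
        for doc_id, (vec, mag) in enumerate(zip(self.vectors, self.magnitudes)):
            # Per-document work: one dot product, reusing stored magnitudes.
            d = sum(x * y for x, y in zip(query, vec))
            scored.append((d / (q_mag * mag), doc_id))
        scored.sort(reverse=True)
        return scored[:k]

index = CosineIndex()
index.add([1.0, 0.0])
index.add([0.0, 2.0])
index.add([3.0, 3.0])
top = index.search([1.0, 1.0], k=1)
print(top)
```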

We could then either remove dot_product, or expand its purpose. (For example, maybe dot_product could accept vectors of any magnitude, which is helpful in recommendation use cases? This would require research.)

elasticsearchmachine · 2022-11-02T23:07:54Z

Pinging @elastic/es-search (Team:Search)

jtibshirani · 2022-11-04T19:10:34Z

This would also help with the byte-sized vectors work (#89784). Currently, if you use dot_product with element_type: byte, we just assume all the vectors have the same magnitude, but don't enforce it. The score is also a bit strange, to account for the fact that the vectors can have any magnitude (it's 0.5 + (dot_product / (32768 * dims))). Everything works out more nicely if you use the cosine similarity.
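For reference, the score formula quoted above can be sketched in plain Python as follows. The function name `byte_dot_product_score` is made up for illustration, and this is only a transcription of the formula in this comment, not the actual Elasticsearch code. With signed byte elements in [-128, 127], the largest possible per-dimension product is (-128) * (-128) = 16384, so the dot product is at most 16384 * dims and the score tops out at 1.0.

```python
def byte_dot_product_score(a, b):
    """Score for two byte vectors using the formula quoted in this comment:
    0.5 + (dot_product / (32768 * dims)).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dims = len(a)
    dot_product = sum(x * y for x, y in zip(a, b))
    return 0.5 + dot_product / (32768 * dims)

# Orthogonal vectors land in the middle of the range.
print(byte_dot_product_score([127, 0], [0, 127]))

# The extreme case of all -128 elements reaches the maximum score.
print(byte_dot_product_score([-128, -128], [-128, -128]))
```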

elasticsearchmachine · 2023-06-12T19:17:15Z

Pinging @elastic/es-docs (Team:Docs)

benwtrent · 2024-06-24T14:45:52Z

Docs & default behavior have been significantly improved since this issue was opened. Closing.

jtibshirani added >enhancement :Search Relevance/Vectors Vector search labels Nov 2, 2022

elasticsearchmachine added the Team:Search Meta label for search team label Nov 2, 2022

jtibshirani mentioned this issue Nov 2, 2022 in ANN search improvements #84324 (Open, 43 tasks)

mayya-sharipova added the >docs General docs changes label Jun 12, 2023

elasticsearchmachine added the Team:Docs Meta label for docs team label Jun 12, 2023

benwtrent closed this as completed Jun 24, 2024

javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
