Clarify difference between `dot_product` and `cosine` similarities
#91260
Comments
Pinging @elastic/es-search (Team:Search)
This would also help with the byte-sized vectors work (#89784). Currently if you use |
Pinging @elastic/es-docs (Team:Docs)
Docs & default behavior have been significantly improved since this issue was opened. Closing.
Approximate kNN search supports two similarities that are really similar:

- `cosine` accepts any vector and computes the cosine similarity between them
- `dot_product` requires vectors to be of magnitude 1, and computes the cosine similarity between them

Our recommendation is to use `dot_product` if possible, since it avoids computing the vector magnitudes (they're always 1), making search significantly faster. It's a bit confusing to have two similarities for the same use case -- users often just choose `cosine` and get suboptimal performance.

Maybe we could update `cosine` to compute and store the vector magnitudes while indexing. We could also compute the query magnitude once per search. Then, we could just reuse the magnitudes during the similarity computation. We do this for non-indexed `dense_vector` fields and found it really improved performance (#46294). This would require changes to how Lucene indexes vectors, described here: apache/lucene#11228.

We could then either remove `dot_product`, or expand its purpose. (For example, maybe `dot_product` could accept vectors of any length, which is helpful in recommendations use cases? This would require research.)
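
To make the comparison concrete, here is a minimal Python sketch of the three computations discussed above: plain cosine, cosine with magnitudes computed once and reused, and a dot product over pre-normalized vectors. The helper names and sample vectors are hypothetical and chosen for illustration; this is not how Elasticsearch or Lucene implement scoring, and it computes raw similarity values rather than Elasticsearch `_score` values.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def magnitude(v):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(v, v))

def normalize(v):
    """Scale a non-zero vector to magnitude 1."""
    m = magnitude(v)
    return [x / m for x in v]

def cosine(a, b):
    """Cosine similarity that recomputes both magnitudes on every call."""
    return dot(a, b) / (magnitude(a) * magnitude(b))

def cosine_cached(doc, doc_magnitude, query, query_magnitude):
    """Cosine similarity that reuses magnitudes computed elsewhere."""
    return dot(doc, query) / (doc_magnitude * query_magnitude)

# Hypothetical sample data (assumption, not from the issue).
docs = [[3.0, 4.0], [1.0, 0.0], [2.0, 2.0]]
query = [1.0, 2.0]

# "Index time": one magnitude per document, stored next to the vector.
doc_magnitudes = [magnitude(d) for d in docs]
unit_docs = [normalize(d) for d in docs]

# "Search time": query magnitude computed once per search.
query_magnitude = magnitude(query)
unit_query = normalize(query)

plain = [cosine(d, query) for d in docs]
cached = [cosine_cached(d, m, query, query_magnitude)
          for d, m in zip(docs, doc_magnitudes)]
unit_dot = [dot(d, unit_query) for d in unit_docs]

for p, c, u in zip(plain, cached, unit_dot):
    print(f"cosine={p:.6f} cached={c:.6f} unit_dot={u:.6f}")
```

All three lists agree up to floating-point error. The difference is only in how much work each comparison does: the cached variant moves the magnitude computation out of the per-document loop, and the unit-vector variant removes the division as well.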