Is Cosine-Similarity of Embeddings Really About Similarity?

Steck, Harald; Ekanadham, Chaitanya; Kallus, Nathan

doi:10.1145/3589335.3651526

arXiv:2403.05440 (cs)

[Submitted on 8 Mar 2024]

Abstract: Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors in practice. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless 'similarities.' For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
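
The non-uniqueness mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' code. It assumes a hypothetical model whose predictions depend on the embeddings only through the product A B^T, so that rescaling the latent dimensions by any invertible diagonal matrix leaves the predictions unchanged. The matrices `A` and `B`, their sizes, and the scaling vector `d` are invented for illustration.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot product of their normalizations."""
    return float(np.dot(u / np.linalg.norm(u), v / np.linalg.norm(v)))

def pairwise_cosine(M):
    """Cosine similarity between all pairs of rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # hypothetical user embeddings (5 users, 3 latent dims)
B = rng.normal(size=(4, 3))  # hypothetical item embeddings (4 items, 3 latent dims)

# Rescale each latent dimension by an arbitrary positive factor (assumed values).
d = np.array([10.0, 1.0, 0.1])
A_scaled = A * d  # same as A @ diag(d)
B_scaled = B / d  # same as B @ diag(1 / d)

# The product A B^T, and hence every prediction, is identical under the rescaling ...
predictions_unchanged = np.allclose(A @ B.T, A_scaled @ B_scaled.T)

# ... but the item-item cosine similarities generally are not.
cosines_unchanged = np.allclose(pairwise_cosine(B), pairwise_cosine(B_scaled))

print("predictions unchanged:", predictions_unchanged)
print("cosine similarities unchanged:", cosines_unchanged)
```

In this sketch both embedding pairs fit the data equally well, so nothing in the predictions singles out one set of cosine similarities over the other. Whether a given trained model has this freedom depends on its regularization, which is the point the abstract raises.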

Comments:	9 pages
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2403.05440 [cs.IR]
	(or arXiv:2403.05440v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2403.05440
Journal reference:	ACM Web Conference 2024 (WWW 2024 Companion)
Related DOI:	https://doi.org/10.1145/3589335.3651526

Submission history

From: Harald Steck
[v1] Fri, 8 Mar 2024 16:48:20 UTC (5,737 KB)
