rename jaro_distance to jaro_similarity #122
Comments
To be honest, I was confused by this a few weeks ago, so maybe renaming it would be helpful. EDIT:
@Niecke I don't think that's true. Here's a simple test that shows that the logic is inverted.
import numpy as np
from jellyfish import levenshtein_distance, jaro_distance
a = "aabb"
b = "abab"
# We expect the output to be 1, since a has a smaller distance to itself than to something else
levenshtein = np.argmin([levenshtein_distance(a, b), levenshtein_distance(a, a)])
# Jaro gives the opposite result.
jaro = np.argmin([jaro_distance(a, b), jaro_distance(a, a)])
# But inverting it gives us the correct answer
jaro_inv = np.argmin([1 - jaro_distance(a, b), 1 - jaro_distance(a, a)])
This shows that the
I'm looking at ways to do this without breaking everyone's code. It'll probably require a major version upgrade to be final, but I could add the renamed function in the interim.
As of 0.8.2 this is now the correct name. The old names still function but raise a warning. I will fix the _distance variants in 1.0.
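For anyone curious how "the old names still function but raise a warning" can look in practice, here is a minimal sketch of a deprecated alias. It is not jellyfish's actual code. The function bodies are assumptions for illustration, and the similarity itself is a stand-in based on `difflib.SequenceMatcher` (not the Jaro metric) so the snippet runs with only the standard library.

```python
import warnings
from difflib import SequenceMatcher


def jaro_similarity(s1, s2):
    """Stand-in similarity (NOT the real Jaro metric): 1.0 for identical strings."""
    return SequenceMatcher(None, s1, s2).ratio()


def jaro_distance(s1, s2):
    """Deprecated alias kept so existing callers keep working."""
    warnings.warn(
        "jaro_distance is deprecated; use jaro_similarity instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return jaro_similarity(s1, s2)
```

Callers of the old name get the same return value as before, plus a `DeprecationWarning` that tells them which name to switch to before the alias is removed.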
The jaro_distance function returns a similarity (1 for exact matches) instead of a distance (0 for exact matches), which is the opposite of what the name implies. Would it be possible to rename the function to read jaro_similarity instead to avoid confusion?
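To make the similarity/distance distinction concrete, here is a rough pure-Python sketch of the textbook Jaro similarity and a distance derived from it. It is an illustration only, not jellyfish's implementation. The names `jaro_similarity_sketch` and `as_distance` are hypothetical and exist only for this example.

```python
def jaro_similarity_sketch(s1: str, s2: str) -> float:
    """Textbook Jaro similarity: 1.0 for identical strings, 0.0 for no matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters count as a match only within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up out of order are half-transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def as_distance(s1: str, s2: str) -> float:
    """Hypothetical helper: turn the similarity into a distance (0.0 for exact matches)."""
    return 1.0 - jaro_similarity_sketch(s1, s2)
```

With this split, `as_distance(a, a)` is 0 and grows as strings diverge, which is what code that picks the minimum over a list of distances expects.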