Lin's similarity measure might be implemented incorrectly in GOATOOLS - outputs negative values #120
Comments
I will write a test for this. But I do not have time to look into it further as the other important issues that need my attention are:
@alex-wave, can you add to the conversation regarding Lin's similarity score?
The test which shows a negative Lin score is https://github.com/tanghaibao/goatools/blob/master/tests/semantic_i88.py The plot shows the two user-specified GO terms (green) and their deepest common ancestor (DCA), biological process (blue). The numbers next to the "i" in the GO term boxes show the information content.
The information scores appear to make sense: the least annotated term, root development (7.56), has a higher score than the middle one, multicellular organismal process (5.48), and both have higher information scores than their DCA, biological process (3.30). Resnik's similarity score is defined as the information content of the DCA, which would be 3.30 in our case. Lin's score is coded as -1 * 2 * Resnik/(info_score(GO:0032501) + info_score(GO:0048364)), which evaluates to -0.505 in this example. That Lin's score is a fraction makes sense. I am just not sure why there is a '-1' in the equation if the information content values are always positive. Perhaps this is the issue?
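@dvklopfenstein there does appear to be a problem with the Lin similarity scores from GOATOOLS. The "-1.0" should not be there: Lin's similarity measure is defined as 2 * IC(DCA) / (IC(t1) + IC(t2)), where IC is the information content and DCA is the deepest common ancestor of the terms t1 and t2. I've created a pull request to fix this: #121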
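For reference, here is a minimal sketch of the Lin calculation without the stray "-1", using the rounded information content values quoted in this thread. The function name `lin_similarity` and the choice to return `None` when both terms have zero information content are assumptions made for illustration; this is not the GOATOOLS code. Because the inputs are rounded, the result is about 0.506 rather than exactly the magnitude of the -0.505 reported above.

```python
def lin_similarity(ic_term_a, ic_term_b, ic_dca):
    """Lin similarity from precomputed information content (IC) values.

    Hypothetical helper for illustration, not the GOATOOLS implementation.
    """
    denominator = ic_term_a + ic_term_b
    if denominator == 0:
        # Assumed convention: the score is undefined when both ICs are zero.
        return None
    # Resnik similarity is the IC of the deepest common ancestor (DCA);
    # Lin normalizes twice that value by the summed IC of the two terms.
    return 2.0 * ic_dca / denominator


# Rounded IC values quoted in the discussion above.
ic_multicellular = 5.48        # GO:0032501, multicellular organismal process
ic_root_development = 7.56     # GO:0048364, root development
ic_biological_process = 3.30   # DCA, biological process

score = lin_similarity(ic_multicellular, ic_root_development, ic_biological_process)
print(round(score, 3))
```

With non-negative information content values, and a DCA whose information content does not exceed that of either term, the score stays between 0 and 1, so a negative value points to a sign error in the formula.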
Thank you, Dr. Warwick Vesztrocy, very much for taking the time to look at this and issue a pull request. I have merged your pull request and will close this issue. Thank you, Dr. Grønning, for bringing this to our attention.
Received the following issue by direct email:
Dear Haibao,
Thank you for GOATOOLS --> very nice tool!
I looked through your example of how to calculate semantic similarity using GOATOOLS:
https://github.com/tanghaibao/goatools/blob/master/notebooks/semantic_similarity.ipynb
And it looks as if you have implemented Lin's similarity incorrectly, as you get negative values as output.
Kind regards,
Alexander Grønning