Explain why a subject was matched #19

Open
osma opened this issue Oct 5, 2017 · 4 comments


osma commented Oct 5, 2017

When Annif returns bad subjects, it can be difficult to understand why they were suggested. An explain parameter for the analyze functionality could enable this: for each suggested subject, it would return the text of all the blocks in the document that contributed to the subject assignment, sorted by score (highest first). This would give at least some idea of which parts of the document caused the match.
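
A rough sketch of what such a response could look like; the explain parameter and the blocks field are hypothetical, nothing like this exists in Annif yet:

# Hypothetical shape of one suggestion when explanation is enabled.
# Neither the "explain" parameter nor the "blocks" field is real Annif
# API; this only illustrates the idea described above.
explained_suggestion = {
    "uri": "http://www.yso.fi/onto/yso/p19378",
    "label": "cat",
    "score": 0.41,
    # document blocks that contributed to the match, highest score first
    "blocks": [
        {"text": "the cat sat on the mat", "score": 0.41},
    ],
}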

osma added this to the Long term milestone on Oct 5, 2017

osma commented May 19, 2018

LIME could be useful for this: https://github.com/marcotcr/lime/
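
For reference, a minimal sketch of wiring LIME's text explainer to a classifier function. LimeTextExplainer, explain_instance and as_list are real LIME API; the annif_predict_proba wrapper is a hypothetical stand-in (here a toy keyword scorer) for code that would call Annif and return per-subject scores:

import numpy as np
from lime.lime_text import LimeTextExplainer

subject_labels = ["cat", "place mats"]  # toy subject vocabulary

def annif_predict_proba(texts):
    # Hypothetical wrapper: a real version would ask an Annif project
    # for suggestions on each text and return an array of per-subject
    # scores with shape (n_texts, n_subjects), as LIME expects.
    scores = np.array([[1.0 if "cat" in t else 0.0,
                        1.0 if "mat" in t else 0.0] for t in texts])
    return scores / np.maximum(scores.sum(axis=1, keepdims=True), 1e-9)

explainer = LimeTextExplainer(class_names=subject_labels)
explanation = explainer.explain_instance(
    "the cat sat on the mat",
    annif_predict_proba,
    num_features=6,  # max number of words in the explanation
    labels=(0,),     # explain the first subject ("cat")
)
print(explanation.as_list(label=0))  # (word, weight) pairs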

annakasprzik commented

In general, I would like an option both for 'suggest' and for 'eval' that returns the confidence scores for each descriptor and for each document, for evaluation purposes. Not sure if Annif already produces such an output anywhere?


osma commented Sep 3, 2019

@annakasprzik This is what the suggest command does - it will give you the confidence scores in the output. Like this:

$ echo "the cat sat on the mat" | annif suggest tfidf-en
<http://www.yso.fi/onto/yso/p26645>	place mats	0.5739196571753897
<http://www.yso.fi/onto/yso/p19378>	cat	0.412109991386263
<http://www.yso.fi/onto/yso/p864>	Felidae	0.4004559418090339
<http://www.yso.fi/onto/yso/p24992>	stray cats	0.31746311805949967
<http://www.yso.fi/onto/yso/p24619>	exotic (cat)	0.27605877849495275
<http://www.yso.fi/onto/yso/p24278>	Norwegian forest cat	0.2735824095480068
<http://www.yso.fi/onto/yso/p24186>	Siberian cat	0.2712520343571323
<http://www.yso.fi/onto/yso/p20058>	wildcat	0.2446630680506471
<http://www.yso.fi/onto/yso/p21172>	street musicians	0.23004085661703863
<http://www.yso.fi/onto/yso/p29087>	cat breeders	0.2211696167751634

The third column is the confidence score (between 0.0 and 1.0). Its interpretation varies a bit between the models.

For the eval command I don't think returning such scores makes sense, as it operates on a higher level: you give it a set of manually indexed documents, it compares the algorithm-suggested subjects (taking the predicted scores into account) with the manual ones, and it calculates overall similarity measures such as F1 and NDCG.
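
To illustrate the kind of aggregate measure eval reports, here is a toy set-based F1 calculation for a single document. This is not Annif's actual implementation (which also uses score-aware metrics such as NDCG); it only shows the basic precision/recall/F1 idea:

# Toy per-document F1 against gold-standard (manually assigned) subjects.
suggested = {"cat", "place mats", "Felidae"}
gold = {"cat", "mats"}

true_pos = len(suggested & gold)               # 1
precision = true_pos / len(suggested)          # 1/3
recall = true_pos / len(gold)                  # 1/2
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # F1=0.40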


osma commented Sep 3, 2019

BTW there's a great blog post on the ideas behind LIME, by the authors.
