STWFSA backend returns duplicate results #478

Closed
osma opened this issue Mar 22, 2021 · 2 comments · Fixed by #479

osma (Member) commented Mar 22, 2021

I set up an stwfsa-based project using the following configuration, adapted from the wiki page of the STWFSA backend:

[stw-stwfsa-en]
name=STWFSA STW english
language=en
backend=stwfsa
vocab=stw-en
concept_type_uri=http://zbw.eu/namespaces/zbw-extensions/Descriptor
sub_thesaurus_type_uri=http://zbw.eu/namespaces/zbw-extensions/Thsys
thesaurus_relation_type_uri=http://www.w3.org/2004/02/skos/core#broader
thesaurus_relation_is_specialisation=False

Then I trained this with stw-econbiz-small.tsv.gz from the Annif-tutorial stw-zbw data set.

When I use annif suggest with text that contains multiple matches for the same term, I get duplicate results:

$ echo "economics. economics" | annif suggest stw-stwfsa-en
<http://zbw.eu/stw/descriptor/10032-2>	Economics	0.14664009111617313
<http://zbw.eu/stw/descriptor/10032-2>	Economics	0.14664009111617313

This is obviously wrong, since the list of suggestions should not contain duplicates.
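To make the expected behaviour concrete, here is a minimal Python sketch of what a duplicate-free suggestion list would look like. It is not the upstream stwfsapy fix and not part of Annif; `parse_suggestion` and `dedupe_suggestions` are hypothetical helpers, and the tab-separated `<uri>`, label, score layout is assumed from the `annif suggest` output shown above.

```python
from typing import Dict, Iterable, List, Tuple

# (uri, label, score); layout assumed from the tab-separated output above
Suggestion = Tuple[str, str, float]


def parse_suggestion(line: str) -> Suggestion:
    """Parse one '<uri>\\tlabel\\tscore' line into a tuple (hypothetical helper)."""
    uri, label, score = line.rstrip("\n").split("\t")
    return uri.strip("<>"), label, float(score)


def dedupe_suggestions(suggestions: Iterable[Suggestion]) -> List[Suggestion]:
    """Keep one entry per concept URI, retaining the highest score seen."""
    best: Dict[str, Suggestion] = {}
    for uri, label, score in suggestions:
        if uri not in best or score > best[uri][2]:
            best[uri] = (uri, label, score)
    # Return the unique suggestions ordered by descending score
    return sorted(best.values(), key=lambda s: s[2], reverse=True)
```

Keeping the highest score per URI is an assumption made for illustration; how a backend should combine scores from multiple matches of the same concept is a separate design question.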

Looking at this PR that was quite recently merged to stwfsapy, I think this may already have been fixed upstream. Annif would just need to switch to a newer version of stwfsapy (and such a version must be released first). Ping @mo-fu - do you agree with the analysis? Could this be fixed on the Annif side as well by upgrading stwfsapy?
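For anyone checking which stwfsapy version their environment currently has, a small sketch using the standard library's `importlib.metadata` (Python 3.8+) follows. The helper name `installed_version` is hypothetical, and this does not say which release contains the fix, since none had been published at the time of writing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "stwfsapy") -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "stwfsapy is not installed")
```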

osma added the bug label Mar 22, 2021
mo-fu (Contributor) commented Mar 23, 2021

Yes, this was recently fixed upstream. I will try to get a release out this week. There are additional changes for handling longer inputs that I want to include.

osma (Member, Author) commented Mar 23, 2021

Great to hear, thanks @mo-fu!

osma closed this as completed in #479 Apr 12, 2021
juhoinkinen added this to the 0.52 milestone Apr 19, 2021