Update stwfsa to 0.2.* #479

Merged
merged 1 commit into from
Apr 12, 2021


mo-fu
Contributor

@mo-fu mo-fu commented Mar 26, 2021

A PR for an updated version of stwfsa. The improvements are:

  • No more duplicate concepts in suggestions. Fixes #478 (STWFSA backend returns duplicate results).
  • Better support for long texts by using frequency and positional distribution features of concepts.
  • The text vectorizer is disabled by default to reduce memory requirements for large corpora.
  • Improved escaping of symbols in automaton construction.
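
To illustrate the first point, here is a minimal sketch of one way duplicate concepts could be collapsed in a suggestion list. It is not taken from stwfsapy; the function name `dedupe_suggestions` and the `(concept, score)` tuple layout are assumptions made for this example.

```python
def dedupe_suggestions(suggestions):
    """Collapse repeated concepts, keeping the highest score for each.

    `suggestions` is assumed to be an iterable of (concept, score) pairs;
    this layout is hypothetical and not the stwfsapy data structure.
    """
    best = {}
    for concept, score in suggestions:
        # Remember only the best score seen so far for each concept.
        if concept not in best or score > best[concept]:
            best[concept] = score
    # Highest-scoring concepts first; ties are broken by concept id.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Called with `[("a", 0.5), ("b", 0.9), ("a", 0.7)]`, this keeps a single entry for `"a"` with its higher score.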

@sonarcloud

sonarcloud bot commented Mar 26, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

@osma osma added this to the 0.52 milestone Mar 29, 2021
@osma
Member

osma commented Mar 31, 2021

Thanks @mo-fu ! I tested this briefly and it seems to work as advertised.

Am I right that old models (trained with stwfsapy 0.1.x) need to be retrained? I got a KeyError: 'use_txt_vec' when I tried to use an old model. No big issue, but it should be mentioned in the Annif release notes.

@osma osma self-assigned this Mar 31, 2021
@mo-fu
Contributor Author

mo-fu commented Apr 12, 2021

Yes, that is correct. I have plans to add the version to the serialized files in order to have a better error message.
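
A rough sketch of the idea, assuming a JSON-style payload: store a format version next to the model parameters and check it on load, so an outdated file gives a clear message instead of a KeyError. The names `FORMAT_VERSION`, `ModelVersionError`, `dump_model` and `load_model` are hypothetical and do not reflect the actual stwfsapy serialization code.

```python
import json

# Hypothetical version tag written into every serialized model.
FORMAT_VERSION = "0.2"


class ModelVersionError(Exception):
    """Raised when a serialized model comes from an incompatible version."""


def dump_model(params: dict) -> str:
    """Serialize model parameters together with the format version."""
    return json.dumps({"format_version": FORMAT_VERSION, "params": params})


def load_model(serialized: str) -> dict:
    """Load model parameters, rejecting files written by another version."""
    payload = json.loads(serialized)
    # Files written before versioning was added have no version field at all.
    found = payload.get("format_version")
    if found != FORMAT_VERSION:
        raise ModelVersionError(
            f"Model format {found!r} is not supported (expected "
            f"{FORMAT_VERSION!r}); please retrain the model."
        )
    return payload["params"]
```

With this kind of check, loading a file that lacks the version field fails with a message asking for retraining rather than with a missing-key error deep inside the model code.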

@osma
Member

osma commented Apr 12, 2021

Thanks for clarifying @mo-fu ! I will merge this now so it gets into the soon-to-be-released Annif 0.52.

@osma osma merged commit 2367609 into NatLibFi:master Apr 12, 2021