spaCy analyzer #374

Closed
osma opened this issue Dec 20, 2019 · 2 comments · Fixed by #527

osma commented Dec 20, 2019

We could add an analyzer based on spaCy. It would enable support for some new languages and possibly also give better results for e.g. English and German than the current Snowball analyzer.

Getting the full benefit of spaCy may require some internal API changes, because it is more object-oriented than NLTK and processes whole sentences instead of just individual words, taking some of the context into account.

This would be an optional feature, as spaCy is implemented as a native code extension, not just pure Python.
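
A rough sketch of what such an optional analyzer could look like. The class name `SpacyAnalyzer`, the method `tokenize_words` and the default model name are assumptions for illustration, not existing Annif code. The import is guarded so that everything else keeps working when spaCy is not installed, and the text is passed to the pipeline as a whole so that lemmatization can use the sentence context.

```python
try:
    import spacy
except ImportError:  # spaCy is optional; fail only when the analyzer is requested
    spacy = None


class SpacyAnalyzer:
    """Hypothetical analyzer that lemmatizes text with a spaCy pipeline."""

    def __init__(self, model_name="en_core_web_sm", nlp=None):
        # `nlp` can be any callable returning tokens with `lemma_` and
        # `is_alpha` attributes; by default a spaCy model is loaded.
        if nlp is None:
            if spacy is None:
                raise RuntimeError("spaCy is not installed; cannot create analyzer")
            nlp = spacy.load(model_name)
        self.nlp = nlp

    def tokenize_words(self, text):
        # Process the whole text at once rather than word by word,
        # then keep the lowercased lemmas of alphabetic tokens.
        doc = self.nlp(text)
        return [token.lemma_.lower() for token in doc if token.is_alpha]
```

With a model installed, `SpacyAnalyzer("en_core_web_sm").tokenize_words(text)` would return a list of lemmas; which model to use per language would have to come from configuration.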

osma added this to the Short term milestone Dec 20, 2019

osma commented Aug 30, 2021

spaCy also has some support for document (text) categorization, both multiclass and multilabel. This could be supported as an Annif backend: https://spacy.io/api/textcategorizer
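
To make the idea a bit more concrete, here is a minimal sketch of training a multilabel text categorizer with the spaCy 3 API and turning its scores into ranked suggestions. The helper names `make_cats`, `rank_subjects` and `train_textcat` are made up for this example, and nothing here is tied to Annif's actual backend interface; it also skips shuffling, batching and evaluation.

```python
def make_cats(all_labels, doc_labels):
    """Build the {label: 0.0 or 1.0} dict that spaCy expects under "cats"."""
    assigned = set(doc_labels)
    return {label: (1.0 if label in assigned else 0.0) for label in all_labels}


def rank_subjects(cats, limit=10, threshold=0.0):
    """Turn a {label: score} dict such as doc.cats into ranked (label, score) pairs."""
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= threshold][:limit]


def train_textcat(texts, subject_sets, lang="en", epochs=10):
    """Train a blank pipeline with a multilabel text categorizer (sketch only)."""
    import spacy  # imported lazily so that spaCy stays an optional dependency
    from spacy.training import Example

    labels = sorted({s for subjects in subject_sets for s in subjects})
    nlp = spacy.blank(lang)
    textcat = nlp.add_pipe("textcat_multilabel")
    for label in labels:
        textcat.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), {"cats": make_cats(labels, subjects)})
        for text, subjects in zip(texts, subject_sets)
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(epochs):
        nlp.update(examples, sgd=optimizer)
    return nlp
```

After training, something like `rank_subjects(nlp("some new document").cats, limit=5)` would give the top suggestions, with the labels being whatever subject identifiers were used in training.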

osma self-assigned this Aug 31, 2021

osma commented Aug 31, 2021

Starting to look at this more seriously...

One immediate issue that comes up is that spaCy (3.1.2) depends on typer (0.3.2), which at the moment depends on click==7.1.2, while Annif depends on click==8.0.1 since PR #499 was merged just before the 0.53 release.

Now this appears to be fixed with typer 0.4.0, which was released yesterday(!) and supports Click 8, but it won't help until there's a newer version of spaCy available that upgrades the typer dependency.

For now we may have to downgrade back to Click 7.1.2, which probably isn't a problem for Annif since I don't think we've started using any Click 8 features yet.
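
To illustrate the conflict, here is a toy check over exact pins. The version numbers are the ones mentioned above, written out by hand rather than read from package metadata, and `find_pin_conflicts` is a made-up helper, not how pip resolves dependencies.

```python
from collections import defaultdict

# Exact pins as described in this comment; illustrative data only.
PINS = {
    "annif 0.53": {"click": "8.0.1"},
    "typer 0.3.2": {"click": "7.1.2"},
}


def find_pin_conflicts(pins):
    """Return {dependency: {version: [requirers]}} for dependencies
    that are pinned to more than one exact version."""
    by_dep = defaultdict(lambda: defaultdict(list))
    for requirer, deps in pins.items():
        for dep, version in deps.items():
            by_dep[dep][version].append(requirer)
    return {
        dep: dict(versions)
        for dep, versions in by_dep.items()
        if len(versions) > 1
    }


conflicts = find_pin_conflicts(PINS)

# Downgrading the Annif pin to 7.1.2 removes the conflict.
downgraded = dict(PINS, **{"annif 0.53": {"click": "7.1.2"}})
no_conflicts = find_pin_conflicts(downgraded)
```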
