Warn on changing vocab for trained models #326

Closed
juhoinkinen opened this issue Sep 5, 2019 · 3 comments
juhoinkinen commented Sep 5, 2019

When several projects share a vocabulary, it is easy to unintentionally change the vocab for all of them when one means to change it for only one project. This is a problem especially when some of the projects are already trained, because changing the vocab then breaks their models (is this always the case?). For example:

$ annif loadvoc tfidf-fi ~/annif-projects/Annif-corpora/vocab/yso-fi.tsv
$ annif train tfidf-fi ~/annif-projects/Annif-corpora/training/2019/yso-cicero-finna-fi-100000-lines.tsv
$ echo kissa | annif suggest tfidf-fi
<http://www.yso.fi/onto/yso/p19378>	kissa	0.8595056533813477
<http://www.yso.fi/onto/yso/p17959>	kasvianatomia	0.32491984963417053
<http://www.yso.fi/onto/yso/p20613>	eläinanatomia	0.31712543964385986
<http://www.yso.fi/onto/yso/p18313>	eläinfysiologia	0.2782534062862396
<http://www.yso.fi/onto/yso/p20292>	biomekaniikka	0.25385648012161255
<http://www.yso.fi/onto/yso/p18481>	eläinten käyttäytyminen	0.2513505518436432
<http://www.yso.fi/onto/yso/p10562>	kasvifysiologia	0.2394426316022873
<http://www.yso.fi/onto/yso/p11669>	nimipäivät	0.18711934983730316
<http://www.yso.fi/onto/yso/p675>	lemmikkieläimet	0.17345558106899261
<http://www.yso.fi/onto/yso/p22993>	naksutinkoulutus	0.15796299278736115

# Now load (a different) vocab for fasttext (which has the same vocab setting in projects.cfg as tfidf):
$ annif loadvoc fasttext-fi tests/corpora/archaeology/subjects.tsv
$ echo kissa | annif suggest tfidf-fi
$ 
(No results)

Annif could give a warning when reloading a vocabulary. The warning could list all the projects that share the vocabulary, or at least all the projects that have already been trained using the vocabulary that is about to change. For the already-trained projects there could even be a confirmation prompt.
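To make the idea concrete, here is a minimal sketch of how the affected projects could be found. It is not Annif's actual code: it assumes that projects.cfg is an INI-style file where each section is a project with a `vocab` setting, and the function names `projects_sharing_vocab` and `loadvoc_warning`, as well as the example configuration, are hypothetical. It does not check whether a project has been trained, which the real warning would probably want to do.

```python
import configparser

# Hypothetical projects.cfg content. The project IDs tfidf-fi and fasttext-fi
# come from the example above; all other values are made up for illustration.
EXAMPLE_CFG = """
[tfidf-fi]
backend = tfidf
vocab = yso-fi

[fasttext-fi]
backend = fasttext
vocab = yso-fi

[tfidf-en]
backend = tfidf
vocab = yso-en
"""


def projects_sharing_vocab(cfg_text, project_id):
    """Return the IDs of other projects that use the same vocab as project_id."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    vocab = config[project_id]["vocab"]
    return [
        section
        for section in config.sections()
        if section != project_id and config[section].get("vocab") == vocab
    ]


def loadvoc_warning(cfg_text, project_id):
    """Build a warning message, or return None if no other project is affected."""
    shared = projects_sharing_vocab(cfg_text, project_id)
    if not shared:
        return None
    return (
        f"Warning: reloading the vocabulary of '{project_id}' also affects: "
        + ", ".join(shared)
    )


print(loadvoc_warning(EXAMPLE_CFG, "fasttext-fi"))
```

With the example configuration, loading a vocabulary for fasttext-fi would report tfidf-fi as affected, which is exactly the situation shown in the commands above.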

juhoinkinen modified the milestone: Short term (Sep 5, 2019)
osma commented Sep 30, 2019

This is a good idea. However, if we implemented #274 first then the problem would be at least partly mitigated.

osma added this to the Long term milestone (Sep 30, 2019)
@juhoinkinen

As Osma pointed out today, a similar problem probably also arises when projects with different vocabularies are combined into an ensemble model.

@juhoinkinen

I think this can be closed: since #614 the argument to the load-vocabulary command is a vocabulary ID, not a project ID, so it is less surprising that the operation affects (or can affect) multiple projects. Also, since #274 it could be possible to "undo" loading a wrong vocabulary, because the original URIs are retained in the internal vocabulary, so just loading the original vocabulary back could restore the situation. (Disclaimer: I'm not sure about this.)
