Warn on changing vocab for trained models #326

juhoinkinen · 2019-09-05T11:04:41Z

When several projects share a vocabulary, it is easy to unintentionally change the vocab for all of them when one means to change it for only one project. This is a problem especially when some of the projects are already trained, because changing the vocab then messes up their models (is this always the case?). For example:

$ annif loadvoc tfidf-fi ~/annif-projects/Annif-corpora/vocab/yso-fi.tsv
$ annif train tfidf-fi ~/annif-projects/Annif-corpora/training/2019/yso-cicero-finna-fi-100000-lines.tsv
$ echo kissa | annif suggest tfidf-fi
<http://www.yso.fi/onto/yso/p19378>	kissa	0.8595056533813477
<http://www.yso.fi/onto/yso/p17959>	kasvianatomia	0.32491984963417053
<http://www.yso.fi/onto/yso/p20613>	eläinanatomia	0.31712543964385986
<http://www.yso.fi/onto/yso/p18313>	eläinfysiologia	0.2782534062862396
<http://www.yso.fi/onto/yso/p20292>	biomekaniikka	0.25385648012161255
<http://www.yso.fi/onto/yso/p18481>	eläinten käyttäytyminen	0.2513505518436432
<http://www.yso.fi/onto/yso/p10562>	kasvifysiologia	0.2394426316022873
<http://www.yso.fi/onto/yso/p11669>	nimipäivät	0.18711934983730316
<http://www.yso.fi/onto/yso/p675>	lemmikkieläimet	0.17345558106899261
<http://www.yso.fi/onto/yso/p22993>	naksutinkoulutus	0.15796299278736115

# Now load (a different) vocab for fasttext (which has the same vocab setting in projects.cfg as tfidf):
$ annif loadvoc fasttext-fi tests/corpora/archaeology/subjects.tsv
$ echo kissa | annif suggest tfidf-fi
$ 
(No results)

Annif could give a warning when reloading a vocabulary, listing all the projects that share the vocabulary, or at least all the projects that have already been trained using the vocabulary that is about to change. For those already-trained projects there could even be a confirmation prompt.
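A minimal sketch of how such a warning could be assembled, assuming the shared vocabulary is identified by the `vocab` setting of each section in `projects.cfg`. The helper names `projects_sharing_vocab` and `vocab_change_warning` are hypothetical, not part of Annif, and the set of already-trained projects is passed in by the caller because this sketch does not assume how Annif tracks training state.

```python
import configparser
from typing import List, Optional, Set

# Hypothetical projects.cfg content: two projects share the yso-fi vocab.
EXAMPLE_CONFIG = """
[tfidf-fi]
backend=tfidf
language=fi
vocab=yso-fi

[fasttext-fi]
backend=fasttext
language=fi
vocab=yso-fi

[tfidf-en]
backend=tfidf
language=en
vocab=yso-en
"""


def projects_sharing_vocab(config_text: str, vocab_id: str) -> List[str]:
    """Return IDs of projects (config sections) whose vocab equals vocab_id."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    return [
        section
        for section in config.sections()
        if config.get(section, "vocab", fallback=None) == vocab_id
    ]


def vocab_change_warning(
    config_text: str, vocab_id: str, trained_projects: Set[str]
) -> Optional[str]:
    """Build a warning listing affected projects, or None if none are affected."""
    sharing = projects_sharing_vocab(config_text, vocab_id)
    if not sharing:
        return None
    msg = f"Loading vocabulary '{vocab_id}' affects projects: {', '.join(sharing)}"
    # Trained projects are the ones whose models may break, so name them separately.
    trained = [p for p in sharing if p in trained_projects]
    if trained:
        msg += f"; already trained: {', '.join(trained)}"
    return msg


if __name__ == "__main__":
    print(vocab_change_warning(EXAMPLE_CONFIG, "yso-fi", {"tfidf-fi"}))
```

A CLI wrapper could print this message and, when the trained list is non-empty, ask for confirmation before continuing.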


osma · 2019-09-30T12:23:21Z

This is a good idea. However, if we implemented #274 first then the problem would be at least partly mitigated.

juhoinkinen · 2020-02-24T13:12:31Z

As Osma pointed out today, a similar problem probably also arises when projects with different vocabularies are combined into an ensemble model.

juhoinkinen · 2022-10-24T16:15:50Z

I think this can be closed, because since #614 the argument to the load-vocabulary command is a vocabulary ID, not a project ID, so it is less surprising that the operation affects (or can affect) multiple projects. Also, since #274 it could be possible to "undo" loading a wrong vocabulary, because the original URIs are retained in the internal vocabulary, so simply loading the original vocabulary back could restore the situation. (Disclaimer: I'm not sure about this.)

juhoinkinen added the enhancement label Sep 5, 2019

juhoinkinen modified the milestone: Short term Sep 5, 2019

osma added this to the Long term milestone Sep 30, 2019

juhoinkinen closed this as completed Oct 24, 2022