Retain subject IDs when loading vocabulary over existing one #274

osma · 2019-05-07T13:54:19Z

Currently, if you load a new version of a vocabulary over an existing one, the integer subject IDs used internally by Annif are likely to change. This causes mismatches with backends that use the subject IDs within their models (e.g. tfidf, fasttext, vw_multi), so those backends have to be trained again.

Instead, we could try to match URIs in the new vocabulary against the old subject IDs and reuse those IDs as much as possible. Then at least loading a vocabulary with additional concepts would not break existing models. Concepts that have disappeared from the new version of the vocabulary should be marked as nonexistent and filtered out of suggestion results.
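A minimal Python sketch of the proposed matching, assuming the vocabulary is held as a plain list where the list index is the subject ID and a removed concept is kept as a `None` placeholder so its ID is never handed out again. The names `merge_subject_ids` and `filter_suggestions` are hypothetical and are not Annif's actual API.

```python
from typing import List, Optional, Tuple


def merge_subject_ids(
    old_uris: List[Optional[str]], new_uris: List[str]
) -> List[Optional[str]]:
    """Hypothetical helper: build a subject list where index == subject ID.

    URIs present in both vocabularies keep their old index, URIs that
    disappeared become None (nonexistent), and new URIs get fresh IDs
    appended after all existing ones.
    """
    new_set = set(new_uris)
    # Keep every old position; mark concepts missing from the new vocabulary.
    merged: List[Optional[str]] = [
        uri if uri in new_set else None for uri in old_uris
    ]
    known = {uri for uri in merged if uri is not None}
    # Append concepts that did not exist before, skipping duplicates.
    for uri in new_uris:
        if uri not in known:
            merged.append(uri)
            known.add(uri)
    return merged


def filter_suggestions(
    suggestions: List[Tuple[int, float]], subjects: List[Optional[str]]
) -> List[Tuple[int, float]]:
    """Hypothetical helper: drop (subject_id, score) pairs for removed concepts."""
    return [
        (subject_id, score)
        for subject_id, score in suggestions
        if 0 <= subject_id < len(subjects) and subjects[subject_id] is not None
    ]
```

For example, merging the old vocabulary `[a, b, c]` with the new one `[c, a, d]` keeps `a` at ID 0 and `c` at ID 2, leaves ID 1 as a placeholder for the removed `b`, and gives `d` the next free ID, 3. A model trained on the old IDs then still points at the right concepts, and any suggestion for ID 1 is filtered out.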

osma added the enhancement label May 7, 2019

osma added this to the Short term milestone May 7, 2019

osma mentioned this issue Sep 30, 2019

Warn on changing vocab for trained models #326

Closed

osma modified the milestones: Short term, 0.46 Jan 17, 2020

juhoinkinen self-assigned this Jan 27, 2020

juhoinkinen mentioned this issue Jan 31, 2020

Issue274 retain subject ids when loading vocabulary over existing one #383

Merged

juhoinkinen closed this as completed in #383 Feb 14, 2020

juhoinkinen mentioned this issue Apr 26, 2021

Raise warning to advice to retrain project after vocabulary update #485

Open
