Incremental learning in tfidf backend #226

osma · 2018-12-14T09:56:38Z

The tfidf backend could support incremental learning with a few adjustments (inspired by SimIndex in gensim.simserver):

- switch from MatrixSimilarity to Similarity, which allows documents (i.e. subjects, in our case) to be added to an existing index
- maintain mappings between subjects and document IDs in the index; these would have to be persisted along with the index (e.g. using SQLite as in SimIndex)
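The mapping part could be sketched roughly as below. This is only an illustration using the standard library `sqlite3` module; the class name `SubjectIndexMap`, the table name `subject_docs` and the example URIs are hypothetical, and it assumes that index positions are assigned sequentially from 0 in the order subjects are appended to the index.

```python
import sqlite3


class SubjectIndexMap:
    """Hypothetical helper that persists subject <-> index position mappings."""

    def __init__(self, path=":memory:"):
        # A file path would persist the mapping next to the index;
        # ":memory:" is used here only to keep the sketch self-contained.
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS subject_docs ("
                " docpos INTEGER PRIMARY KEY,"
                " subject_uri TEXT UNIQUE NOT NULL)"
            )

    def add(self, subject_uri):
        """Return the position for a subject, assigning the next free one if new."""
        existing = self.docpos_for(subject_uri)
        if existing is not None:
            return existing
        # Assumption: positions grow by one for each subject appended to the index.
        row = self.conn.execute(
            "SELECT COALESCE(MAX(docpos), -1) + 1 FROM subject_docs"
        ).fetchone()
        docpos = row[0]
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT INTO subject_docs (docpos, subject_uri) VALUES (?, ?)",
                (docpos, subject_uri),
            )
        return docpos

    def docpos_for(self, subject_uri):
        """Look up the index position of a subject, or None if unknown."""
        row = self.conn.execute(
            "SELECT docpos FROM subject_docs WHERE subject_uri = ?",
            (subject_uri,),
        ).fetchone()
        return row[0] if row else None

    def subject_for(self, docpos):
        """Look up the subject stored at an index position, or None if unknown."""
        row = self.conn.execute(
            "SELECT subject_uri FROM subject_docs WHERE docpos = ?",
            (docpos,),
        ).fetchone()
        return row[0] if row else None
```

A new subject would be registered with `add()` at the same time as its vector is appended to the index, so that the two stay in step.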

But the more challenging problem is to figure out how learning operations should affect the "documents" (representations of subjects) in the index. Probably this could be handled by vector operations, along these lines:

- for correcting false positives, take the existing subject vector and subtract the vector of the current document (multiplied by a small factor such as 0.1 or 0.01); replace the old subject vector with the result
- for correcting false negatives, take the existing subject vector and add the vector of the current document (multiplied by a small factor such as 0.1 or 0.01); replace the old subject vector with the result
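The two update rules above could be sketched as a single function. This is a minimal illustration in plain Python, assuming sparse vectors are represented as `{term: weight}` dicts; the function name `adjust_subject_vector` and its parameters are hypothetical, and a real implementation would operate on the vectors stored in the index instead.

```python
def adjust_subject_vector(subject_vec, doc_vec, factor=0.1, false_positive=False):
    """Return a new subject vector nudged towards or away from a document.

    subject_vec, doc_vec: sparse vectors as {term: weight} dicts (an assumption
    made for this sketch).
    factor: small learning factor, e.g. 0.1 or 0.01.
    false_positive: True subtracts the scaled document vector (the subject was
    wrongly suggested); False adds it (the subject was wrongly missed).
    """
    sign = -1.0 if false_positive else 1.0
    updated = dict(subject_vec)  # copy, so the caller's vector is left untouched
    for term, weight in doc_vec.items():
        updated[term] = updated.get(term, 0.0) + sign * factor * weight
    return updated


subject = {"cat": 0.5, "dog": 0.5}
document = {"cat": 1.0, "fish": 2.0}

# False negative: move the subject vector towards the document.
towards = adjust_subject_vector(subject, document, factor=0.1)

# False positive: move the subject vector away from the document.
away = adjust_subject_vector(subject, document, factor=0.1, false_positive=True)
```

Note that the sketch neither clips nor renormalizes the result, so subtraction can produce negative weights; whether that is acceptable for the similarity computation is an open question here.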

The existing subject vectors can be retrieved from the index using Similarity.vector_by_id, and the document being learned can be turned into a vector using the tf-idf transformation.

Requires #225

osma added the enhancement label Dec 14, 2018

osma added this to the Short term milestone Dec 14, 2018

osma mentioned this issue Dec 14, 2018

Contextual learning #18

Closed

osma mentioned this issue Feb 8, 2019

Support for online learning #257

Merged

osma modified the milestones: Short term, Long term Jul 22, 2020
