The `tfidf` backend could support incremental learning with a few adjustments (inspired by `SimIndex` in `gensim.simserver`):

- switch from `MatrixSimilarity` to `Similarity`, which allows adding documents (i.e. subjects, in our case)
- maintain a mapping between subjects and document IDs in the index; this would have to be persisted along with the index (e.g. in SQLite, as in `SimIndex`)
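The second point could look something like this minimal sketch, which persists a subject-to-document-ID mapping in SQLite using only the standard library (the class name, schema, and example URIs are assumptions for illustration, not an existing API):

```python
import sqlite3

class SubjectIdMap:
    """Persist a mapping between subject URIs and document IDs in SQLite,
    so it survives restarts alongside the similarity index."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS subjects "
            "(doc_id INTEGER PRIMARY KEY, subject_uri TEXT UNIQUE)")

    def add(self, doc_id, subject_uri):
        # INSERT OR REPLACE lets a subject be re-pointed at a new document ID
        self.conn.execute(
            "INSERT OR REPLACE INTO subjects VALUES (?, ?)",
            (doc_id, subject_uri))
        self.conn.commit()

    def doc_id(self, subject_uri):
        # Look up the index position for a subject; None if unknown
        row = self.conn.execute(
            "SELECT doc_id FROM subjects WHERE subject_uri = ?",
            (subject_uri,)).fetchone()
        return row[0] if row else None

# usage (hypothetical subject URI)
mapping = SubjectIdMap(":memory:")
mapping.add(0, "http://example.org/subjects/cats")
print(mapping.doc_id("http://example.org/subjects/cats"))
```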
But the more challenging problem is figuring out how learning operations should affect the "documents" (representations of subjects) in the index. This could probably be handled by vector operations, along these lines:
- to correct a false positive, take the existing subject vector and subtract the vector of the current document (multiplied by a small factor such as 0.1 or 0.01); replace the old subject vector with the result
- to correct a false negative, take the existing subject vector and add the vector of the current document (multiplied by the same small factor); replace the old subject vector with the result
The existing subject vectors can be retrieved from the index using `Similarity.vector_by_id`, and the learned document can be turned into a vector with the tf-idf transformation.
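The correction step above could be sketched as follows, using gensim's sparse vector format (lists of `(term_id, weight)` pairs) but plain Python for the arithmetic; the function name, example vectors, and zero-clipping behaviour are assumptions, not part of any existing code:

```python
def update_subject_vector(subject_vec, doc_vec, factor):
    """Nudge a subject vector by `factor` times a document vector.

    Vectors are in gensim's sparse format: lists of (term_id, weight)
    pairs. Use a positive factor (e.g. 0.1) to correct a false negative,
    a negative one (e.g. -0.1) to correct a false positive.
    """
    merged = dict(subject_vec)
    for term_id, weight in doc_vec:
        merged[term_id] = merged.get(term_id, 0.0) + factor * weight
    # drop terms whose weight fell to zero or below, keeping the vector sparse
    return sorted((t, w) for t, w in merged.items() if w > 0.0)

# hypothetical subject and document vectors
subject = [(0, 0.5), (3, 0.8)]
document = [(0, 0.4), (2, 0.6)]

# false negative: move the subject towards the document
print(update_subject_vector(subject, document, 0.1))
# false positive: move it away
print(update_subject_vector(subject, document, -0.1))
```

The updated vector would then replace the old one in the index; since gensim's `Similarity` does not update documents in place, this would presumably mean re-adding the vector and updating the subject-to-ID mapping accordingly.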
Requires #225