Improve performance for updating & querying vector_ids in the FAISSDocumentStore #383
Labels
type:feature
New feature or request
Comments
Let's prioritize this one. In our new benchmarks, we found that the current SQL schema is also slowing down queries with FAISSDocumentStore significantly:
So this is quite a blocker for large-scale usage of FAISS and appropriate benchmarking.
The `vector_id` from FAISS gets stored as a metadata field in the `FAISSDocumentStore`. When embeddings are updated, the `FAISSDocumentStore.update_embeddings()` method calls `SQLDocumentStore.update_document_meta()` to replace the entire document metadata with the new `vector_id`. This takes considerable time, as all metadata fields get updated.

A possible solution is to selectively update only the `vector_id` meta field from within the `FAISSDocumentStore`. Having the updates within the `FAISSDocumentStore` would also make it possible to commit SQL transactions in a batch rather than once per document.
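
To make the proposal concrete, here is a minimal sketch of a selective, batched `vector_id` update. It uses the standard-library `sqlite3` module rather than the project's actual SQL layer. The `meta` table layout (one row per document and meta field) and the helper names `create_meta_table` and `update_vector_ids` are assumptions for illustration only, not the existing schema or API.

```python
import sqlite3


def create_meta_table(conn):
    # Hypothetical schema (an assumption): one row per (document, meta field).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta ("
        "document_id TEXT NOT NULL, name TEXT NOT NULL, value TEXT, "
        "PRIMARY KEY (document_id, name))"
    )


def update_vector_ids(conn, vector_id_map, batch_size=10_000):
    """Write only the vector_id meta row per document; commit once per batch."""
    items = list(vector_id_map.items())
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Only the vector_id row is inserted or replaced; all other
        # meta rows of the document are left untouched.
        conn.executemany(
            "INSERT OR REPLACE INTO meta (document_id, name, value) "
            "VALUES (?, 'vector_id', ?)",
            [(doc_id, str(vector_id)) for doc_id, vector_id in batch],
        )
        # One commit per batch instead of one per document.
        conn.commit()
    return len(items)
```

A caller would pass a mapping from document id to the new FAISS `vector_id`, for example `update_vector_ids(conn, {"doc-1": 0, "doc-2": 1})`. The batch size here is arbitrary and would need tuning against the real database backend.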