Improve performance for updating & querying vector_ids in the FAISSDocumentStore #383

Closed
tanaysoni opened this issue Sep 16, 2020 · 1 comment · Fixed by #460
Labels
type:feature New feature or request

Comments

@tanaysoni
Contributor

The vector_id from FAISS gets stored as a metadata field in the FAISSDocumentStore.

When embeddings are updated, the FAISSDocumentStore.update_embeddings() method calls SQLDocumentStore.update_document_meta() to replace the entire document metadata with the new vector_id. This takes considerable time, as all metadata fields get updated.

A possible solution is to selectively update only the vector_id meta field from within the FAISSDocumentStore. Having the updates within the FAISSDocumentStore would also make it possible to commit SQL transactions in batches rather than once per document.
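
A minimal sketch of the selective, batched update proposed above. It uses the standard-library `sqlite3` module rather than the actual SQLDocumentStore code, and the `meta` table layout (`document_id`, `name`, `value`), the function name `update_vector_ids`, and the `batch_size` parameter are all assumptions made for illustration, not Haystack's schema or API.

```python
import sqlite3


def update_vector_ids(conn, vector_id_map, batch_size=10_000):
    """Update only the vector_id meta row of each document, one commit per batch.

    `vector_id_map` maps document id -> new FAISS vector id (hypothetical shape).
    """
    items = list(vector_id_map.items())
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Touch only rows named 'vector_id'; other meta fields stay untouched.
        conn.executemany(
            "UPDATE meta SET value = ? WHERE document_id = ? AND name = 'vector_id'",
            [(str(vector_id), doc_id) for doc_id, vector_id in batch],
        )
        # One commit per batch instead of one per document.
        conn.commit()


# Hypothetical key/value meta table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (document_id TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [
        ("doc-1", "vector_id", "0"),
        ("doc-1", "title", "First"),
        ("doc-2", "vector_id", "1"),
        ("doc-2", "title", "Second"),
    ],
)
conn.commit()

update_vector_ids(conn, {"doc-1": 10, "doc-2": 11})
rows = conn.execute(
    "SELECT document_id, name, value FROM meta ORDER BY document_id, name"
).fetchall()
```

Only the `vector_id` rows change, and the number of commits depends on the batch size rather than on the number of documents.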

@tanaysoni tanaysoni added the type:feature New feature or request label Sep 16, 2020
@tanaysoni tanaysoni self-assigned this Sep 16, 2020
@tholor
Member

tholor commented Oct 1, 2020

Let's prioritize this one. In our new benchmarks, we found that the current SQL schema is also slowing down queries with FAISSDocumentStore significantly:

  • 100 queries on 50k DPR embeddings with HNSWFlat, InnerProduct
  • with SQL: 0.284 sec / query
  • without SQL: 0.0186 sec / query

So this is quite a blocker for large-scale usage of FAISS and for proper benchmarking.
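
To put the benchmark numbers above in perspective, here is a short calculation of the overhead they imply. It assumes the whole difference between the two measurements comes from the SQL lookup, which is how the comparison is framed but is not measured separately; the variable names are illustrative.

```python
# Per-query latencies reported in the benchmark above, in seconds.
with_sql = 0.284
without_sql = 0.0186

# Assumption: the gap between the two runs is entirely due to the SQL lookup.
sql_overhead = with_sql - without_sql
slowdown = with_sql / without_sql
sql_share = sql_overhead / with_sql

print(f"SQL overhead per query: {sql_overhead:.4f} s")
print(f"Slowdown factor: {slowdown:.1f}x")
print(f"Share of query time spent in SQL: {sql_share:.1%}")
```

Under that assumption, queries are roughly 15 times slower with SQL, and more than 90% of each query's time goes to the SQL side rather than to the FAISS search.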

@tholor tholor changed the title from "Improve performance for updating vector_ids in the FAISSDocumentStore" to "Improve performance for updating & querying vector_ids in the FAISSDocumentStore" Oct 1, 2020