Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure update embeddings #304

Merged
merged 9 commits into from
Aug 18, 2020
Merged

Restructure update embeddings #304

merged 9 commits into from
Aug 18, 2020

Conversation

bogdankostic
Copy link
Contributor

@bogdankostic bogdankostic commented Aug 11, 2020

This PR moves parts of the update_embedding method in ElasticsearchDocumentStore and FAISSDocumentStore to the embed_passages method in dense retrievers.

embed_passages now takes as argument a list of Documents instead of a list of strings (=documents) and another optional list of strings (=titles). Furthermore, DensePassageRetriever now contains the instance variable embed_title which can be set to include titles in embedding of passages.

Old Way

dpr = DensePassageRetriever(document_store, embedding_model)

list_of_documents = document_store.get_all_documents()
list_of_strings = [doc.text for doc in list_of_documents]
list_of_titles = [doc.meta["name"] for doc in list_of_documents]

embeddings = dpr.embed_passages(list_of_strings, list_of_titles)

New Way

dpr = DensePassageRetriever(document_store, embedding_model, embed_title=True)

list_of_documents = document_store.get_all_documents()

embeddings = dpr.embed_passages(list_of_documents)

@bogdankostic
Copy link
Contributor Author

Tests don't work yet with FAISSDocument. There seem to be two problems:

  • It's not possible to add documents without meta field
  • get_all_documents returns docs without embeddings

@Timoeller
Copy link
Contributor

About the problems

It's not possible to add documents without meta field

@tanaysoni will look into it

get_all_documents returns docs without embeddings

that is unfortunately a feature, not a bug : ) in faiss you cannot get the actual embedding back, but just query and get vector ids. So we have to exclude the embedding value assertion for this case.

@Timoeller
Copy link
Contributor

It's not possible to add documents without meta field

should be fixed with #310

@tholor
Copy link
Member

tholor commented Aug 18, 2020

@bogdankostic Please document all breaking changes in your first comment of this PR with a small example old vs new (e.g args for embed_passages). Also add a label "breaking change". Thx!

Copy link
Contributor

@Timoeller Timoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the detailed notes.

Looking good, lets merge.

@Timoeller Timoeller merged commit 72b1013 into master Aug 18, 2020
@julian-risch julian-risch deleted the embeddings_in_retriever branch November 15, 2021 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants