How to get embeddings from a dense retriever #226

Krak91 · 2020-07-14T07:54:35Z

Hi, I've been trying to find out why the EmbeddingRetriever returns just an empty list. I found that the embeddings are indeed generated, but the retrieve() method tries to return documents instead. I had to change the method to look like this:
(haystack/retriever/dense lines 245-247)
def retrieve(self, query: str, filters: dict = None, top_k: int = 10, index: str = None) -> List[Document]:
    if index is None:
        index = self.document_store.index
    query_emb = self.embed(texts=[query])
    # documents = self.document_store.query_by_embedding(query_emb=query_emb[0], filters=filters,
    #                                                    top_k=top_k, index=index)
    # return documents
    return query_emb

I'm trying to understand this function's purpose - is it meant to return documents relevant to our input strings? Why is this trying to return documents?

tholor · 2020-07-14T08:00:23Z

Yes, the goal of the retrieve() method is to return a list of Documents that are "similar" to our query. In the case of the DensePassageRetriever this means comparing the embedding of our query (query_emb) to the ones of the documents. Those document embeddings should have been previously created and stored in the DocumentStore as in this Tutorial:

haystack/tutorials/Tutorial6_Better_Retrieval_via_DPR.py

Lines 51 to 56 in b886e05

# Important:
# Now that after we have the DPR initialized, we need to call update_embeddings() to iterate over all
# previously indexed documents and update their embedding representation.
# While this can be a time consuming operation (depending on corpus size), it only needs to be done once.
# At query time, we only need to embed the query and compare it the existing doc embeddings which is very fast.
document_store.update_embeddings(retriever)
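
To make the comparison step concrete, here is a minimal, self-contained sketch in plain Python. It is not Haystack code: rank_by_embedding is a hypothetical helper, the stored embeddings are hand-made toy vectors, and the dot product is assumed as the similarity measure. It only illustrates the idea of scoring a query embedding against stored document embeddings and keeping the top_k matches.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_embedding(
    query_emb: List[float],
    doc_embs: Dict[str, List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a document store lookup:
    score every stored embedding against the query and return
    the top_k (doc_id, score) pairs, best first."""
    scored = [(doc_id, dot(query_emb, emb)) for doc_id, emb in doc_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings that would normally be computed once, ahead of query time.
doc_embs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.7, 0.7],
}

# Toy query embedding; at query time only this vector has to be computed.
query_emb = [0.9, 0.1]

results = rank_by_embedding(query_emb, doc_embs, top_k=2)
print([doc_id for doc_id, _ in results])
```

If the stored embeddings are missing, there is nothing to score against, which is consistent with an empty result list when update_embeddings() has not been run.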

Krak91 · 2020-07-14T08:17:13Z

If we just want to generate embeddings for strings, could we pass the function an empty document store (given that we return the embeddings instead of the docs, as above)?

tholor · 2020-07-14T08:20:51Z

If you just want to create embeddings, you can use:

# queries
retriever.embed_queries(list_of_strings)
# passages
retriever.embed_passages(list_of_strings)

Note that for DPR the two methods use different encoder models, while for the EmbeddingRetriever both use the same one.

Krak91 · 2020-07-14T08:22:36Z

Great! thanks

salbatarni · 2023-08-17T21:14:19Z

Hello,
it looks like retriever.embed_passages(list_of_strings) is not working anymore.
So how can I get the embedding of a passage?

Krak91 added the question label Jul 14, 2020

tholor changed the title ~~EmbeddingRetriever.retrieve()~~ How to get embeddings from a dense retriever Jul 14, 2020

tholor closed this as completed Jul 14, 2020
