
Dense Passage Retrieval (ConnectionTimeoutError) #644

Closed
Freemanlabs opened this issue Dec 1, 2020 · 17 comments

Labels
type:bug Something isn't working

Comments

@Freemanlabs

With the same settings for Elasticsearch, I can successfully retrieve prediction answers with BM25 and TFIDF. However, when I try with DPR, I get a ConnectionTimeoutError.

How do I resolve this?

@Freemanlabs Freemanlabs added the type:bug Something isn't working label Dec 1, 2020
@Freemanlabs
Author

First I get RequestError: 400...; if I try it again, I then see ConnectionTimeoutError.

@tholor
Member

tholor commented Dec 2, 2020

Please try increasing the timeout for Elasticsearch via:

es_store = ElasticsearchDocumentStore(..., timeout=3000)

Are you running this on Colab? Elasticsearch might be very slow there as Colab only provides a single CPU core...
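For reference, a minimal sketch of what that call could look like (host, port, and index here are just the documented defaults, not values specific to this issue):

```python
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

# Sketch: raise the client-side read timeout from the default of 30s.
es_store = ElasticsearchDocumentStore(
    host="localhost",   # placeholder -- use your own host
    port=9200,
    index="document",
    timeout=3000,
)
```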

@Freemanlabs
Author

I see. I will try that. Although my runtime type is GPU, meaning I am using a GPU on Colab...

Also, do you have any idea why I get this RequestError: 400... initially?

@tholor
Member

tholor commented Dec 2, 2020

Although my runtime type is GPU

Elasticsearch cannot benefit from a GPU and will always run on CPU only.

do you have any idea why I get this RequestError: 400...

Can you please provide more context / a script to reproduce this error? My first intuition would be that Elasticsearch is probably still starting, or still busy indexing the added documents/embeddings.
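One quick way to rule that out, as a sketch (assuming Elasticsearch runs locally on the default port, and reusing es_store from the snippet above):

```python
import requests

# Confirm Elasticsearch is up and the cluster is ready before querying.
health = requests.get("http://localhost:9200/_cluster/health").json()
print(health["status"])  # "green" or "yellow" once the cluster is ready

# Confirm indexing has finished: the count should match your corpus size.
print(es_store.get_document_count())
```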

@Freemanlabs
Author

This is the line causing issues:
prediction = finder.get_answers(question=que["question"], top_k_retriever=10, top_k_reader=5)

This is the error I get when I run the above line for the first time after starting and connecting to my server:
RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

Then after I get this error, I simply run the cell again, and I get this:
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300))

I did as you directed (timeout=3000); still no luck.

@lalitpagaria
Contributor

There is a possibility that any one of the following may cause it (elastic/elasticsearch#8084):

  • Low timeout
  • Low disk space (a quick check is sketched below)
  • Two fields with the same name
  • Querying with null in a filter (this is unlikely, as the call to get_answers doesn't have any filter parameter)

Could you please share end-to-end logs so we can debug further?
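For the disk-space item, a quick check could look like this (a sketch against Elasticsearch's standard cat API, assuming a local instance on port 9200):

```python
import requests

# Elasticsearch starts refusing work once disk usage crosses its
# watermarks; this prints per-node disk allocation.
print(requests.get("http://localhost:9200/_cat/allocation?v").text)
```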

@Freemanlabs
Author

  • My timeout was increased from 30 (the default) to 3000.
  • I currently have 37/68 showing in my Colab environment.
  • Two fields with the same name: I don't know how to debug this, and I don't see how it applies to my problem.

Logs: this is all I see (from Colab):

```
timeout Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
420 # Otherwise it looks like a bug in the code.
--> 421 six.raise_from(e, None)
422 except (SocketTimeout, BaseSSLError, SocketError) as e:

21 frames
timeout: timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError Traceback (most recent call last)
ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300)

During handling of the above exception, another exception occurred:

ConnectionTimeout Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore, headers)
255 raise SSLError("N/A", str(e), e)
256 if isinstance(e, ReadTimeoutError):
--> 257 raise ConnectionTimeout("TIMEOUT", str(e), e)
258 raise ConnectionError("N/A", str(e), e)
259

ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300))
```

@lalitpagaria
Contributor

@Freemanlabs Thank you for the update.
Could you please expand those 21 frames and share the full stack trace?

ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300))

Also, in some code path your timeout is still 300, not 3000.

@lalitpagaria
Contributor

Also be aware that Google Colab has a disk limitation of 108 GB, of which only about 75 GB is available to the user.
https://neptune.ai/blog/google-colab-dealing-with-files#:~:text=Also%2C%20Colab%20has%20a%20disk,like%20image%20or%20video%20data.
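To check the remaining space from inside the notebook, a standard-library sketch:

```python
import shutil

# Report total and free disk space on the Colab filesystem.
total, used, free = shutil.disk_usage("/")
print(f"free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```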

@Freemanlabs
Author

Could you please expand those 21 frames and share the full stack trace?

Expanded frames:

```
/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py in raise_from(value, from_value)

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
415 try:
--> 416 httplib_response = conn.getresponse()
417 except BaseException as e:

/usr/lib/python3.6/http/client.py in getresponse(self)
1372 try:
-> 1373 response.begin()
1374 except ConnectionError:

/usr/lib/python3.6/http/client.py in begin(self)
310 while True:
--> 311 version, status, reason = self._read_status()
312 if status != CONTINUE:

/usr/lib/python3.6/http/client.py in _read_status(self)
271 def _read_status(self):
--> 272 line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
273 if len(line) > _MAXLINE:

/usr/lib/python3.6/socket.py in readinto(self, b)
585 try:
--> 586 return self._sock.recv_into(b)
587 except timeout:

timeout: timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore, headers)
245 response = self.pool.urlopen(
--> 246 method, url, body, retries=Retry(False), headers=request_headers, **kw
247 )

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
719 retries = retries.increment(
--> 720 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
721 )

/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
375 # Disabled, indicate to re-raise the error.
--> 376 raise six.reraise(type(error), error, _stacktrace)
377

/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py in reraise(tp, value, tb)
734 raise value.with_traceback(tb)
--> 735 raise value
736 finally:

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
671 headers=headers,
--> 672 chunked=chunked,
673 )

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
422 except (SocketTimeout, BaseSSLError, SocketError) as e:
--> 423 self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
424 raise

/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _raise_timeout(self, err, url, timeout_value)
330 raise ReadTimeoutError(
--> 331 self, url, "Read timed out. (read timeout=%s)" % timeout_value
332 )

ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300)

During handling of the above exception, another exception occurred:

ConnectionTimeout Traceback (most recent call last)
in ()
----> 1 prediction = dpr_finder.get_answers(question="who is who?", top_k_retriever=10, top_k_reader=5)

/usr/local/lib/python3.6/dist-packages/haystack/finder.py in get_answers(self, question, top_k_reader, top_k_retriever, filters, index)
67
68 # 1) Apply retriever(with optional filters) to get fast candidate documents
---> 69 documents = self.retriever.retrieve(question, filters=filters, top_k=top_k_retriever, index=index)
70 logger.info(f"Got {len(documents)} candidates from retriever")
71 logger.debug(f"Retrieved document IDs: {[doc.id for doc in documents]}")

/usr/local/lib/python3.6/dist-packages/haystack/retriever/dense.py in retrieve(self, query, filters, top_k, index)
140 index = self.document_store.index
141 query_emb = self.embed_queries(texts=[query])
--> 142 documents = self.document_store.query_by_embedding(query_emb=query_emb[0], top_k=top_k, filters=filters, index=index)
143 return documents
144

/usr/local/lib/python3.6/dist-packages/haystack/document_store/elasticsearch.py in query_by_embedding(self, query_emb, filters, top_k, index, return_embedding)
572
573 logger.debug(f"Retriever query: {body}")
--> 574 result = self.client.search(index=index, body=body, request_timeout=300)["hits"]["hits"]
575
576 documents = [self._convert_es_hit_to_document(hit, adapt_score_for_embedding=True) for hit in result]

/usr/local/lib/python3.6/dist-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
150 if p in kwargs:
151 params[p] = kwargs.pop(p)
--> 152 return func(*args, params=params, headers=headers, **kwargs)
153
154 return _wrapped

/usr/local/lib/python3.6/dist-packages/elasticsearch/client/__init__.py in search(self, body, index, doc_type, params, headers)
1661 params=params,
1662 headers=headers,
-> 1663 body=body,
1664 )
1665

/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py in perform_request(self, method, url, headers, params, body)
390 raise e
391 else:
--> 392 raise e
393
394 else:

/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py in perform_request(self, method, url, headers, params, body)
363 headers=headers,
364 ignore=ignore,
--> 365 timeout=timeout,
366 )
367

/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore, headers)
255 raise SSLError("N/A", str(e), e)
256 if isinstance(e, ReadTimeoutError):
--> 257 raise ConnectionTimeout("TIMEOUT", str(e), e)
258 raise ConnectionError("N/A", str(e), e)
259

ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300))
```

Also, in some code path your timeout is still 300, not 3000.

I don't understand why either.
From the documentation, timeout=30 is the default, so I do not know where the 300 comes from:

```python
__init__(host: str = "localhost", port: int = 9200, username: str = "", password: str = "",
         index: str = "document", label_index: str = "label",
         search_fields: Union[str, list] = "text", text_field: str = "text",
         name_field: str = "name", embedding_field: str = "embedding", embedding_dim: int = 768,
         custom_mapping: Optional[dict] = None, excluded_meta_data: Optional[list] = None,
         faq_question_field: Optional[str] = None, analyzer: str = "standard",
         scheme: str = "http", ca_certs: bool = False, verify_certs: bool = True,
         create_index: bool = True, update_existing_documents: bool = False,
         refresh_type: str = "wait_for", similarity="dot_product", timeout=30,
         return_embedding: Optional[bool] = True)
```

@Freemanlabs
Author

Another question I have is: why is DPR giving issues when BM25 and TFIDF work okay?

@lalitpagaria
Contributor

Well, BM25 and TFIDF do not use a script_score query, but DPR needs query_by_embedding, which uses the similarity function to compare document vectors.
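For illustration, the embedding query is roughly a script_score request of this shape (a sketch only; the exact script Haystack generates may differ, and query_emb is a placeholder for the real DPR query embedding):

```python
query_emb = [0.1] * 768  # placeholder for the 768-dim DPR query embedding

body = {
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # dotProduct / cosineSimilarity are built-in Painless functions
                # for dense_vector fields in Elasticsearch 7.x; the offset keeps
                # the score non-negative, as script_score requires.
                "source": "dotProduct(params.query_vector, 'embedding') + 1000",
                "params": {"query_vector": query_emb},
            },
        }
    },
    "size": 10,  # top_k
}
```

Scoring every document against the query vector like this is much heavier than an inverted-index BM25 lookup, which is why only DPR hits the timeout.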

Can you manually increase the timeout at this place https://github.com/deepset-ai/haystack/blob/master/haystack/document_store/elasticsearch.py#L574 and test with that code?
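Concretely, the hardcoded request_timeout in query_by_embedding (the call is visible in the traceback above) would change along these lines:

```python
# haystack/document_store/elasticsearch.py, inside query_by_embedding()
# before (hardcoded, as seen in the traceback):
result = self.client.search(index=index, body=body, request_timeout=300)["hits"]["hits"]

# after (sketch): raise the per-request timeout
result = self.client.search(index=index, body=body, request_timeout=3000)["hits"]["hits"]
```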

If increasing the timeout also does not solve your issue, then you can try DPR with the FAISS document store.
Otherwise, try to use a custom plugin, but in order to use it you would need to make a few changes to elasticsearch.py in Haystack.
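A minimal FAISS flow might look like this (a sketch; dicts is a placeholder for your own documents, and the write/update steps must happen before retrieval):

```python
from haystack.document_store.faiss import FAISSDocumentStore
from haystack.retriever.dense import DensePassageRetriever

# Placeholder documents -- substitute your own corpus.
dicts = [{"text": "Some passage text.", "meta": {"name": "doc1"}}]

document_store = FAISSDocumentStore()
document_store.write_documents(dicts)

retriever = DensePassageRetriever(document_store=document_store)

# Without this step the store holds documents but no embeddings,
# so retrieval returns an empty list.
document_store.update_embeddings(retriever)

docs = retriever.retrieve(query="who is who?", top_k=10)
```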

@tholor I think we need to add timeout customisation at the following place as well, instead of hardcoding it:
https://github.com/deepset-ai/haystack/blob/master/haystack/document_store/elasticsearch.py#L574

@Freemanlabs
Author

Thank you for your response...

Can you manually increase the timeout at this place https://github.com/deepset-ai/haystack/blob/master/haystack/document_store/elasticsearch.py#L574 and test with that code?

I do not know how to locate this file to change it manually. Please assist.

@Freemanlabs
Author

If increasing the timeout also does not solve your issue, then you can try DPR with the FAISS document store.

I am trying this FAISSDocumentStore. If I do faiss_document_store.get_document_count(), I see 874 (which is the total number of documents I have). But when I pass it to the retriever like so, dpr_retriever = DensePassageRetriever(document_store=faiss_document_store), and try to get answers from the finder, I get this INFO:

12/03/2020 08:26:05 - INFO - haystack.finder - Got 0 candidates from retriever
12/03/2020 08:26:05 - INFO - haystack.finder - Retriever did not return any documents. Skipping reader ...

When I inspect the retriever like so:
dpr_retriever.retrieve(query="I would expect the remaining TFC protection to remain protected in both")

all I see is an empty array:

Creating Embeddings: 100%|██████████| 1/1 [00:00<00:00, 4.84 Batches/s] []

Too many frustrations as a first-time user of Haystack, I must say. My supervisor feels I am not doing anything, because what he should be getting is results, not issues.

@tholor
Member

tholor commented Dec 3, 2020

Sorry to hear that you're having trouble. From what I see, the main issues were around your usage on Colab (mounting problem, connection timeout, ...). We will do our best to simplify the experience there, but as mentioned above, Colab has some severe resource limitations when running heavy external services like Elasticsearch or FAISS.

All I see is an empty array

Did you call document_store.update_embeddings(retriever) as described in this tutorial?

You can also jump on a quick call with one of our engineers if you need further help or share your colab notebook here.

@Freemanlabs
Author

Thanks for pointing out that resource. I had just been following the Haystack documentation.
All seems well now.

@tholor
Member

tholor commented Dec 7, 2020

Great! Closing this now. Feel free to re-open if the problem comes up again...

@tholor tholor closed this as completed Dec 7, 2020