Dense Passage Retrieval (ConnectionTimeoutError) #644
First I get:
Please try to increase the timeout for Elasticsearch via:
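The snippet originally attached here was lost in formatting. A minimal sketch of what was likely meant, assuming the `timeout` parameter of `ElasticsearchDocumentStore` in the Haystack version visible in the tracebacks below (treat the parameter name and default as unverified):

```python
# Sketch, not verified against your Haystack version: the `timeout`
# argument (in seconds) is forwarded to the underlying Elasticsearch client.
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    host="localhost",
    port=9200,
    timeout=300,  # raise well above the default to survive slow Colab I/O
)
```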
Are you running this on Colab? Elasticsearch might be very slow there as Colab only provides a single CPU core...
I see. I will try that. Although, my Runtime type is GPU. Also, do you have any response to why I get this error?
Elasticsearch cannot benefit from GPU and will always run on CPU only.
Can you please provide more context / a script to reproduce this error? My first intuition would be that Elasticsearch is probably still starting, or still busy with indexing the added documents / embeddings.
This is the line causing issues. This is the error I get when I run the above line for the first time after starting and connecting to my server. Then, after I get this error, I simply run the cell again and get this. I did as you directed.
There is a possibility that any one of the following may cause it (elastic/elasticsearch#8084):
Could you please share end-to-end logs so we can debug further?
Logs: this is all I see (from Colab):

```
timeout                                 Traceback (most recent call last)
21 frames

During handling of the above exception, another exception occurred:

ReadTimeoutError                        Traceback (most recent call last)

During handling of the above exception, another exception occurred:

ConnectionTimeout                       Traceback (most recent call last)

ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300))
```
@Freemanlabs Thank you for the update.
Also, in some code flow your timeout is still 300.
Also be aware that Google Colab has a disk limitation of 108 GB, of which only about 75 GB is available to the user.
Expanded frames:

```
/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py in raise_from(value, from_value)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
/usr/lib/python3.6/http/client.py in getresponse(self)
/usr/lib/python3.6/http/client.py in begin(self)
/usr/lib/python3.6/http/client.py in _read_status(self)
/usr/lib/python3.6/socket.py in readinto(self, b)

timeout: timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError                        Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py in reraise(tp, value, tb)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _raise_timeout(self, err, url, timeout_value)

ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300)

During handling of the above exception, another exception occurred:

ConnectionTimeout                       Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/haystack/finder.py in get_answers(self, question, top_k_reader, top_k_retriever, filters, index)
/usr/local/lib/python3.6/dist-packages/haystack/retriever/dense.py in retrieve(self, query, filters, top_k, index)
/usr/local/lib/python3.6/dist-packages/haystack/document_store/elasticsearch.py in query_by_embedding(self, query_emb, filters, top_k, index, return_embedding)
/usr/local/lib/python3.6/dist-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/elasticsearch/client/__init__.py in search(self, body, index, doc_type, params, headers)
/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py in perform_request(self, method, url, headers, params, body)
/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py in perform_request(self, method, url, headers, params, body)
/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore, headers)

ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=300))
```
I don't understand why either.
Another question I have is why DPR is giving issues while BM25 and TF-IDF work okay.
Well, BM25 and TF-IDF do not use a script_score query, but DPR does. Can you manually increase the timeout at this place https://github.com/deepset-ai/haystack/blob/master/haystack/document_store/elasticsearch.py#L574 and use that code to test? If increasing the timeout does not solve your issue, you can try DPR with the FAISS document store. @tholor I think we need to add timeout customisation at the following place as well instead of hardcoding it.
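For reference, the change being suggested could look roughly like the fragment below. This is a sketch of one line inside `ElasticsearchDocumentStore.query_by_embedding`, not the actual patch; whether `request_timeout` is accepted depends on your `elasticsearch-py` version:

```python
# Inside ElasticsearchDocumentStore.query_by_embedding (sketch only):
# pass an explicit per-request timeout instead of the hardcoded value.
result = self.client.search(
    index=index,
    body=body,
    request_timeout=300,  # seconds; increase if indexing is still in progress
)
```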
Thank you for your response...
I do not know how to locate this file to change it manually. Please assist.
I am trying the FAISSDocumentStore now. If I do
when I inspect the retriever like so: all I see is an empty array.
Too many frustrations as a first-time user of Haystack, I must say. My supervisor feels I am not doing anything, because what he should be getting is results, not issues.
Sorry to hear that you are having trouble. From what I see, the main issues were around your usage on Colab (mounting problem, connection timeout ...). We will do our best to simplify the experience there, but as mentioned above, Colab has some severe resource limitations when running heavy external services like Elasticsearch or FAISS.
Did you call `update_embeddings()`? You can also jump on a quick call with one of our engineers if you need further help, or share your Colab notebook here.
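For anyone hitting the same empty-embeddings symptom with FAISS, a hedged sketch of the flow follows. The model names, `sql_url`, and `docs` variable are illustrative assumptions, not prescriptions from this thread:

```python
# Sketch of the FAISS path: embeddings only exist after update_embeddings().
from haystack.document_store.faiss import FAISSDocumentStore
from haystack.retriever.dense import DensePassageRetriever

document_store = FAISSDocumentStore(sql_url="sqlite:///faiss_store.db")
document_store.write_documents(docs)  # `docs` prepared beforehand

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
)
# Without this step, the stored embeddings remain empty and retrieval
# returns nothing.
document_store.update_embeddings(retriever)
```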
Thanks for pointing out that resource. I was practically following the Haystack documentation.
Great! Closing this now. Feel free to re-open if the problem comes up again...
With the same settings for Elasticsearch, I can successfully retrieve prediction answers with BM25 and TF-IDF. However, when I try DPR, I get:
ConnectionTimeoutError
How do I resolve this?