`/generate` request possibly hanging when a `CUDA out of memory` error is thrown
#435
Comments
@Gintasz Try to decrease the
@hnyls2002 Here I was using version 0.1.14. I know it should not hang; it should error out with some 500 status code, so that … Admittedly, I haven't tested whether the server request actually hangs. This is my assumption, based on the fact that no failure exception was thrown on the client side for the out-of-memory generations.
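The expectation that an out-of-memory error should surface as a 500 rather than a hang can be illustrated with a toy model. One way a server ends up hanging is when the component that runs a batch fails but never notifies the coroutines waiting on those requests. The sketch below is an assumption for illustration, not SGLang's actual scheduler: `ToyBatchWorker`, `SimulatedOOM`, `handle_generate` and `simulate` are hypothetical names, and the OOM is simulated with a plain exception rather than a real `torch.cuda.OutOfMemoryError`.

```python
import asyncio


class SimulatedOOM(RuntimeError):
    """Stand-in for a CUDA out-of-memory error (hypothetical)."""


class ToyBatchWorker:
    """Hypothetical batch worker; not SGLang's scheduler."""

    def __init__(self, propagate_errors: bool) -> None:
        self.propagate_errors = propagate_errors
        self.pending = []  # (prompt, future) pairs awaiting a result

    def submit(self, prompt: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((prompt, fut))
        return fut

    def run_batch(self, fail: bool) -> None:
        batch, self.pending = self.pending, []
        try:
            if fail:
                raise SimulatedOOM("CUDA out of memory (simulated)")
            for prompt, fut in batch:
                fut.set_result(prompt.upper())  # stand-in for real generation
        except Exception as exc:
            if not self.propagate_errors:
                return  # buggy path: waiters are never resolved, so requests hang
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)  # wake every waiter with the error


async def handle_generate(worker: ToyBatchWorker, prompt: str, timeout: float) -> tuple:
    fut = worker.submit(prompt)
    try:
        return 200, await asyncio.wait_for(fut, timeout)
    except asyncio.TimeoutError:
        return 504, "timed out"  # only reached if nobody resolved the future
    except Exception as exc:
        return 500, str(exc)  # the worker's error surfaces as an HTTP-style 500


async def simulate(fail: bool, propagate_errors: bool, timeout: float = 1.0) -> list:
    worker = ToyBatchWorker(propagate_errors)
    tasks = [asyncio.create_task(handle_generate(worker, p, timeout)) for p in ("a", "b")]
    await asyncio.sleep(0)  # let both handlers enqueue their requests
    worker.run_batch(fail)
    return await asyncio.gather(*tasks)


print(asyncio.run(simulate(fail=True, propagate_errors=True)))
# [(500, 'CUDA out of memory (simulated)'), (500, 'CUDA out of memory (simulated)')]
```

With `propagate_errors=False` the same failure yields `(504, "timed out")` for every request, and only because the handler has its own timeout; without one, those requests would wait indefinitely, which matches the behaviour suspected in this issue.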
I can get it stable by turning off the radix trie cache with
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
I've run `run_batch` with 1000 items and `num_threads=200`. The batch processing gets stuck at 98%, after which the server prints no more console logs. In the full log I see some `CUDA out of memory` errors. I therefore suspect that when this error is thrown, the `/generate` request may be left hanging. I've added a retry to `http_request` (see my pull request), and it still gets stuck. That is why I suspect such requests hang instead of failing: if they failed, the retry mechanism would have kicked in.

Full server log here: log.txt
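The reasoning above hinges on a detail of retry logic: a retry only fires when an attempt fails, so if the server never answers, each attempt needs a client-side timeout that turns the hang into an exception. The sketch below uses only the Python standard library; the function name, its defaults and the retry policy are assumptions for illustration, not SGLang's `http_request`.

```python
import socket
import time
import urllib.error
import urllib.request


def post_with_timeout_and_retry(url: str, body: bytes, timeout: float = 60.0,
                                max_retries: int = 3, backoff: float = 1.0) -> bytes:
    """POST `body` to `url`, treating a hung response as a retryable failure.

    Hypothetical helper for illustration; not SGLang's `http_request`.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_exc = None
    for attempt in range(max_retries):
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            # `timeout` bounds each blocking socket operation, so a server that
            # never answers raises here instead of blocking this thread forever.
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # 4xx client errors will not succeed on retry
            last_exc = exc  # 5xx, e.g. a 500 returned for an OOM
        except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
            last_exc = exc  # connection failure or read timeout (the "hang" case)
        if attempt + 1 < max_retries:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff between attempts
    raise last_exc
```

The timeout has to be longer than the slowest legitimate generation, otherwise healthy long requests get retried too. If requests still stall with a timeout in place, that points at the client never reaching the retry path rather than at the server.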