When using `parallelize=True`, raise RuntimeError: expected all tensors to be on the same device #1575
Hi, I think I am experiencing the same error. I'm trying to do model-parallel inference to load a single copy of a model that is too big to fit on a single GPU; my environment has four GPUs. The error occurred when running loglikelihood requests.
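For context, an invocation of the following shape would exercise that model-parallel path. This is an illustrative sketch using the harness's documented CLI flags; the model and task names are placeholders, not the poster's elided command:

```bash
# Illustrative only: a single process, with one copy of the model sharded
# across all visible GPUs via parallelize=True. Model/task are placeholders.
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-70m,parallelize=True \
    --tasks lambada_openai \
    --batch_size 8
```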
Usually Python reports more lines before the error; posting the full traceback would be more helpful. By the way, Python 3.12.x may produce even friendlier errors for locating the last call inside a package.
Hi, the full lines are as follows:
Hi @feiba54, have you tried the most recent version of the codebase?
Yes, it's the most recent version.
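For reference, a common way to pick up the latest development version is to install straight from the repository; this is an illustrative command, not necessarily the exact instruction that was elided above:

```bash
# Install lm-evaluation-harness from the current main branch (illustrative;
# the specific branch/commit referenced above was elided from the thread).
pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git
```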
Could you try to rerun with a model that is public, and also see what happens if you remove that option?
Hi, I tried both with and without it. Here is some additional information:
Thanks! Can you clarify the commands you are running again, and also which transformers version is being used?
I have the same error.
@feiba54 Confirming I've been able to reproduce this on my end now (with Pythia-70m). Investigating possible fixes!
A temporary workaround, if the error is the device-mismatch RuntimeError above, is to pass an additional model argument (a sketch follows below).
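The exact argument referenced in that workaround is elided from this capture. As one hedged possibility: the harness's Hugging Face model class accepts device-placement arguments that can be passed through `--model_args` alongside `parallelize=True`, such as `device_map_option` and `max_memory_per_gpu`. The values below are illustrative assumptions, not the confirmed fix:

```bash
# Hedged sketch: choose a specific device-map strategy and cap per-GPU memory.
# These particular values are assumptions for illustration, not the exact
# workaround recommended in this thread.
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-70m,parallelize=True,device_map_option=balanced_low_0,max_memory_per_gpu=20GiB \
    --tasks lambada_openai \
    --batch_size 8
```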
Hi, I just ran into the same problem; there is a related issue as well.
As of now, setting a higher CUDA version works with it for me.
I think there are a few issues being conflated here, and it would be helpful to disentangle them. We support:

- `accelerate launch`, which is only meant to support data-parallel inference (no FSDP, no splitting a model across multiple GPUs).
- Not using `accelerate launch`, but passing `--model_args parallelize=True`, which is meant to enable loading a single copy of the model, split across all the GPUs you have available.

For all the use cases you are experiencing, the latter option should be what is used (see the sketch after this quote). However, my understanding is that when trying to use `parallelize=True`, the result is `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`

I'm struggling to reproduce this error, however (on the most recent version of the codebase); I will continue to see whether I can find a way to replicate it.
Originally posted by @haileyschoelkopf in #1220 (comment)
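To make the two supported modes concrete, here is a hedged sketch using the harness's documented CLI; the model and task names are placeholders rather than commands taken from this thread:

```bash
# Mode 1: data-parallel -- one full copy of the model per GPU, launched as
# one process per GPU. No model splitting.
accelerate launch -m lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-70m \
    --tasks lambada_openai --batch_size 8

# Mode 2: model-parallel -- a single copy of the model, sharded across all
# visible GPUs via parallelize=True (the mode this issue is about).
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-70m,parallelize=True \
    --tasks lambada_openai --batch_size 8
```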