AssertionError: data parallel group is already initialized #549

Closed
David-Lee-1990 opened this issue Jul 23, 2023 · 9 comments

@David-Lee-1990

David-Lee-1990 commented Jul 23, 2023

Hi, when doing inference on a single GPU, I encountered this assertion error.

It happens when running vllm/model_executor/parallel_utils/parallel_state.py. I do not understand why vLLM needs to call init_distributed_environment when I am only using a single GPU.

@David-Lee-1990

Solved!

The problem was that I had initialized two LLM models for different services in the same process.
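
For anyone else who hits this, here is a minimal sketch of the failure mode (model names are placeholders). Constructing a second LLM in the same process re-runs the distributed initialization on top of the first engine's global state, which trips the assertion:

from vllm import LLM

# The first engine initializes the global model-parallel groups
llm_a = LLM(model="model-a")

# A second engine in the same process tries to initialize them again:
# AssertionError: data parallel group is already initialized
llm_b = LLM(model="model-b")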

@SparshRastogi

SparshRastogi commented Aug 15, 2023

It is not working that way in Google Colab. Which platform did you use for coding?

@mzeidhassan

I am having the same issue. I work in a virtual environment (venv) in WSL2. How can I resolve it?

@YuamLu

YuamLu commented Aug 21, 2023

@mzeidhassan
I just ran into the same problem. Go into the source code and comment out every assertion containing "is already initialized".
It finally works! 😊
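
For reference, the checks being commented out look roughly like this (paraphrased; the exact variable names differ between vLLM versions). Note that this only silences the guard; the duplicate initialization itself still happens:

# Inside vllm/model_executor/parallel_utils/parallel_state.py (paraphrased)
global _DATA_PARALLEL_GROUP
assert _DATA_PARALLEL_GROUP is None, (
    "data parallel group is already initialized")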

@NandaKishoreJoshi

@YuamLu, I'm also getting the same error while using vLLM with LangChain. I'm running on an Azure Ubuntu virtual machine:

from langchain.llms import VLLM

llm = VLLM(
    model="meta-llama/Llama-2-70b-chat-hf",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
    use_auth_token=True,
)

Error:

ValidationError: 1 validation error for VLLM
root
data parallel group is already initialized (type=assertion_error)

Please let me know in more detail how you solved it.
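
(For context: LangChain's VLLM wrapper constructs the underlying vllm.LLM inside a pydantic validator, which is why the AssertionError surfaces here wrapped as a ValidationError.)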

@YuamLu

YuamLu commented Aug 26, 2023

@NandaKishoreJoshi
Hi,

I've opened pull request #817; you can try my code.

If you still get an error, paste the full error message here and I'll do my best to solve it.

@NandaKishoreJoshi

@YuamLu,
I tried your pull request code and it's working. Thank you!

@saattrupdan

I found a fix in my own use case, which does not involve changing the source code:

from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Initialise a vLLM model for the first time
model = LLM(model="test-model-name", trust_remote_code=True)

# This vLLM function resets the global parallel-state variables, which
# makes it possible to initialise a new model in the same process
destroy_model_parallel()

# Re-initialise a new vLLM model
model = LLM(model="test-model-name", trust_remote_code=True)

Hope that helps others 🙂
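
One caveat, in case the import fails: this helper has moved between modules across vLLM releases, so on newer versions the path may look more like the following (an assumption; check your installed version):

from vllm.distributed.parallel_state import destroy_model_parallel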

@Anindyadeep

(Quoting @saattrupdan's fix above.)

Thanks for the quick solution, really helpful. I additionally got a CUDA OOM error (classic), so I thought I would add an extended solution. Hope it helps too.

# Add the same code shown by @saattrupdan 

import torch
from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Initialise a vLLM model for the first time
model = LLM(model="test-model-name", trust_remote_code=True)

# This vLLM function resets the global parallel-state variables, which
# makes it possible to initialise a new model in the same process
destroy_model_parallel()

# If you face a CUDA OOM error, drop the reference to the old model and
# wait for all outstanding CUDA work to finish
del model
torch.cuda.synchronize()

# Now re-initialise a new vLLM model
model = LLM(model="test-model-name", trust_remote_code=True)
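
If memory is still not released after del model, a common follow-up (standard PyTorch housekeeping rather than anything vLLM-specific) is to force a garbage-collection pass and clear the CUDA caching allocator:

import gc
import torch

# Collect any lingering references to the old engine, then return the
# cached blocks to the driver so the new engine can allocate them
gc.collect()
torch.cuda.empty_cache()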
