
crash in VLLM #19

Open · apkuhar opened this issue Jul 8, 2023 · 4 comments

apkuhar commented Jul 8, 2023

Trying to install it in NVIDIA's PyTorch container; I'm getting this when running it.
Same issue when trying to install it on Lambda GPU Cloud on an H100 instance (all defaults).

root@0971a018b7ec:/workspace/openchat# python -m ochat.serving.openai_api_server --model_type openchat_v2 --model openchat/openchat_v2_w --engine-use-ray --worker-use-ray
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/openchat/ochat/serving/openai_api_server.py", line 21, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs
  File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 4, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 7, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 16, in <module>
    from vllm.worker.worker import Worker
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 8, in <module>
    from vllm.model_executor import get_model, InputMetadata, set_random_seed
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 9, in <module>
    from vllm.model_executor.models import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 1, in <module>
    from vllm.model_executor.models.bloom import BloomForCausalLM
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/bloom.py", line 31, in <module>
    from vllm.model_executor.layers.activation import get_act_fn
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/activation.py", line 5, in <module>
    from vllm import activation_ops
ImportError: /usr/local/lib/python3.10/dist-packages/vllm/activation_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
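
The undefined symbol demangles to c10::detail::torchCheckFail(...), which usually points to a PyTorch ABI mismatch: the prebuilt vllm wheel was compiled against a different torch version than the one installed in the container. A quick way to compare the two (a diagnostic sketch, not from the original report):

    # Diagnostic sketch (hypothetical, not part of the original report):
    # compare the PyTorch version at runtime with what the vllm wheel declares.
    import torch
    from importlib.metadata import version, requires

    print("torch:", torch.__version__)
    print("vllm:", version("vllm"))
    print("vllm torch requirement:",
          [r for r in (requires("vllm") or []) if "torch" in r])

If the versions disagree, reinstalling vllm against the container's torch (or vice versa) is the usual fix.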
@Valdanitooooo commented:

I'm getting a different error:

2023-07-18 15:53:21,350	ERROR services.py:1207 -- Failed to start the dashboard , return code 1
2023-07-18 15:53:21,350	ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-07-18 15:53:21,350	ERROR services.py:1276 -- 
The last 20 lines of /tmp/ray/session_2023-07-18_15-53-19_820841_46100/logs/dashboard.log (it contains the error message from the dashboard): 
  File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/dashboard/modules/log/log_manager.py", line 8, in <module>
    from ray.util.state.common import (
  File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/util/state/__init__.py", line 1, in <module>
    from ray.util.state.api import (
  File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/util/state/api.py", line 17, in <module>
    from ray.util.state.common import (
  File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/ray/util/state/common.py", line 120, in <module>
    @dataclass(init=True)
  File "/gai_data/anaconda3/envs/fastchat/lib/python3.10/site-packages/pydantic/dataclasses.py", line 139, in dataclass
    assert init is False, 'pydantic.dataclasses.dataclass only supports init=False'
AssertionError: pydantic.dataclasses.dataclass only supports init=False
2023-07-18 15:53:21,467	INFO worker.py:1636 -- Started a local Ray instance.
[2023-07-18 15:53:22,307 E 46100 46100] core_worker.cc:193: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
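
The assertion comes from pydantic's own dataclass decorator: per the traceback, Ray's ray/util/state/common.py applies @dataclass(init=True), which the stdlib decorator accepts but this pydantic release rejects, presumably because pydantic builds the validating __init__ itself. So it looks like a Ray/pydantic version mismatch in the fastchat environment. A minimal sketch of the failing pattern, assuming a pydantic release that carries this assert:

    # Minimal sketch of the failing pattern (assumes a pydantic release
    # that carries the assert shown in the traceback above):
    from pydantic.dataclasses import dataclass

    @dataclass(init=True)  # raises: "only supports init=False" on such releases
    class StateSchema:
        name: str

Aligning the installed pydantic with what the installed Ray expects should clear it; the Raylet registration failure at the end is downstream of the dashboard crash.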

imoneoi (Owner) commented Aug 4, 2023

Have you fixed the issue? We released a new version and tested the following setup:

conda create -y --name openchat
conda activate openchat

conda install -y python
conda install -y cudatoolkit-dev -c conda-forge
pip3 install torch torchvision torchaudio

pip3 install packaging ninja
pip3 install --no-build-isolation "flash-attn<2"

pip3 install ochat
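
A short smoke test after the install (a sketch, not part of the released instructions) to confirm the local torch and the vllm wheel agree:

    # Post-install smoke test (a sketch, not from the original instructions):
    import torch
    print("CUDA available:", torch.cuda.is_available())  # expect True on a GPU instance

    from vllm import LLM  # should import cleanly, without the undefined-symbol error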

@butterluo commented:

[quotes @Valdanitooooo's Ray dashboard traceback above]

Any updates on this issue? I'm hitting this problem too.

@timothylimyl commented:

@imoneoi given that OpenChat adds a special token, the same changes have to be made in vLLM, right? I believe vLLM uses the default Hugging Face model and tokenizer.

Did OpenChat integrate with vLLM?
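
For context, vLLM of this era loads tokenizers through Hugging Face's AutoTokenizer, so special tokens shipped in the model repo's tokenizer files should be picked up without code changes. A quick way to inspect them (an illustrative sketch, not from the thread):

    # Illustrative check: inspect the special tokens the OpenChat repo ships.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("openchat/openchat_v2_w")
    print(tok.all_special_tokens)          # includes any tokens OpenChat added
    print(tok.additional_special_tokens)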
