SG-Lang Runtime Stuck Launching in Docker Container #527

schopra8 · 2024-06-11T02:14:04Z

We're trying to run the latest version of sg-lang in a Docker container (PyTorch 2.3.0, CUDA 12.1), but the runtime instantiation gets stuck: it starts loading the model onto the GPU and then hangs.

We've been able to run sg-lang without any problems on the host operating system, so we ran pip freeze on the host and installed those exact package versions inside the Docker container. We still hit the same hang during model loading.
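To double-check that the container really matches the host, one could diff the two pip freeze outputs instead of eyeballing them. This is a minimal sketch: parse_freeze and diff_freeze are hypothetical helpers, and the parser only understands the common "name==version" and "name @ url" line forms. Package names are compared after PEP 503 normalization, so "NumPy" and "numpy" count as the same package.

```python
from __future__ import annotations

import re
from typing import Optional


def _normalize(name: str) -> str:
    # PEP 503 normalization: case-insensitive, runs of "-", "_", "." are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text: str) -> dict[str, str]:
    """Map normalized package name -> pinned spec from `pip freeze` output."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line:
            name, spec = line.split("==", 1)
        elif " @ " in line:  # direct references, e.g. "pkg @ file:///..."
            name, spec = line.split(" @ ", 1)
        else:  # editable installs and other forms: keep the whole line as the key
            name, spec = line, ""
        pins[_normalize(name.strip())] = spec.strip()
    return pins


def diff_freeze(host: str, container: str) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {package: (host_spec, container_spec)} for every mismatch; None means missing."""
    h, c = parse_freeze(host), parse_freeze(container)
    return {
        name: (h.get(name), c.get(name))
        for name in sorted(h.keys() | c.keys())
        if h.get(name) != c.get(name)
    }
```

Usage: write pip freeze > host.txt on the host and pip freeze > container.txt inside the container, then print diff_freeze(open("host.txt").read(), open("container.txt").read()). An empty result only shows that the Python packages match; pip freeze does not capture the GPU driver or the container runtime.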

Has anyone seen this issue before? Any ideas what might be going wrong?


schopra8 · 2024-06-11T03:59:12Z

I've found the line that causes the hang, but I have no idea why it is a problem:

sglang/python/sglang/srt/managers/controller/model_runner.py, line 242 (commit 542bc73):

torch.cuda.set_device(self.gpu_id)
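One way to check whether that call hangs on its own in a fresh interpreter inside the container, as opposed to only inside the runtime's worker process, is to run it in a child process with a timeout. This is a diagnostic sketch, not part of SGLang: probe_in_fresh_process and SET_DEVICE_PROBE are hypothetical names, device 0 is assumed to match the single-GPU setup, and the 60-second timeout is an arbitrary choice.

```python
import subprocess
import sys

# The call identified above, run in isolation on device 0 (assumed single-GPU setup).
SET_DEVICE_PROBE = "import torch; torch.cuda.set_device(0); print('ok')"


def probe_in_fresh_process(code: str = SET_DEVICE_PROBE, timeout_s: float = 60.0) -> str:
    """Run `code` in a new Python interpreter and classify the outcome.

    Returns "ok" (exit code 0), "error" (non-zero exit, e.g. an exception or a
    missing torch install), or "hang" (still running after timeout_s; subprocess.run
    kills the child before raising TimeoutExpired).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if proc.returncode == 0 else "error"
```

Calling probe_in_fresh_process() inside the container returns "ok", "error" or "hang". Using a separate process keeps any stuck CUDA call from blocking the interpreter that runs the check.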

I'm running with dp=1 and tp=1 (i.e., on a single GPU). If I call torch.cuda.set_device(0) in my Python script before creating the Runtime, everything works as expected.
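The workaround described above can be wrapped in a small helper. This is a sketch of that workaround, not a fix for the underlying hang: pin_cuda_device and launch_runtime are hypothetical names, and it assumes sglang's Runtime accepts model_path as a keyword argument, with its remaining defaults matching the dp=1, tp=1 setup.

```python
def pin_cuda_device(gpu_id: int = 0) -> bool:
    """Call torch.cuda.set_device(gpu_id) if PyTorch and that CUDA device are available.

    Returns True if the device was set, False otherwise (torch not installed,
    CUDA unavailable, or gpu_id out of range).
    """
    try:
        import torch
    except ImportError:
        return False
    if gpu_id < 0 or not torch.cuda.is_available() or gpu_id >= torch.cuda.device_count():
        return False
    torch.cuda.set_device(gpu_id)
    return True


def launch_runtime(model_path: str, gpu_id: int = 0):
    """Pin the CUDA device in this process first, then construct the runtime."""
    if not pin_cuda_device(gpu_id):
        raise RuntimeError(f"CUDA device {gpu_id} is not available")
    import sglang as sgl  # imported lazily so this module loads without sglang installed

    return sgl.Runtime(model_path=model_path)
```

Usage: runtime = launch_runtime("<model path>"), followed by runtime.shutdown() when finished. The only point of the helper is ordering: the device is selected before the Runtime is constructed, which is the sequence reported to avoid the hang.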
