Update Dockerfile #1109

Closed

segyges (Contributor) commented Jan 4, 2024

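Switches the CUDA image back from the runtime variant to the devel variant (nvidia/cuda:12.2.2-runtime-ubuntu20.04 to nvidia/cuda:11.7.1-devel-ubuntu20.04). The runtime image does not include nvcc, which the DeeperSpeed install needs.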

Resolves #1021

I was also hitting this bug and needed a fix. Following the README's instructions for building with Docker leads directly to this problem. I haven't tested the resulting image beyond confirming that it builds; if there is any way this could break something downstream, that should be checked before merging.
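To make the difference between the two images concrete, here is a minimal sketch that reads the variant out of an nvidia/cuda image reference. It assumes the usual tag layout of version, variant, and OS separated by hyphens; the helper names `cuda_image_variant` and `ships_nvcc` are hypothetical, and the rule that only devel images carry the compiler is stated as an assumption about the three standard variants (base, runtime, devel).

```python
# Hypothetical helpers: classify an nvidia/cuda image reference by variant.
KNOWN_VARIANTS = ("base", "runtime", "devel")


def cuda_image_variant(image_ref):
    """Return 'base', 'runtime', 'devel', or None if no variant is found."""
    # Drop an optional digest suffix such as "@sha256:...".
    without_digest = image_ref.split("@", 1)[0]
    # The tag is whatever follows the last colon.
    tag = without_digest.rsplit(":", 1)[-1]
    for part in tag.split("-"):
        if part in KNOWN_VARIANTS:
            return part
    return None


def ships_nvcc(image_ref):
    """Assumption: of the three standard variants, only devel includes nvcc."""
    return cuda_image_variant(image_ref) == "devel"


if __name__ == "__main__":
    for ref in (
        "docker.io/nvidia/cuda:12.2.2-runtime-ubuntu20.04",
        "docker.io/nvidia/cuda:11.7.1-devel-ubuntu20.04",
    ):
        print(ref, "->", cuda_image_variant(ref), "nvcc:", ships_nvcc(ref))
```

A check like this could run in CI against the Dockerfile's FROM line so that a switch away from a devel image is flagged before the build reaches the pip install step.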

System info:

cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
sudo apt list --installed | grep nvidia

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnvidia-cfg1-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-common-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 all [installed,automatic]
libnvidia-compute-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-container-tools/unknown,now 1.15.0~rc.1-1 amd64 [installed,automatic]
libnvidia-container1/unknown,now 1.15.0~rc.1-1 amd64 [installed,automatic]
libnvidia-decode-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-egl-wayland1/jammy,now 1:1.1.9-1.1 amd64 [installed,auto-removable]
libnvidia-encode-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-extra-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-fbc1-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-gl-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
linux-modules-nvidia-535-5.15.0-91-generic/jammy-updates,jammy-security,now 5.15.0-91.101 amd64 [installed,automatic]
linux-modules-nvidia-535-generic/jammy-updates,jammy-security,now 5.15.0-91.101 amd64 [installed]
linux-objects-nvidia-535-5.15.0-91-generic/jammy-updates,jammy-security,now 5.15.0-91.101 amd64 [installed,automatic]
linux-signatures-nvidia-5.15.0-91-generic/jammy-updates,jammy-security,now 5.15.0-91.101 amd64 [installed,automatic]
nvidia-compute-utils-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-container-toolkit-base/unknown,now 1.15.0~rc.1-1 amd64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.15.0~rc.1-1 amd64 [installed]
nvidia-driver-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed]
nvidia-firmware-535-535.129.03/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-kernel-common-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-kernel-source-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-prime/jammy,now 0.8.17.1 all [installed,automatic]
nvidia-settings/jammy,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
xserver-xorg-video-nvidia-535/jammy-updates,jammy-security,now 535.129.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-smi
Thu Jan  4 18:22:10 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:04:00.0 Off |                  N/A |
| 33%   24C    P8              24W / 420W |      3MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:2B:00.0 Off |                  N/A |
| 36%   28C    P8              29W / 420W |      3MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "default-runtime": "nvidia"
}

Output with old version:

docker build -t gpt-neox -f Dockerfile .
[+] Building 472.4s (18/23)                                                                                                                                                                       docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                        0.0s
 => => transferring dockerfile: 5.29kB                                                                                                                                                                      0.0s
 => [internal] load .dockerignore                                                                                                                                                                           0.0s
 => => transferring context: 57B                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.2.2-runtime-ubuntu20.04                                                                                                                           1.1s
 => [internal] load build context                                                                                                                                                                           0.0s
 => => transferring context: 3.67kB                                                                                                                                                                         0.0s
 => [ 1/19] FROM docker.io/nvidia/cuda:12.2.2-runtime-ubuntu20.04@sha256:7df325b76ef5087ac512a6128e366b7043ad8db6388c19f81944a28cd4157368                                                                  64.7s
 => => resolve docker.io/nvidia/cuda:12.2.2-runtime-ubuntu20.04@sha256:7df325b76ef5087ac512a6128e366b7043ad8db6388c19f81944a28cd4157368                                                                     0.0s
 => => sha256:e4f230263527ce207b7455b9476309d18a9f77f74e1f4b1fccda5852f531cd33 183B / 183B                                                                                                                  0.2s
 => => sha256:7df325b76ef5087ac512a6128e366b7043ad8db6388c19f81944a28cd4157368 743B / 743B                                                                                                                  0.0s
 => => sha256:9c3dc574865e6cb93272aa7d10122a81d29aa6ae6c4dbb2ebdbb6efb4e1e4a60 2.21kB / 2.21kB                                                                                                              0.0s
 => => sha256:7c06c0f45a757dbbf4a80163b07232593fbe76f2d12be77f266245106152b3b8 12.90kB / 12.90kB                                                                                                            0.0s
 => => sha256:db26cf78ae4f895b1162fb506e79b7257fb2e39538a586d6634fe20f48cc60a5 7.94MB / 7.94MB                                                                                                              0.7s
 => => sha256:5adc7ab504d3aa2d75a0a9c265b66b194ddd891b0b311637307d7810a986c580 56.08MB / 56.08MB                                                                                                           11.5s
 => => sha256:95e3f492d47e010cc39b4aed8cd21d90bf77d820b0ab8f9785ca3e45d96fc074 6.88kB / 6.88kB                                                                                                              0.4s
 => => sha256:35dd1979297e8aea372ebc3f342857aea7be0b5a01595d2dab7c0c2165ae30c8 1.27GB / 1.27GB                                                                                                             57.5s
 => => extracting sha256:db26cf78ae4f895b1162fb506e79b7257fb2e39538a586d6634fe20f48cc60a5                                                                                                                   0.1s
 => => sha256:39a2c88664b34d9fdc8d242048da75c8f639ef082c09606d67821e6ed34a5c4d 62.44kB / 62.44kB                                                                                                            0.9s
 => => sha256:d8f6b6cd09da3d00868412345089a2c2d5052ac966e64be8f2f9bf2725d202f8 1.68kB / 1.68kB                                                                                                              1.1s
 => => sha256:fe19bbed4a4aba883058d2b9f0d88541f6df0e621595a8acfe57d3d5768bb535 1.52kB / 1.52kB                                                                                                              1.2s
 => => extracting sha256:5adc7ab504d3aa2d75a0a9c265b66b194ddd891b0b311637307d7810a986c580                                                                                                                   0.5s
 => => extracting sha256:e4f230263527ce207b7455b9476309d18a9f77f74e1f4b1fccda5852f531cd33                                                                                                                   0.0s
 => => extracting sha256:95e3f492d47e010cc39b4aed8cd21d90bf77d820b0ab8f9785ca3e45d96fc074                                                                                                                   0.0s
 => => extracting sha256:35dd1979297e8aea372ebc3f342857aea7be0b5a01595d2dab7c0c2165ae30c8                                                                                                                   7.1s
 => => extracting sha256:39a2c88664b34d9fdc8d242048da75c8f639ef082c09606d67821e6ed34a5c4d                                                                                                                   0.0s
 => => extracting sha256:d8f6b6cd09da3d00868412345089a2c2d5052ac966e64be8f2f9bf2725d202f8                                                                                                                   0.0s
 => => extracting sha256:fe19bbed4a4aba883058d2b9f0d88541f6df0e621595a8acfe57d3d5768bb535                                                                                                                   0.0s
 => [ 2/19] RUN apt-get update -y &&     apt-get install -y         git python3.9 python3-dev libpython3-dev python3-pip sudo pdsh         htop llvm-9-dev tmux zstd software-properties-common build-ess  83.9s
 => [ 3/19] RUN mkdir /var/run/sshd &&     sed -i 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' /etc/pam.d/sshd &&     echo 'AuthorizedKeysFile     .ssh/authorized_keys' >>  0.3s
 => [ 4/19] RUN mkdir -p /build &&     cd /build &&     wget -q -O - https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz | tar xzf - &&     cd openmpi-4.1.0 &&     ./configure --p  179.9s
 => [ 5/19] RUN mv /usr/local/mpi/bin/mpirun /usr/local/mpi/bin/mpirun.real &&     echo '#!/bin/bash' > /usr/local/mpi/bin/mpirun &&     echo 'mpirun.real --allow-run-as-root --prefix /usr/local/mpi "$@  0.3s
 => [ 6/19] RUN useradd --create-home --uid 1000 --shell /bin/bash mchorse &&     usermod -aG sudo mchorse &&     echo "mchorse ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers                                    0.3s
 => [ 7/19] RUN mkdir -p /home/mchorse/.ssh /job &&     echo 'Host *' > /home/mchorse/.ssh/config &&     echo '    StrictHostKeyChecking no' >> /home/mchorse/.ssh/config &&     echo 'export PDSH_RCMD_TY  0.3s
 => [ 8/19] RUN pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117 && pip cache purge                                  122.8s
 => [ 9/19] COPY requirements/requirements.txt .                                                                                                                                                            0.0s
 => [10/19] COPY requirements/requirements-wandb.txt .                                                                                                                                                      0.0s
 => [11/19] COPY requirements/requirements-onebitadam.txt .                                                                                                                                                 0.0s
 => [12/19] COPY requirements/requirements-sparseattention.txt .                                                                                                                                            0.1s
 => [13/19] COPY requirements/requirements-flashattention.txt .                                                                                                                                             0.0s
 => ERROR [14/19] RUN pip install -r requirements.txt && pip install -r requirements-onebitadam.txt &&     pip install -r requirements-sparseattention.txt &&     pip install -r requirements-flashattent  18.7s
------
 > [14/19] RUN pip install -r requirements.txt && pip install -r requirements-onebitadam.txt &&     pip install -r requirements-sparseattention.txt &&     pip install -r requirements-flashattention.txt &&     pip install -r requirements-wandb.txt &&     pip install protobuf==3.20.* &&     pip cache purge:
0.453 Collecting deepspeed (from -r requirements.txt (line 2))
0.453   Cloning https://github.com/EleutherAI/DeeperSpeed.git (to revision b9260436e7da3e297fc6bedfd27d9e69fbba6f5c) to /tmp/pip-install-qskuk1l3/deepspeed_a20e7671cb0f4239ad4a80bbbb3e00c1
0.456   Running command git clone --filter=blob:none --quiet https://github.com/EleutherAI/DeeperSpeed.git /tmp/pip-install-qskuk1l3/deepspeed_a20e7671cb0f4239ad4a80bbbb3e00c1
14.06   Running command git rev-parse -q --verify 'sha^b9260436e7da3e297fc6bedfd27d9e69fbba6f5c'
14.07   Running command git fetch -q https://github.com/EleutherAI/DeeperSpeed.git b9260436e7da3e297fc6bedfd27d9e69fbba6f5c
16.26   Running command git checkout -q b9260436e7da3e297fc6bedfd27d9e69fbba6f5c
17.48   Resolved https://github.com/EleutherAI/DeeperSpeed.git to commit b9260436e7da3e297fc6bedfd27d9e69fbba6f5c
17.49   Preparing metadata (setup.py): started
18.54   Preparing metadata (setup.py): finished with status 'error'
18.55   error: subprocess-exited-with-error
18.55
18.55   × python setup.py egg_info did not run successfully.
18.55   │ exit code: 1
18.55   ╰─> [21 lines of output]
18.55       No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
18.55       Traceback (most recent call last):
18.55         File "<string>", line 2, in <module>
18.55         File "<pip-setuptools-caller>", line 34, in <module>
18.55         File "/tmp/pip-install-qskuk1l3/deepspeed_a20e7671cb0f4239ad4a80bbbb3e00c1/setup.py", line 119, in <module>
18.55           os.environ["TORCH_CUDA_ARCH_LIST"] = get_default_compute_capabilities()
18.55         File "/tmp/pip-install-qskuk1l3/deepspeed_a20e7671cb0f4239ad4a80bbbb3e00c1/op_builder/builder.py", line 55, in get_default_compute_capabilities
18.55           if torch.utils.cpp_extension.CUDA_HOME is not None and installed_cuda_version()[0] >= 11:
18.55         File "/tmp/pip-install-qskuk1l3/deepspeed_a20e7671cb0f4239ad4a80bbbb3e00c1/op_builder/builder.py", line 43, in installed_cuda_version
18.55           output = subprocess.check_output([cuda_home + "/bin/nvcc", "-V"], universal_newlines=True)
18.55         File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
18.55           return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
18.55         File "/usr/lib/python3.8/subprocess.py", line 493, in run
18.55           with Popen(*popenargs, **kwargs) as process:
18.55         File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
18.55           self._execute_child(args, executable, preexec_fn, close_fds,
18.55         File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
18.55           raise child_exception_type(errno_num, err_msg, err_filename)
18.55       FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
18.55       Setting ds_accelerator to cuda (auto detect)
18.55       [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
18.55       [end of output]
18.55
18.55   note: This error originates from a subprocess, and is likely not a problem with pip.
18.55 error: metadata-generation-failed
18.55
18.55 × Encountered error while generating package metadata.
18.55 ╰─> See above for output.
18.55
18.55 note: This is an issue with the package mentioned above, not pip.
18.55 hint: See above for details.
------
Dockerfile:97
--------------------
  96 |     COPY requirements/requirements-flashattention.txt .
  97 | >>> RUN pip install -r requirements.txt && pip install -r requirements-onebitadam.txt && \
  98 | >>>     pip install -r requirements-sparseattention.txt && \
  99 | >>>     pip install -r requirements-flashattention.txt && \
 100 | >>>     pip install -r requirements-wandb.txt && \
 101 | >>>     pip install protobuf==3.20.* && \
 102 | >>>     pip cache purge
 103 |
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install -r requirements.txt && pip install -r requirements-onebitadam.txt &&     pip install -r requirements-sparseattention.txt &&     pip install -r requirements-flashattention.txt &&     pip install -r requirements-wandb.txt &&     pip install protobuf==3.20.* &&     pip cache purge" did not complete successfully: exit code: 1
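
The traceback above comes down to one call: the DeeperSpeed setup.py runs nvcc -V from CUDA_HOME/bin, and the runtime image has no such binary, so subprocess raises FileNotFoundError. Below is a minimal pre-flight sketch that does the same lookup without raising. The helper names `find_nvcc`, `parse_release`, and `nvcc_release` are hypothetical and are not part of DeepSpeed; the parsing assumes nvcc prints a "release X.Y," token, as current toolkits do.

```python
import os
import shutil
import subprocess


def find_nvcc(cuda_home="/usr/local/cuda"):
    """Return a path to nvcc, or None if no compiler is installed."""
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever nvcc is on PATH, if any.
    return shutil.which("nvcc")


def parse_release(nvcc_output):
    """Extract the 'X.Y' version that follows the word 'release'."""
    tokens = nvcc_output.split()
    idx = tokens.index("release")  # raises ValueError if the token is absent
    return tokens[idx + 1].rstrip(",")


def nvcc_release(nvcc_path):
    """Run `nvcc -V` and return the release string it reports."""
    output = subprocess.check_output([nvcc_path, "-V"], universal_newlines=True)
    return parse_release(output)


if __name__ == "__main__":
    path = find_nvcc()
    if path is None:
        print("nvcc not found: extensions that need compiling will fail to build")
    else:
        print("nvcc found at", path, "release", nvcc_release(path))
```

Running this inside the image before the pip install step would turn the long metadata-generation error into a one-line message.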

Output with new version:

docker build -t gpt-neox -f Dockerfile .
[+] Building 989.0s (24/24) FINISHED                                                                                                                                                              docker:default
 => [internal] load .dockerignore                                                                                                                                                                           0.0s
 => => transferring context: 57B                                                                                                                                                                            0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                        0.0s
 => => transferring dockerfile: 5.29kB                                                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.7.1-devel-ubuntu20.04                                                                                                                             1.0s
 => [internal] load build context                                                                                                                                                                           0.0s
 => => transferring context: 3.67kB                                                                                                                                                                         0.0s
 => [ 1/19] FROM docker.io/nvidia/cuda:11.7.1-devel-ubuntu20.04@sha256:47bad1799ade862fa2486b6e5b19ce91c29396f4ca8d74d83b6c228222b6079f                                                                    98.8s
 => => resolve docker.io/nvidia/cuda:11.7.1-devel-ubuntu20.04@sha256:47bad1799ade862fa2486b6e5b19ce91c29396f4ca8d74d83b6c228222b6079f                                                                       0.0s
 => => sha256:47bad1799ade862fa2486b6e5b19ce91c29396f4ca8d74d83b6c228222b6079f 743B / 743B                                                                                                                  0.0s
 => => sha256:fc997521e612899a01dce92820f5f5a201dd943ebfdc3e49ba0706d491a39d2d 2.63kB / 2.63kB                                                                                                              0.0s
 => => sha256:388fd6a5119ab1d668eeb86e77c6f5e964f734ec619e13c1494e15ca6986768f 1.92GB / 1.92GB                                                                                                             83.6s
 => => sha256:54bed995e06c6ef66e551eef1d8aab7910f8a211aaad53651fefbdb943f8b158 16.45kB / 16.45kB                                                                                                            0.0s
 => => sha256:0d808b7b701e1adf4114b7039f38a06b59138b1faa575e6eeeefca035b24a67a 85.88kB / 85.88kB                                                                                                            0.3s
 => => extracting sha256:388fd6a5119ab1d668eeb86e77c6f5e964f734ec619e13c1494e15ca6986768f                                                                                                                  15.1s
 => => extracting sha256:0d808b7b701e1adf4114b7039f38a06b59138b1faa575e6eeeefca035b24a67a                                                                                                                   0.0s
 => [ 2/19] RUN apt-get update -y &&     apt-get install -y         git python3.9 python3-dev libpython3-dev python3-pip sudo pdsh         htop llvm-9-dev tmux zstd software-properties-common build-ess  44.4s
 => [ 3/19] RUN mkdir /var/run/sshd &&     sed -i 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' /etc/pam.d/sshd &&     echo 'AuthorizedKeysFile     .ssh/authorized_keys' >>  0.2s
 => [ 4/19] RUN mkdir -p /build &&     cd /build &&     wget -q -O - https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz | tar xzf - &&     cd openmpi-4.1.0 &&     ./configure --p  181.1s
 => [ 5/19] RUN mv /usr/local/mpi/bin/mpirun /usr/local/mpi/bin/mpirun.real &&     echo '#!/bin/bash' > /usr/local/mpi/bin/mpirun &&     echo 'mpirun.real --allow-run-as-root --prefix /usr/local/mpi "$@  0.3s
 => [ 6/19] RUN useradd --create-home --uid 1000 --shell /bin/bash mchorse &&     usermod -aG sudo mchorse &&     echo "mchorse ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers                                    0.4s
 => [ 7/19] RUN mkdir -p /home/mchorse/.ssh /job &&     echo 'Host *' > /home/mchorse/.ssh/config &&     echo '    StrictHostKeyChecking no' >> /home/mchorse/.ssh/config &&     echo 'export PDSH_RCMD_TY  0.3s
 => [ 8/19] RUN pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117 && pip cache purge                                  101.8s
 => [ 9/19] COPY requirements/requirements.txt .                                                                                                                                                            0.0s
 => [10/19] COPY requirements/requirements-wandb.txt .                                                                                                                                                      0.0s
 => [11/19] COPY requirements/requirements-onebitadam.txt .                                                                                                                                                 0.0s
 => [12/19] COPY requirements/requirements-sparseattention.txt .                                                                                                                                            0.0s
 => [13/19] COPY requirements/requirements-flashattention.txt .                                                                                                                                             0.0s
 => [14/19] RUN pip install -r requirements.txt && pip install -r requirements-onebitadam.txt &&     pip install -r requirements-sparseattention.txt &&     pip install -r requirements-flashattention.t  131.9s
 => [15/19] RUN pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex.git@a651e2c24ecf97cbf367fd3f330df3  295.9s
 => [16/19] COPY megatron/ megatron                                                                                                                                                                         0.0s
 => [17/19] RUN python megatron/fused_kernels/setup.py install                                                                                                                                            120.0s
 => [18/19] RUN mkdir -p /tmp && chmod 0777 /tmp                                                                                                                                                            0.2s
 => [19/19] WORKDIR /home/mchorse                                                                                                                                                                           0.0s
 => exporting to image                                                                                                                                                                                     12.5s
 => => exporting layers                                                                                                                                                                                    12.5s
 => => writing image sha256:a1c3c8fff24834f1904127d95116fa152296bdc7ba2bb3adf94b6e389496d27d                                                                                                                0.0s
 => => naming to docker.io/library/gpt-neox

segyges commented Jan 4, 2024

@yang appears to have beaten me to this by seven minutes. props

Quentin-Anthony (Member) commented

> @yang appears to have beaten me to this by seven minutes. props

Was about to say the same. I believe this was due to changes in DeepSpeed's build process. Thanks for creating this regardless! I'm going to go ahead and close.

Successfully merging this pull request may close these issues.

Bug: nvcc does not exists in runtime version of nvidia base image used in Dockerfile