CUDA driver version is insufficient for CUDA runtime version #1

sghilardi · 2020-03-10T19:53:38Z

Hello,
I am trying to set up the U-Net server backend using the simple docker method (startServer.sh). I can successfully create the docker container and connect to it using the U-Net Fiji plug-in, but during segmentation I get the following error:

Model check failed:
F0310 19:40:56.398861 55 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version

Does anyone have any insight into how to fix the driver/runtime error in Docker?
Here is the full Fiji log for the event: (note that after the error message, I closed the docker window which is why there are 'Host Unreachable' errors at the end.)


Establishing SSH connection for '[email protected]:22'
[email protected]$ mkdir ~/data1; mkdir ~/cellnet;
[email protected] $ mkdir "/home/unetuser/~"
[email protected] $ mkdir "/home/unetuser/~/cellnet"
$ sftp "/tmp/unet-12ffe596-e444-4995-b0a3-7a59477999d87163029522390179009.tmp" "[email protected]:22:/home/unetuser/~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d87163029522390179009.tmp"
[email protected] $ rm "/home/unetuser/~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d87163029522390179009.tmp"
[email protected]$ caffe_unet
Setting caffe_unet binary path to caffe_unet
Searching for caffe
[email protected]$ caffe
$ sftp "/home/sgrolab/Downloads/caffemodels/3d_cell_net_microspores-fluorescence_v0.modeldef.h5" "[email protected]:22:/home/unetuser/~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5"
[email protected]$ caffe_unet check_model_and_weights_h5 -model "~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5" -weights "~/data1" -n_channels 1 -gpu 0
F0310 19:40:44.387264    53 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
$ sftp "/home/sgrolab/Downloads/caffemodels/2d_cell_net_v0.caffemodel.h5" "[email protected]:22:/home/unetuser/~/data1"
[email protected]$ caffe_unet check_model_and_weights_h5 -model "~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5" -weights "~/data1" -n_channels 1 -gpu 0
Model check failed:
F0310 19:40:56.398861    55 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
F0310 19:40:56.398861    55 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
Establishing SSH connection for '[email protected]:22'
Could not remove temporary file ~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5: com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host (Host unreachable)
Establishing SSH connection for '[email protected]:22'
Could not remove temporary folder /home/unetuser/~/cellnet: com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host (Host unreachable)
Establishing SSH connection for '[email protected]:22'
Could not remove temporary folder /home/unetuser/~: com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host (Host unreachable)
Establishing SSH connection for '[email protected]:22'
Establishing SSH connection for '[email protected]:22'
U-Net job aborted

ThorstenFalk · 2020-03-11T20:16:14Z

Which NVIDIA driver are you running on your docker host? You probably just need to update the graphics driver to make it work.
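
For anyone checking this by hand, here is a minimal shell sketch of the comparison behind error 35: the host driver version has to be at least the minimum that the container's CUDA runtime requires. The function name `version_at_least` is made up for illustration and relies on GNU `sort -V`. The 410.48 threshold is the Linux minimum NVIDIA lists for CUDA 10.0; which CUDA runtime the caffe_unet binary in the image was built against is not stated in this thread, so treat 10.0 as an assumption.

```shell
#!/bin/sh
# Return 0 (success) if version $1 is greater than or equal to version $2.
# Uses GNU sort -V (version sort): if $2 sorts first, then $1 >= $2.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Assumed minimum Linux driver for a CUDA 10.0 runtime (adjust to your image).
MIN_DRIVER="410.48"

# Example driver version; replace with the value reported on your host.
HOST_DRIVER="440.33.01"

if version_at_least "$HOST_DRIVER" "$MIN_DRIVER"; then
  echo "driver $HOST_DRIVER is new enough for the assumed runtime"
else
  echo "driver $HOST_DRIVER is older than $MIN_DRIVER, update the driver"
fi
```

On the host, the driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and used as `HOST_DRIVER`. If the driver passes this check and the error persists, the process that fails is probably not using the driver you inspected.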

sghilardi · 2020-03-11T22:16:40Z

Hi Thorsten, I am running driver version 440.33.01, which was installed as part of the latest cuda installation available for Ubuntu 18.04.
Wed Mar 11 18:08:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P400 Off | 00000000:65:00.0 On | N/A |
| 34% 44C P0 N/A / N/A | 661MiB / 1984MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 4000 Off | 00000000:B3:00.0 Off | N/A |
| 30% 29C P8 1W / 125W | 1MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I also modified the --gpus flag on the docker container so that it only accesses the RTX 4000, and the container sees only that GPU:
`* Starting OpenBSD Secure Shell server sshd [ OK ]
unetuser@lmbunet:~$ nvidia-smi
Wed Mar 11 22:13:44 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 Off | 00000000:B3:00.0 Off | N/A |
| 30% 29C P8 1W / 125W | 1MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+`

ThorstenFalk · 2020-03-12T20:12:09Z

Are you running nvidia-docker or plain docker? With plain docker it won't work because the docker container must share the nvidia driver with the host system.

sghilardi · 2020-03-12T21:09:47Z

I am running Docker 19.03.7 with nvidia-docker. This version changed the syntax slightly from --runtime=nvidia to --gpus <option>, so I made the corresponding change to the startServer.sh file, which now reads: docker run -it --rm --gpus "device=1" --hostname "lmbunet" -p 2222:22 lmb-unet-server nvidia/cuda:10.0-base nvidia-smi
I believe the containers are running with nvidia-docker because when I run this script and then run nvidia-smi in the container, the container has access to the GPU:
unetuser@lmbunet:~$ nvidia-smi
Thu Mar 12 21:04:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 Off | 00000000:B3:00.0 Off | N/A |
| 30% 29C P8 1W / 125W | 1MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

If I remove the --gpus flag, the container still loads, but the nvidia-smi command is not available, so no GPU is reported:

unetuser@lmbunet:~$ nvidia-smi
bash: nvidia-smi: command not found

sheldonxxd · 2020-07-07T10:29:19Z

This is an excellent tool for cell segmentation.

I installed and ran the Docker container without problems last week (Ubuntu 20.04 + Docker 19.03.12 + nvidia-docker, NVIDIA driver 440.100, CUDA 10.2, RTX 2060 Super). After installing nvidia-docker, I checked it as below:

Then I downloaded the caffe-unet docker zip file and built the image. I had to add --no-check-certificate to the wget call on line 23 of the Dockerfile-bin file to avoid a wget error.

I made no other changes.

I notice your port is 22, not 2222, which differs from the tutorial. But if that were the problem, you would not be able to establish a connection at all (it seems you modified startServer.sh).

Anyway, I hope you solve the problem.

ThorstenFalk · 2020-07-07T20:42:04Z

I agree that in the logs you should see port 2222 if you run the docker container with
docker run --rm --gpus "device=1" --hostname "lmbunet" -p 2222:22 -it lmb-unet-server.

If you did not explicitly change the default ssh port of your host system, port 22 will connect to the host, not to the docker container. If your host has a user named unetuser and a caffe-unet installation that does not work because the CUDA toolkit is missing or insufficient, this would explain the error. However, that is a lot of ifs.
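
The port can be checked directly from the Fiji log. A rough sketch, assuming the log lines look like `Establishing SSH connection for 'user@host:PORT'` and that the container was started with `-p 2222:22`; the function names and the host `localhost` in the example are hypothetical.

```shell
#!/bin/sh
# Print the port from a log line that ends in 'user@host:PORT'.
ssh_port_from_log_line() {
  printf '%s\n' "$1" | sed -n "s/.*:\([0-9][0-9]*\)'.*/\1/p"
}

# Compare the logged port with the port published by `-p 2222:22`.
check_unet_port() {
  port=$(ssh_port_from_log_line "$1")
  if [ "$port" = "2222" ]; then
    echo "ok: port 2222 is forwarded to the container"
  else
    echo "warning: port ${port:-unknown} is not the published container port 2222"
  fi
}

# Example with a hypothetical host name.
check_unet_port "Establishing SSH connection for 'unetuser@localhost:22'"
```

A warning here means the plug-in is talking to whatever SSH server listens on that port, which with the default setup is the host rather than the container.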
