CUDA driver version is insufficient for CUDA runtime version #1

Open
sghilardi opened this issue Mar 10, 2020 · 6 comments

@sghilardi

sghilardi commented Mar 10, 2020

Hello,
I am trying to set up the U-Net server backend using the simple docker method (startServer.sh). I can successfully create the docker container and connect to it using the U-Net Fiji plug-in, but during segmentation I get the following error:

Model check failed:
F0310 19:40:56.398861 55 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version

Does anyone have any insight into how to fix the driver/runtime error in Docker?
Here is the full Fiji log for the event. (Note that after the error message I closed the Docker window, which is why there are 'Host unreachable' errors at the end.)


Establishing SSH connection for '[email protected]:22'
[email protected]$ mkdir ~/data1; mkdir ~/cellnet;
[email protected] $ mkdir "/home/unetuser/~"
[email protected] $ mkdir "/home/unetuser/~/cellnet"
$ sftp "/tmp/unet-12ffe596-e444-4995-b0a3-7a59477999d87163029522390179009.tmp" "[email protected]:22:/home/unetuser/~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d87163029522390179009.tmp"
[email protected] $ rm "/home/unetuser/~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d87163029522390179009.tmp"
[email protected]$ caffe_unet
Setting caffe_unet binary path to caffe_unet
Searching for caffe
[email protected]$ caffe
$ sftp "/home/sgrolab/Downloads/caffemodels/3d_cell_net_microspores-fluorescence_v0.modeldef.h5" "[email protected]:22:/home/unetuser/~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5"
[email protected]$ caffe_unet check_model_and_weights_h5 -model "~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5" -weights "~/data1" -n_channels 1 -gpu 0
F0310 19:40:44.387264    53 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
$ sftp "/home/sgrolab/Downloads/caffemodels/2d_cell_net_v0.caffemodel.h5" "[email protected]:22:/home/unetuser/~/data1"
[email protected]$ caffe_unet check_model_and_weights_h5 -model "~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5" -weights "~/data1" -n_channels 1 -gpu 0
Model check failed:
F0310 19:40:56.398861    55 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
F0310 19:40:56.398861    55 common.cpp:152] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
Establishing SSH connection for '[email protected]:22'
Could not remove temporary file ~/cellnet/unet-12ffe596-e444-4995-b0a3-7a59477999d8.modeldef.h5: com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host (Host unreachable)
Establishing SSH connection for '[email protected]:22'
Could not remove temporary folder /home/unetuser/~/cellnet: com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host (Host unreachable)
Establishing SSH connection for '[email protected]:22'
Could not remove temporary folder /home/unetuser/~: com.jcraft.jsch.JSchException: java.net.NoRouteToHostException: No route to host (Host unreachable)
Establishing SSH connection for '[email protected]:22'
Establishing SSH connection for '[email protected]:22'
U-Net job aborted

@ThorstenFalk
Collaborator

Which NVIDIA driver version are you running on your Docker host? You probably just need to update the graphics driver to make it work.
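For context, `(35 vs. 0)` in the log is CUDA runtime error 35, `cudaErrorInsufficientDriver`: the CUDA runtime that caffe_unet was built against needs a newer driver than the one it finds. The header of `nvidia-smi` reports, as "CUDA Version", the newest CUDA runtime the installed driver supports, so the comparison can be scripted. Below is a minimal bash sketch; the function names are hypothetical, it relies on GNU `sort -V`, and the runtime version you compare against (10.0 in the usage comment) is an assumption, since the thread does not state which CUDA version the lmb-unet-server image uses.

```shell
#!/usr/bin/env bash
# Sketch: check whether the host driver is new enough for a CUDA runtime.
# Function names are hypothetical; requires GNU coreutils (sort -V).

# Read nvidia-smi output on stdin and print the "CUDA Version" from its
# header, i.e. the newest CUDA runtime the installed driver supports.
max_cuda_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if runtime version $1 is <= driver-supported version $2.
# sort -V orders dotted versions numerically, so 10.2 < 10.10.
driver_supports() {
  local required="$1" supported="$2"
  [ "$(printf '%s\n' "$required" "$supported" | sort -V | head -n 1)" = "$required" ]
}

# Usage on the Docker host (10.0 is an assumed runtime version):
#   max="$(nvidia-smi | max_cuda_from_smi)"
#   driver_supports 10.0 "$max" && echo "driver OK" || echo "update the driver"
```

With the driver shown in this thread (440.33.01, whose header reports CUDA Version 10.2), `max_cuda_from_smi` prints 10.2, so any runtime up to 10.2 passes this check.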

@sghilardi
Author

sghilardi commented Mar 11, 2020

Hi Thorsten, I am running driver version 440.33.01, which was installed as part of the latest CUDA installation available for Ubuntu 18.04.
Wed Mar 11 18:08:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P400 Off | 00000000:65:00.0 On | N/A |
| 34% 44C P0 N/A / N/A | 661MiB / 1984MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 4000 Off | 00000000:B3:00.0 Off | N/A |
| 30% 29C P8 1W / 125W | 1MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I also modified the --gpus flag on the Docker container so that it only accesses the RTX 4000, and the container does see only that GPU:
* Starting OpenBSD Secure Shell server sshd [ OK ]
unetuser@lmbunet:~$ nvidia-smi
Wed Mar 11 22:13:44 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 Off | 00000000:B3:00.0 Off | N/A |
| 30% 29C P8 1W / 125W | 1MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

@ThorstenFalk
Collaborator

Are you running nvidia-docker or plain Docker? It won't work with plain Docker, because the container must share the NVIDIA driver with the host system.

@sghilardi
Author

sghilardi commented Mar 12, 2020

I am running Docker 19.03.7 with nvidia-docker. This version changed the syntax slightly, from --runtime=nvidia to --gpus <option>, so I made the corresponding change to startServer.sh, which now reads:
docker run -it --rm --gpus "device=1" --hostname "lmbunet" -p 2222:22 lmb-unet-server nvidia/cuda:10.0-base nvidia-smi
I believe the containers are running nvidia-docker, because when I run this script and then run nvidia-smi inside the Docker container, the container has access to the GPU:
unetuser@lmbunet:~$ nvidia-smi
Thu Mar 12 21:04:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 Off | 00000000:B3:00.0 Off | N/A |
| 30% 29C P8 1W / 125W | 1MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

If I remove the --gpus flag, the container still starts, but nvidia-smi is not available and no GPU is returned:

unetuser@lmbunet:~$ nvidia-smi
bash: nvidia-smi: command not found
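The syntax change described in this comment can be captured in a small helper. This is a hedged bash sketch, not part of startServer.sh: the function name is hypothetical, it assumes the NVIDIA Container Toolkit for Docker 19.03 and newer and nvidia-docker2 (`--runtime=nvidia`, with `NVIDIA_VISIBLE_DEVICES` selecting the GPU) for older releases, and it relies on GNU `sort -V` to compare versions. The missing nvidia-smi without the flag is consistent with the toolkit mounting the driver's user-space tools only into containers that request GPUs.

```shell
#!/usr/bin/env bash
# Sketch: choose GPU flags for `docker run` based on the Docker version.
# Function name is hypothetical; requires GNU coreutils (sort -V).

# $1: Docker server version (e.g. 19.03.7), $2: host GPU index to expose.
unet_gpu_flags() {
  local version="$1" gpu="$2"
  if [ "$(printf '%s\n' 19.03 "$version" | sort -V | head -n 1)" = "19.03" ]; then
    # Docker >= 19.03 with the NVIDIA Container Toolkit: native --gpus flag.
    printf '%s\n' "--gpus device=${gpu}"
  else
    # Older Docker with nvidia-docker2: NVIDIA runtime plus device selection.
    printf '%s\n' "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=${gpu}"
  fi
}

# Usage sketch (GPU index 1 as in this thread):
#   flags="$(unet_gpu_flags "$(docker version --format '{{.Server.Version}}')" 1)"
#   docker run --rm -it $flags --hostname lmbunet -p 2222:22 lmb-unet-server
# $flags is left unquoted on purpose so it splits into separate arguments.
```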

@sheldonxxd

sheldonxxd commented Jul 7, 2020

This is an excellent tool for cell segmentation.

I installed and ran the Docker image without problems last week (Ubuntu 20.04 + Docker 19.03.12 + nvidia-docker, NVIDIA driver 440.100, CUDA 10.2, RTX 2060 Super). After installing nvidia-docker, I checked it as shown below:

[Screenshot: image-20200702125916642]

Then I downloaded the caffe-unet Docker zip file and built the image. I had to add --no-check-certificate to the wget call on line 23 of Dockerfile-bin to avoid a wget error.

I made no other changes.

I notice that your port is 22, not 2222 as in the tutorial. But if that were the problem, you could not establish a connection at all (it seems you modified startServer.sh).

Anyway, I hope you solve the problem.

@ThorstenFalk
Collaborator

I agree that in the logs you should see port 2222 if you run the docker container with
docker run --rm --gpus "device=1" --hostname "lmbunet" -p 2222:22 -it lmb-unet-server.

If you did not explicitly change the default SSH port of your host system, port 22 connects to the host, not to the Docker container. If your host also has a user named unetuser and a caffe-unet installation that does not work due to a missing or insufficient CUDA toolkit, this would explain the error. However, that is a lot of ifs.
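The port reasoning above can be checked mechanically. Below is a minimal bash sketch (the function name is hypothetical) that extracts the port from a target string like those in Fiji's "Establishing SSH connection for ..." lines and reports whether it matches the host port forwarded to the container by `-p 2222:22`. "unetuser@localhost" in the tests is only an example value, since the host name in the posted log is not shown.

```shell
#!/usr/bin/env bash
# Sketch: tell whether a Fiji SSH target points at the U-Net container.
# Function name is hypothetical. Assumes the container was started with
# `-p 2222:22`, so host port 2222 forwards to the container's sshd.

# $1: a Fiji log line or bare target such as "unetuser@localhost:22"
# $2: host port mapped to the container's port 22 (default 2222)
unet_target_kind() {
  local line="$1" mapped="${2:-2222}" port
  # Take the digits after the last colon, ignoring a trailing single quote.
  port="$(printf '%s\n' "$line" | sed -n "s/.*:\([0-9][0-9]*\)'*\$/\1/p")"
  if [ -z "$port" ]; then
    echo "unknown"
  elif [ "$port" = "$mapped" ]; then
    echo "container"
  else
    echo "not-container"   # e.g. port 22 normally reaches the host's sshd
  fi
}
```

A quick manual check, assuming the Docker host is the local machine: `ssh -p 2222 unetuser@localhost hostname` should print lmbunet, the name set with `--hostname`, whereas a connection on port 22 would normally reach the host and report its own name.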

3 participants