
GPU->CPU Memcpy failed #436

Closed
talmazov opened this issue Aug 24, 2019 · 6 comments

talmazov commented Aug 24, 2019

Hey everyone,
I am trying to train on 2 segmented CBCT volumes, but I keep running into a GPU->CPU Memcpy failed error. From reading around in the TensorFlow repo, this is usually a symptom of insufficient memory, so I drastically reduced the size of the volumes as well as batch_size and queue_length, but I still get the error. I have included the configuration file here. My PC has 32 GB of RAM and a GeForce RTX 2060 with 6 GB of VRAM; NiftyNet and TensorFlow recognize the device. I have CUDA 10 installed with cuDNN 7.6.3 on NVIDIA driver 418.
When I run the following in the Python console, the GPU appears and everything is fine:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Most notably I see "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"; then, once NiftyNet reaches "Parameters from random initialisations" and the shuffle buffer has been filled, it fails with the GPU->CPU Memcpy failed error:
2019-08-24 17:21:52.622248: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.

2019-08-24 17:22:00.194936: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-24 17:22:00.370394: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.383907: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.394481: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.394513: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
2019-08-24 17:22:00.394571: I tensorflow/stream_executor/stream.cc:1865] [stream=0x461c2b0,impl=0x4908880] did not wait for [stream=0x474a8d0,impl=0x48faf40]
2019-08-24 17:22:00.394578: I tensorflow/stream_executor/stream.cc:4800] [stream=0x461c2b0,impl=0x4908880] did not memcpy device-to-host; source: 0x7effad60b300
2019-08-24 17:22:00.394652: F tensorflow/core/common_runtime/gpu/gpu_util.cc:293] GPU->CPU Memcpy failed

I do not get this issue when performing training on the CPU.

I switched the network from dense_vnet to highres3dnet, and at that point I only get the "Could not create cudnn handle" error. I also modified the tf_config() method in util_common.py to include config.gpu_options.allow_growth = True,
as described in tensorflow/tensorflow#24496,
but that does not seem to address the issue.
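
For reference, this is roughly what the change looks like (a minimal sketch; the exact contents of tf_config() in util_common.py may differ between NiftyNet versions):

import tensorflow as tf

def tf_config():
    # Let TensorFlow grow GPU memory on demand instead of reserving
    # nearly all VRAM at session creation.
    config = tf.ConfigProto()
    config.allow_soft_placement = True
    config.gpu_options.allow_growth = True
    return config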

Any thoughts? Is this an issue of not enough VRAM?

The command I use to run it is:
python3 net_segment.py train -c ~/mandible_segmentation/config.ini

My configuration is

############################ input configuration sections
[cbct]
path_to_search = /home/mayotic/mandible_segmentation/CBCT_TRAINING/
filename_contains = _cbct
interp_order = 1
spatial_window_size = (120,120,200)
axcodes=(A, R, S)

[label]
path_to_search = /home/mayotic/mandible_segmentation/CBCT_TRAINING/
filename_contains = _label
interp_order = 0
spatial_window_size = (120,120,200)
axcodes=(A, R, S)

############################## system configuration sections
[SYSTEM]
cuda_devices = 0
num_threads = 1
num_gpus = 1
model_dir = /home/mayotic/mandible_segmentation/
queue_length = 36

[NETWORK]
name = dense_vnet
batch_size = 6

# volume level preprocessing
volume_padding_size = 0
window_sampling = uniform

[TRAINING]
sample_per_volume = 1
lr = 0.001
starting_iter = 0
save_every_n = 1000
max_iter = 3001
tensorboard_every_n = 1

[INFERENCE]
border = (0, 0, 0)
inference_iter = 3000
output_interp_order = 0
spatial_window_size = (120,120,200)
save_seg_dir = /home/mayotic/mandible_segmentation/segmentation_output/

############################ custom configuration sections
[SEGMENTATION]
image = cbct
label = label
label_normalisation = False
output_prob = False
num_classes = 2

Running the following code from the python3 CLI works just fine:

import tensorflow as tf

# Simple device-placement check: a matmul pinned to the GPU runs without error.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
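
As a side note, matmul does not go through cuDNN, so a closer reproduction of the failing path would be a small convolution pinned to the GPU (a rough sketch, assuming the same TF 1.x API as above):

import numpy as np
import tensorflow as tf

with tf.device('/gpu:0'):
    # conv2d is dispatched to cuDNN, unlike matmul, so this exercises
    # the same handle creation that fails during NiftyNet training.
    x = tf.constant(np.random.rand(1, 64, 64, 1), dtype=tf.float32)
    k = tf.constant(np.random.rand(3, 3, 1, 8), dtype=tf.float32)
    y = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    print(sess.run(y).shape)
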
@IsaacLord

Did you find any solution for this issue?


talmazov commented Oct 1, 2019

Did you find any solution for this issue?

No solution.
I suspect this is an issue with the underlying code not being able to copy data between system RAM and VRAM, but I'm not sure why; I see the bug was assigned to someone. For now, either drastically decrease the DICOM resolution or increase your graphics card's available VRAM.
Also, to be fair, even when processing CT/CBCT DICOM on the CPU (which does not produce this error), my PC easily maxes out its 32 GB of RAM.
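
If anyone wants to try the resolution route first, this is the kind of preprocessing I mean (a rough sketch assuming NIfTI inputs, using nibabel and scipy, which are not part of NiftyNet; the filename is a placeholder):

import nibabel as nib
import numpy as np
from scipy.ndimage import zoom

# Halve the resolution of a volume before training; use order=0 for label maps.
img = nib.load('case01_cbct.nii.gz')
data = img.get_fdata()
down = zoom(data, (0.5, 0.5, 0.5), order=1)

# Double the voxel size in the affine so spacing matches the coarser grid.
affine = img.affine.copy()
affine[:3, :3] *= 2.0
nib.save(nib.Nifti1Image(down.astype(np.float32), affine), 'case01_cbct_down.nii.gz')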

@IsaacLord

Just change the TensorFlow version!
It does work!

@IsaacLord

You can follow this link to solve the issue:
#447

wyli closed this as completed in 4bc6126 on Oct 6, 2019

talmazov commented Dec 7, 2019

Hey,
so I tried again. When I run TensorFlow on the GPU for object detection, everything runs fine. I have installed numpy 1.16.0 and tensorflow-gpu 1.13.2, but I still get the GPU->CPU Memcpy failed error.

I am not sure why the cuDNN handle could not be created:

2019-12-06 22:01:17.891564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-12-06 22:01:17.891610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 22:01:17.891614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-12-06 22:01:17.891617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-12-06 22:01:17.891694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4853 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Parameters from random initialisations ...
2019-12-06 22:01:24.491935: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-12-06 22:01:24.939564: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x4ae44b0
2019-12-06 22:01:35.242771: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:101] Filling up shuffle buffer (this may take a while): 12 of 30
2019-12-06 22:01:45.241902: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:101] Filling up shuffle buffer (this may take a while): 24 of 30
2019-12-06 22:01:50.185805: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:140] Shuffle buffer filled.
2019-12-06 22:01:55.576205: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.588900: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.600046: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.600087: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads
2019-12-06 22:01:56.241818: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241845: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5b00
2019-12-06 22:01:56.241879: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241921: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5c00
2019-12-06 22:01:56.241933: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241938: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5a00
2019-12-06 22:01:56.241949: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241948: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
2019-12-06 22:01:56.241964: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5e00
Aborted


talmazov commented Dec 7, 2019

I tried

sudo python3 net_download.py dense_vnet_abdominal_ct_model_zoo
sudo python3 net_segment.py inference -c ~/niftynet/extensions/dense_vnet_abdominal_ct/config.ini

and I get:

2019-12-07 11:37:02.795645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4904 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Restoring parameters from /home/mayotic/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
2019-12-07 11:37:06.140918: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.143425: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.148064: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.148081: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads

What version of cuDNN is everybody else running?
I have NiftyNet 0.6, CUDA 10.0, tensorflow-gpu 1.13.2 and numpy 1.16,
using a GeForce RTX 2060 with 6 GB of VRAM and NVIDIA driver 440.33.01.
TensorFlow tries to allocate 5 GB.
spatial_window_size = (64, 64, 512) with the dense_vnet network.

Is this a common error thrown when the GPU does not have enough physical memory to run training?
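
As a rough back-of-the-envelope check (my own estimate in float32, assuming the batch_size of 6 from the config above), the input windows themselves are small; the network activations and gradients are what consume the VRAM:

window_voxels = 64 * 64 * 512          # spatial_window_size
batch_size = 6
bytes_per_voxel = 4                    # float32
mb = window_voxels * batch_size * bytes_per_voxel / (1024 ** 2)
print('input windows alone: %.0f MB' % mb)   # ~48 MB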
