
GPU->CPU Memcpy failed #436

Closed
talmazov opened this issue Aug 24, 2019 · 6 comments

talmazov commented Aug 24, 2019

Hey everyone,
I am trying to train on 2 segmented CBCT volumes, but I keep running into a GPU->CPU Memcpy failed error. From reading around in the TensorFlow repo, this is usually a symptom of insufficient memory, so I drastically reduced the size of the volumes as well as batch_size and queue_length, but I still get the error. I have included the configuration file here. My PC has 32 GB of RAM and a GeForce RTX 2060 with 6 GB of VRAM; NiftyNet and TensorFlow recognize the device. I have CUDA 10 installed with cuDNN 7.6.3 on NVIDIA driver 418.
When I run the following in the Python console, the GPU appears and everything is fine:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Most notably I see "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"; then, once NiftyNet reaches "Parameters from random initialisations" and the shuffle buffer has been filled, it fails with the GPU->CPU Memcpy failed error:
2019-08-24 17:21:52.622248: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.

2019-08-24 17:22:00.194936: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-24 17:22:00.370394: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.383907: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.394481: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.394513: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
2019-08-24 17:22:00.394571: I tensorflow/stream_executor/stream.cc:1865] [stream=0x461c2b0,impl=0x4908880] did not wait for [stream=0x474a8d0,impl=0x48faf40]
2019-08-24 17:22:00.394578: I tensorflow/stream_executor/stream.cc:4800] [stream=0x461c2b0,impl=0x4908880] did not memcpy device-to-host; source: 0x7effad60b300
2019-08-24 17:22:00.394652: F tensorflow/core/common_runtime/gpu/gpu_util.cc:293] GPU->CPU Memcpy failed

I do not get this issue when performing training on the CPU.

I switched the network from dense_vnet to highres3dnet, and at that point I only get the "Could not create cudnn handle" error. I also modified the tf_config() method in util_common.py to include config.gpu_options.allow_growth = True,
as described in tensorflow/tensorflow#24496,
but that does not seem to address the issue.
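
For reference, this is roughly what the change looks like (a minimal sketch; the exact contents of tf_config() in util_common.py may differ between NiftyNet versions):

import tensorflow as tf

def tf_config():
    # Let TensorFlow grow GPU memory on demand instead of reserving
    # nearly all VRAM at session creation.
    config = tf.ConfigProto()
    config.allow_soft_placement = True
    config.gpu_options.allow_growth = True
    return config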

Any thoughts? Is this an issue of not enough VRAM?

The command I use to run it is:
python3 net_segment.py train -c ~/mandible_segmentation/config.ini

My configuration is

############################ input configuration sections
[cbct]
path_to_search = /home/mayotic/mandible_segmentation/CBCT_TRAINING/
filename_contains = _cbct
interp_order = 1
spatial_window_size = (120,120,200)
axcodes=(A, R, S)

[label]
path_to_search = /home/mayotic/mandible_segmentation/CBCT_TRAINING/
filename_contains = _label
interp_order = 0
spatial_window_size = (120,120,200)
axcodes=(A, R, S)

############################## system configuration sections
[SYSTEM]
cuda_devices = 0
num_threads = 1
num_gpus = 1
model_dir = /home/mayotic/mandible_segmentation/
queue_length = 36

[NETWORK]
name = dense_vnet
batch_size = 6

# volume level preprocessing
volume_padding_size = 0
window_sampling = uniform

[TRAINING]
sample_per_volume = 1
lr = 0.001
starting_iter = 0
save_every_n = 1000
max_iter = 3001
tensorboard_every_n = 1

[INFERENCE]
border = (0, 0, 0)
inference_iter = 3000
output_interp_order = 0
spatial_window_size = (120,120,200)
save_seg_dir = /home/mayotic/mandible_segmentation/segmentation_output/

############################ custom configuration sections
[SEGMENTATION]
image = cbct
label = label
label_normalisation = False
output_prob = False
num_classes = 2

Running the following code from the python3 CLI works just fine:

import tensorflow as tf

# Simple device-placement check: a matmul pinned to the GPU runs without error.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
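
As a side note, matmul does not go through cuDNN, so a closer reproduction of the failing path would be a small convolution pinned to the GPU (a rough sketch, assuming the same TF 1.x API as above):

import numpy as np
import tensorflow as tf

with tf.device('/gpu:0'):
    # conv2d is dispatched to cuDNN, unlike matmul, so this exercises
    # the same handle creation that fails during NiftyNet training.
    x = tf.constant(np.random.rand(1, 64, 64, 1), dtype=tf.float32)
    k = tf.constant(np.random.rand(3, 3, 1, 8), dtype=tf.float32)
    y = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    print(sess.run(y).shape)
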
@IsaacLord

Did you find any solution for this issue?


talmazov commented Oct 1, 2019

Did you find any solution for this issue?

No solution.
I suspect this is an issue with the underlying code not being able to copy data between system RAM and VRAM, but I'm not sure why; I see the bug was assigned to someone. For now, either drastically decrease the DICOM resolution or increase your graphics card's available VRAM.
Also, to be fair, even when processing CT/CBCT DICOM on the CPU (which does not produce this error), my PC easily maxes out its 32 GB of RAM.
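
If anyone wants to try the resolution route first, this is the kind of preprocessing I mean (a rough sketch assuming NIfTI inputs, using nibabel and scipy, which are not part of NiftyNet; the filename is a placeholder):

import nibabel as nib
import numpy as np
from scipy.ndimage import zoom

# Halve the resolution of a volume before training; use order=0 for label maps.
img = nib.load('case01_cbct.nii.gz')
data = img.get_fdata()
down = zoom(data, (0.5, 0.5, 0.5), order=1)

# Double the voxel size in the affine so spacing matches the coarser grid.
affine = img.affine.copy()
affine[:3, :3] *= 2.0
nib.save(nib.Nifti1Image(down.astype(np.float32), affine), 'case01_cbct_down.nii.gz')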

@IsaacLord

Just change the TensorFlow version!
It does work!

@IsaacLord

You can follow this link to solve the issue:
#447

wyli closed this as completed in 4bc6126 on Oct 6, 2019

talmazov commented Dec 7, 2019

Hey,
so I tried again. When I run TensorFlow on the GPU for object detection, everything runs fine. I have installed numpy 1.16.0 and tensorflow-gpu 1.13.2, but I still get the GPU->CPU Memcpy failed error.

I am not sure why the cuDNN handle could not be created:

2019-12-06 22:01:17.891564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-12-06 22:01:17.891610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 22:01:17.891614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-12-06 22:01:17.891617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-12-06 22:01:17.891694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4853 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Parameters from random initialisations ...
2019-12-06 22:01:24.491935: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-12-06 22:01:24.939564: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x4ae44b0
2019-12-06 22:01:35.242771: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:101] Filling up shuffle buffer (this may take a while): 12 of 30
2019-12-06 22:01:45.241902: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:101] Filling up shuffle buffer (this may take a while): 24 of 30
2019-12-06 22:01:50.185805: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:140] Shuffle buffer filled.
2019-12-06 22:01:55.576205: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.588900: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.600046: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.600087: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads
2019-12-06 22:01:56.241818: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241845: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5b00
2019-12-06 22:01:56.241879: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241921: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5c00
2019-12-06 22:01:56.241933: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241938: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5a00
2019-12-06 22:01:56.241949: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241948: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
2019-12-06 22:01:56.241964: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5e00
Aborted


talmazov commented Dec 7, 2019

I tried

sudo python3 net_download.py dense_vnet_abdominal_ct_model_zoo
sudo python3 net_segment.py inference -c ~/niftynet/extensions/dense_vnet_abdominal_ct/config.ini

and I get:

2019-12-07 11:37:02.795645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4904 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Restoring parameters from /home/mayotic/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
2019-12-07 11:37:06.140918: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.143425: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.148064: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.148081: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads

What version of cuDNN is everybody else running?
I have NiftyNet 0.6, CUDA 10.0, tensorflow-gpu 1.13.2 and numpy 1.16,
using a GeForce RTX 2060 with 6 GB of VRAM and NVIDIA driver 440.33.01.
TensorFlow tries to allocate 5 GB.
spatial_window_size = (64, 64, 512) with the dense_vnet network.

Is this a common error thrown when the GPU does not have enough physical memory to run training?
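
As a rough back-of-the-envelope check (my own estimate in float32, assuming the batch_size of 6 from the config above), the input windows themselves are small; the network activations and gradients are what consume the VRAM:

window_voxels = 64 * 64 * 512          # spatial_window_size
batch_size = 6
bytes_per_voxel = 4                    # float32
mb = window_voxels * batch_size * bytes_per_voxel / (1024 ** 2)
print('input windows alone: %.0f MB' % mb)   # ~48 MB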
