
Nd conv pool #2824
Open · wants to merge 14 commits into master
Conversation

@WGW101 commented Jul 27, 2015

Hi!

Following my issue ticket #2671, here is my pull request for nD convolution and pooling using cuDNN primitives.

The nD convolution by itself seems to work, but the bias addition using cudnnAddTensor() returns a NOT_SUPPORTED status.

The nD pooling doesn't work and returns NOT_SUPPORTED.
Apparently the nD pooling descriptor might only be a placeholder in this version, so this might work with a future version of cuDNN...

I inherit my layers directly from the Layer class rather than from BaseConvolutionLayer and BasePoolingLayer, to avoid modifying any existing (and working..) features.
The major drawback of this approach is that these layers can't fall back on other engines when cuDNN isn't supported by the user's configuration. But since I declared a LayerFactory entry for NdConvolution and NdPooling, it should be relatively easy to fix this behaviour.

Don't hesitate to give me feedback on these two new layers,
and to share any new insight about why it doesn't work.

I'm already aware of PR #2049 for nD convolution, but I'm still missing nD pooling (actually I only need 3D pooling in my application).

Cheers,

@Yeongtae commented Aug 3, 2015

I checked out this branch, but I couldn't run it because of the error "num_axes() <= 4" in blob.hpp and base_conv_layer.cpp.

@WGW101 (Author) commented Aug 3, 2015

The two new layers I added don't use the base_conv or base_pool classes.
To use them you need to change the layer type from "Convolution" to "NdConvolution" (see layer_factory.cpp), then describe your kernel, stride and pad shapes using BlobShape messages (see caffe.proto).
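For example, a minimal sketch of what such a layer could look like (the layer names and sizes here are illustrative, not from your net):

layer {
  name: "conv1"
  type: "NdConvolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 16
    kernel_shape { dim: 3 dim: 3 dim: 3 }  # 3x3x3 kernel
    stride_shape { dim: 1 dim: 1 dim: 1 }
    pad_shape { dim: 0 dim: 0 dim: 0 }
    engine: CUDNN
  }
}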

Can I see your prototxt file?

@Yeongtae commented Aug 3, 2015

@WGW101 Thank you for the response. I changed Convolution to NdConvolution, but it shows the error "not implemented yet". I didn't use cuDNN; is that what caused the error?

@WGW101 (Author) commented Aug 3, 2015

Yes, unfortunately it only works with cuDNN for now.
(Actually not everything works even with cuDNN, but I'm waiting for v3, which should come out very soon.)

For an implementation of nD convolution with the Caffe engine, see PR #2049 by Jeff Donahue.

@Yeongtae commented Aug 3, 2015

@WGW101 I'm using #2049 and #2442; I think the second one is better.

In addition, I'm working on 3D convolution for action classification from video, to extract spatial and temporal features. I'm very confused about how to handle the network's blobs, e.g. the weights of the conv, pool and ip layers, because nD data can't be used with matcaffe. Do you have any idea for this?

@WGW101 (Author) commented Aug 3, 2015

@Yeongtae The Python interface is quite easy to understand and very similar to what the Matlab one could look like.

Let's say you load your network like this:
net = caffe.Net('path/to/your.prototxt', caffe.TEST)

Then the weights are available like this:
net.params["LayerName"][0].data

and the biases like this:
net.params["LayerName"][1].data

That works for Conv and IP, but not for Pooling, as it has no parameters.
There are a few Python notebook tutorials in the base Caffe repo; take a look at them for more info.
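If you need the parameters back in Matlab (e.g. to compare against convn), a minimal pycaffe sketch along these lines should work; the layer name "conv1" and the file paths are illustrative:

import caffe
import scipy.io

net = caffe.Net('path/to/your.prototxt', 'path/to/your.caffemodel', caffe.TEST)
scipy.io.savemat('conv1_params.mat', {
    'weights': net.params['conv1'][0].data,  # (num_output, channels, kd, kh, kw)
    'bias': net.params['conv1'][1].data,
})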

@Yeongtae commented Aug 5, 2015

@WGW101 Following your advice, I have solved my problem.

I'm verifying that the 3D convolution is right using the convn function in MATLAB. I added the bias of conv1 to the result of the convn function. To do this, I extracted the input data, the result of conv1, and the weights of conv1 using pycaffe.

After testing, I found a weird result.

With an input of ones(n,n,n), the difference between the Caffe result and the MATLAB result shows that all elements are equal to 1.0e-06 * -0.4992.
With an input of rand(n,n,n), the difference between the Caffe result and the MATLAB result shows that all elements have different values.

Therefore, it seems that the 3D convolution of this branch and MATLAB's are different.

Do you have any idea about this?
And do you think that nD conv and nD pooling are well implemented?

@Yeongtae commented Aug 5, 2015

Using imfilter on the region without padding, the error is very small (about 1.0e-06*n).
I think I have finished verifying this branch.
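This outcome is consistent with Caffe computing cross-correlation rather than true convolution: MATLAB's imfilter also correlates by default, while convn flips the kernel. A minimal NumPy sketch of the relation (illustrative, not part of this PR):

import numpy as np
from scipy.ndimage import correlate, convolve

x = np.random.rand(5, 5, 5)   # input volume
k = np.random.rand(3, 3, 3)   # kernel

# convolution equals correlation with a kernel flipped along every axis
conv = convolve(x, k, mode='constant')
corr_flipped = correlate(x, k[::-1, ::-1, ::-1], mode='constant')
print(np.allclose(conv, corr_flipped))  # True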

@dhkim19e commented

Hi!

I noticed that cuDNN was updated this week (in the v3 RC, cudnnAddTensor was not supported).

So I checked with the new release, and this PR works fine after simply changing the function cudnnAddTensor to cudnnAddTensor_v3 (in the new API, the second parameter 'mode' was removed).

Thanks!

@WGW101 (Author) commented Sep 14, 2015

@squall815

Hi!

Thanks for your feedback!

I'm sorry I wasn't able to test this PR myself with the new version of cuDNN, as my hardware isn't supported by CUDA 7.0 (required for cuDNN v3..).

I hope I'll be able to resume the development of this branch some day (cleaning everything up to pass all tests, adding a CPU / Caffe engine with #2049 and #2442 integrated with the BlobShape message, and keeping the separate layers for the best performance in 2D, etc.).

@rockstone533 commented

@Yeongtae I trained on volume data, with input in HDF5 format. When I use matcaffe to parse the caffemodel, I get the error below. Do you know how to solve it?
Check failed: num_kernel_dims == 1 || num_kernel_dims == num_spatial_axes_ kernel_size must be specified once, or once per spatial dimension (kernel_size specified 3 times; 2 spatial dims);

@Yeongtae commented Oct 6, 2015

I just use pycaffe.

@rockstone533 commented

@Yeongtae What about your input data format? Do you use HDF5?

@Yeongtae commented Oct 6, 2015

Yes. I used it.

@Yeongtae commented Oct 6, 2015

Do you need an example?

@rockstone533 commented

@Yeongtae Yes, that would be great!

@rockstone533 commented

Hey @WGW101, I want to know whether your current version supports nD convolution with cuDNN?

@WGW101 (Author) commented Oct 7, 2015

@rockstone533 Hi! Yes, it should, as long as you don't have biases.
If you do, @squall815 suggested a minor modification to make it work:
change cudnnAddTensor to cudnnAddTensor_v3

Sorry, I can't test it myself for hardware incompatibility reasons...

@rockstone533 commented

@WGW101 Yeah, I've changed it and my model began to work. However, the speed seems a bit slow. How about your running speed? @squall815

@ToruHironaka commented

@WGW101, I used this PR but I was not sure about the train-val.prototxt layer settings. Here is what I did so far. I use libcudnn.so.7.0.

  1. I changed the layer types: Convolution --> NdConvolution and Pooling --> NdPooling
  2. Changed the engine from CAFFE to CUDNN
  3. Added kernel_shape as below

pooling_param {
  kernel_shape { dim: 2 dim: 1 dim: 20 dim: 20 dim: 20 }
  pool: MAX
  kernel_size: 3
  stride: 2
}

I have a question here. I thought I had to increase the number of kernel_size entries because I have 3D data (20x20x20), so I set two more kernel_size values, but I always got the message below:

"Error parsing text-format caffe.NetParameter: 47:16: Non-repeated field "kernel_size" is specified multiple times"

PR #2049 required increasing the number of kernel_size entries in order to train 3D data, but this PR requires setting kernel_shape instead of repeating kernel_size. So I think I did not get a 3D layer.

I could train my net until Iteration 0, Testing net (#0), but then I got the error below. I think my kernel_shape setting was not correct.

F1208 15:44:10.254520 28197 cudnn_ndconv_layer.cu:43] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM

@WGW101 (Author) commented Dec 9, 2015

@ToruHironaka Hi!

First, you shouldn't use kernel_size or stride with the NdPooling layer added by this PR; they are replaced by kernel_shape and stride_shape.
I clearly need to raise an error if both are specified.

In the current implementation, the kernel_size, stride and pad of the master branch are simply ignored; kernel_shape is required, stride_shape defaults to all 1s, and pad_shape defaults to all 0s. This is likely to change to be more adaptive in future versions.

Be careful not to confuse the shape of your kernel with the shape of your input.
From what I understand, here is what your pooling layer should look like in your .prototxt:

layer {
  name: "XXX"
  type: "NdPooling"
  bottom: "yyy" // This is your 2x1x20x20x20 data blob
  top: "xxx" // You'll get a 2x1x9x9x9 output blob

  pooling_param {
    pool: MAX
    kernel_shape { // This is your 3x3x3 kernel
      dim: 3
      dim: 3
      dim: 3
    }
    stride_shape { // And 2x2x2 stride.
      dim: 2
      dim: 2
      dim: 2
    }
  }
}

If any errors persist, feel free to ask for help again.

Regards

@ToruHironaka commented

@WGW101, thanks for your reply, I really appreciate it. I tried the layer definition below but got the same problem.

<omitted data layer, I use hdf5 datasets >
layer {
  name: "conv1"
  type: "NdConvolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_shape { dim: 11 dim: 11 dim: 11 }
    stride_shape { dim: 4 dim: 4 dim: 4 }
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    engine: CUDNN
  }
}
layer {
  name: "pool1"
  type: "NdPooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_shape { dim: 3 dim: 3 dim: 3 }
    stride_shape { dim: 2 dim: 2 dim: 2 }
    engine: CUDNN
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 2
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}

Error:

I1209 19:53:55.656716 7984 net.cpp:155] Setting up data
I1209 19:53:55.656774 7984 net.cpp:163] Top shape: 2 1 20 20 20 (16000)
I1209 19:53:55.656800 7984 net.cpp:163] Top shape: 2 (2)
I1209 19:53:55.656813 7984 net.cpp:174] Memory required for data: 64008
I1209 19:53:55.656849 7984 layer_factory.hpp:76] Creating layer conv1
I1209 19:53:55.656950 7984 net.cpp:110] Creating Layer conv1
I1209 19:53:55.656983 7984 net.cpp:477] conv1 <- data
I1209 19:53:55.657047 7984 net.cpp:433] conv1 -> conv1
I1209 19:53:55.876652 7984 net.cpp:155] Setting up conv1
I1209 19:53:55.876729 7984 net.cpp:163] Top shape: 2 96 3 3 3 (5184)
I1209 19:53:55.876742 7984 net.cpp:174] Memory required for data: 84744
I1209 19:53:55.876843 7984 layer_factory.hpp:76] Creating layer pool1
I1209 19:53:55.876912 7984 net.cpp:110] Creating Layer pool1
I1209 19:53:55.876935 7984 net.cpp:477] pool1 <- conv1
I1209 19:53:55.876974 7984 net.cpp:433] pool1 -> pool1
F1209 19:53:55.877362 7984 cudnn.hpp:87] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
*** Check failure stack trace: ***
@ 0x7f7c5ae20daa (unknown)
@ 0x7f7c5ae20ce4 (unknown)
@ 0x7f7c5ae206e6 (unknown)
@ 0x7f7c5ae23687 (unknown)
@ 0x7f7c5b632fbc caffe::cudnn::setTensorNdDesc<>()
@ 0x7f7c5b63250a caffe::cudnn::setTensorNdDesc<>()
@ 0x7f7c5b63025b caffe::CudnnNdPoolingLayer<>::Reshape()
@ 0x7f7c5b5f9a7d caffe::Layer<>::SetUp()
@ 0x7f7c5b5e6414 caffe::Net<>::Init()
@ 0x7f7c5b5e456d caffe::Net<>::Net()
@ 0x7f7c5b6bcb3f caffe::Solver<>::InitTrainNet()
@ 0x7f7c5b6bc362 caffe::Solver<>::Init()
@ 0x7f7c5b6bbe48 caffe::Solver<>::Solver()
@ 0x41ba11 caffe::SGDSolver<>::SGDSolver()
@ 0x419391 caffe::GetSolver<>()
@ 0x415053 train()
@ 0x417428 main
@ 0x7f7c5a332ec5 (unknown)
@ 0x413fa9 (unknown)
@ (nil) (unknown)
Aborted (core dumped)

I think my pooling layer is causing the above error. My CUDA is 7.0, my cuDNN is v3.0, and I have a Titan X, so my setup should be okay, or I might be missing something such as a path setting. Am I missing something else? I also tried to use "ReLU" with this PR but couldn't. Why can't I use layer type "ReLU" in this PR? I could use it in PR #2442. Thanks!

@ToruHironaka commented

@WGW101, I solved it by referencing @squall815's comment above, but I still have problems with the ReLU layer. Does this PR support nD LRN?

Thanks!

@ToruHironaka commented

@WGW101

I can train my hdf5 datasets with this PR's Caffe, but my trainings have never converged so far: accuracy stays at 0.5 or less and loss at 1.7 or above. I think my hdf5 datasets or network settings are wrong. I've posted my Python script for creating the hdf5 dataset and my network settings below. Please help me out.

My Python script, which converts image files into an hdf5 dataset:

import h5py
import numpy as np
import cv2
import matplotlib.pyplot as plt
from os.path import join, exists

def image2HDF5(inputFile, outputDir, fileType, width, height, channel):

    # initialize the total number of files and the input file list
    numberOfFiles = 0
    inputFileList = []
    hdfFileList = []
    visualize = False

    # open the train or test file list (each line: "<image path> <label>")
    with open(inputFile, 'r') as inputData:
        for fileName in inputData:
            inputFileList.append(fileName)
            numberOfFiles = numberOfFiles + 1

    print "A number of files: ", numberOfFiles

    # initialize indices
    index = 0
    fileIndex = 0
    periodNum = 100  # create a new hdf5 file every 100 images

    # open each file from inputFileList and write it into hdf5 chunks
    for dataFileName in inputFileList:

        if (fileIndex % periodNum) == 0:
            # create an hdf5 output file for the next periodNum images
            outputHDFFile = fileType + "-" + str(fileIndex) + ".h5"
            outputHDFPath = join(outputDir, outputHDFFile)
            print "hdf5 file: ", outputHDFPath
            fileOut = h5py.File(outputHDFPath, 'w')
            hdfFileList.append(outputHDFPath)

            # set data and label dimensions; use the width/height arguments
            # (the original hard-coded 256x256 and ignored them)
            data = fileOut.create_dataset("data", (periodNum, channel, height, width), dtype=np.float32)
            label = fileOut.create_dataset("label", (periodNum,), dtype=np.float32)

            # empty NxCxHxW array for the image data, one label per image
            imageStack = np.empty((periodNum, channel, height, width))
            labelStack = np.empty((periodNum,))
            # reset the in-chunk index every periodNum images
            index = 0

        # parse the file path and label info from the list, line by line
        dataPathandLabel = dataFileName.split(' ', 1)
        dataFilePath = dataPathandLabel[0]
        labelNumber = int(dataPathandLabel[1])

        # load the image
        if channel == 1:
            img = cv2.imread(dataFilePath, cv2.CV_LOAD_IMAGE_GRAYSCALE)  # grayscale, HxW
            imageStack[index, 0, :, :] = img
        elif channel == 3:
            img = cv2.imread(dataFilePath, cv2.CV_LOAD_IMAGE_COLOR)  # color, HxWxC

            # check the first 5 image files
            if index < 5 and visualize:
                plt.imshow(img)
                plt.show()

            img = img.transpose(2, 0, 1)  # HxWxC -> CxHxW for Caffe
            imageStack[index, :, :, :] = img

        # index the current image; the original wrote labelStack[...] = labelNumber,
        # which overwrote every label in the chunk with the latest one
        labelStack[index] = labelNumber

        index = index + 1
        fileIndex = fileIndex + 1

        if (fileIndex % periodNum) == 0:
            # flush the image data and labels for this chunk
            data[...] = imageStack
            label[...] = labelStack

            # close the file for this cycle
            fileOut.close()
            print 'file close'

    # list the generated hdf5 dataset files
    outputHDFListFile = fileType + '.txt'
    outputHDFListPath = join(outputDir, outputHDFListFile)

    if exists(outputHDFListPath):
        outputHDFListFile = fileType + '-list.txt'
        outputHDFListPath = join(outputDir, outputHDFListFile)

    print 'list: ', outputHDFListFile
    print 'Output dir: ', outputHDFListPath

    # write the hdf5 file list
    with open(outputHDFListPath, 'w') as trainOut:
        for hdfFile in hdfFileList:
            print hdfFile
            trainOut.write(hdfFile + "\n")
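For reference, a hypothetical invocation of the function above (paths and file names are illustrative): train.txt holds one "<image path> <label>" pair per line, and the generated list file is what an HDF5Data layer's source parameter should point at.

image2HDF5('train.txt', '/data/hdf5/train', 'train', 256, 256, 3)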
