
[MKL-DNN] Integrate Conv3d and Pool3d/1d #17884

Merged: 4 commits merged into apache:master from wuxun-zhang:3d_conv_pool on Apr 9, 2020

Conversation

wuxun-zhang (Contributor)

Description

This PR integrates the MKL-DNN 3D convolution and pooling primitives, covering both the FP32 path and the quantization pass.
@pengzhao-intel @TaoLv @ciyongch @rongzha1 @zixuanweeei

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)

Changes

  • Support the 5D data layout for MKL-DNN

 return (dtype == mshadow::kFloat32 || dtype == mshadow::kBfloat16) &&
-       (ndim == 1 || ndim == 2 || ndim == 4);
+       (ndim >= 1 && ndim <= 5);
Member

Please fix the indentation, and make sure you really intend to enable ndim == 3 here as well.

Contributor Author

Maybe it's better not to touch this function. I will enable the 5D layout in SupportPooling instead.
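
For illustration, a minimal self-contained sketch of that alternative (hypothetical code, not the actual MXNet source; SupportPoolingSketch and kFloat32Tag are made-up names):

// Hypothetical sketch only: leave the generic MKL-DNN support check
// untouched and accept 5D inputs in a pooling-specific predicate instead.
constexpr int kFloat32Tag = 0;  // stand-in for mshadow::kFloat32

bool SupportPoolingSketch(int dtype, int ndim) {
  // Pooling would accept 3D (NCW), 4D (NCHW), and 5D (NCDHW) activations.
  return dtype == kFloat32Tag && ndim >= 3 && ndim <= 5;
}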

Comment on lines 330 to 331
const int D = (ndim == 5) ? 2 : 1;
const int N = 0, C = 1, H = D + 1, W = D + 2;
Member

Let's be more descriptive here:

Suggested change
-const int D = (ndim == 5) ? 2 : 1;
-const int N = 0, C = 1, H = D + 1, W = D + 2;
+int N = 0, C = 1, H = 2, W = 3;
+int D = -1;
+if (ndim == 5) {
+  D = 2;
+  H = 3;
+  W = 4;
+}
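
For context, a small hypothetical illustration of the axis mapping behind this suggestion (the Axes struct and GetAxes helper are not from the PR): 4D activations use NCHW, so H = 2 and W = 3, while 5D activations use NCDHW, so depth sits at axis 2 and H and W shift to 3 and 4.

#include <cassert>

// Hypothetical helper for illustration only; D == -1 means no depth axis.
struct Axes { int N, C, D, H, W; };

Axes GetAxes(int ndim) {
  assert(ndim == 4 || ndim == 5);
  return (ndim == 5) ? Axes{0, 1, 2, 3, 4}    // NCDHW
                     : Axes{0, 1, -1, 2, 3};  // NCHW
}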

Contributor Author

Thanks. Done.

        param.pad[2], param.pad[2], param.kernel[2], param.stride[2]));
      case 4:
        is_symmetric = is_symmetric && (param.pad[1] == GetPaddingSizeFull(dshape[3],
            param.pad[1], param.pad[1], param.kernel[1], param.stride[1]));
Member

I see both pad[0] and pad[1] are checked in previous code.

Contributor Author

Could you please show me where you saw these checks? Thanks

Member

Okay, I didn't realize that there is no break after each case, so the switch falls through: if ndim == 4, both pad[1] and pad[0] are checked.
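
To make the fall-through explicit, here is a minimal self-contained sketch (simplified and hypothetical; IsSymmetricSketch and PadIsSymmetric stand in for the PR's is_symmetric accumulation with GetPaddingSizeFull):

// Placeholder for the real per-axis padding-symmetry test.
bool PadIsSymmetric(int pad) { return pad >= 0; }

// Each case deliberately omits break, so higher-dimensional inputs also
// run all of the lower-dimensional checks.
bool IsSymmetricSketch(int ndim, const int *pad) {
  bool is_symmetric = true;
  switch (ndim) {
    case 5:
      is_symmetric = is_symmetric && PadIsSymmetric(pad[2]);
      // fall through: a 5D input must also pass the 4D and 3D checks
    case 4:
      is_symmetric = is_symmetric && PadIsSymmetric(pad[1]);
      // fall through: a 4D input must also pass the 3D check
    case 3:
      is_symmetric = is_symmetric && PadIsSymmetric(pad[0]);
      break;
    default:
      is_symmetric = false;
  }
  return is_symmetric;
}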

@@ -127,61 +116,139 @@ mkldnn::algorithm GetMKLDNNPoolAlgo(const PoolingParam &param) {
  }
}

void InitPoolingPrimitiveParams(const PoolingParam &param,
                                const mkldnn::memory::desc &data_md,
                                mkldnn::memory::dims *new_kernel,
Member

How about pass-by-reference?
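
For illustration, a minimal self-contained sketch (hypothetical names, not the PR's code) contrasting the pointer-out style above with the suggested pass-by-reference style:

#include <vector>

using dims = std::vector<long>;

// Current style: the output parameter is a pointer, so callers pass an
// address and the callee must trust that it is non-null.
void InitParamsPtr(const dims &kernel, dims *new_kernel) {
  *new_kernel = kernel;
}

// Suggested style: a reference makes "must be a valid object" explicit
// at the call site and removes the null-pointer question entirely.
void InitParamsRef(const dims &kernel, dims &new_kernel) {
  new_kernel = kernel;
}

int main() {
  dims kernel{2, 2, 2}, out_ptr, out_ref;
  InitParamsPtr(kernel, &out_ptr);
  InitParamsRef(kernel, out_ref);
  return 0;
}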

@@ -274,12 +274,12 @@ void PoolingComputeExCPU(const nnvm::NodeAttrs &attrs, const OpContext &ctx,

  // Pooling does not currently support working with views
  if (inputs[0].IsView() || outputs[0].IsView()) {
    std::cout << "Fall back to Pooling backward pass..." << std::endl;
Member

Is this intended? We didn't have any log message for fallback execution.

Contributor Author

Sorry. Forgot to remove this line. Will fix.

@wuxun-zhang (Contributor Author)

@mxnet-bot run ci [centos-cpu, unix-gpu, windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu, unix-gpu, centos-cpu]

@TaoLv mentioned this pull request Mar 26, 2020
@pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Mar 27, 2020
@pengzhao-intel (Contributor)

@wuxun-zhang please take a look at the CI.

@wuxun-zhang (Contributor Author)

The CI system seems to be experiencing failures at the moment. The errors are unrelated to this PR.

@TaoLv (Member) commented Apr 4, 2020

The CI issue should already be addressed. Please rebase your PR and resolve the comments. Thanks.

@wuxun-zhang (Contributor Author)

The DNNL version in the master branch has been wrongly changed (downgraded from v1.2.2 to v1.1.2). This PR is now waiting for DNNL v1.2 or higher to come back.

@ChaiBapchya (Contributor)

PR #17084 updated 3rdparty/mkldnn from 8e96ef to cb2cc7 [1.1.0 to 1.1.2].
@wuxun-zhang Can you confirm when it was v1.2.2, and what's the path forward?

@wuxun-zhang (Contributor Author)

Thank you @ChaiBapchya for investigating this. 8e96ef should be oneDNN v1.2.2. We are working on confirming with the author of #17084 whether the downgrade was intended or a mistake.

@wuxun-zhang (Contributor Author)

Now CI has finally passed. @pengzhao-intel @TaoLv please help review again.
@ChaiBapchya please also double check if this PR fixes #17915.

@pengzhao-intel (Contributor) left a comment

LGTM

@ciyongch @TaoLv please review too.

CPU Performance and Quantization automation moved this from In progress to Reviewer approved Apr 8, 2020
@ChaiBapchya (Contributor)

@wuxun-zhang Thanks a lot for the work.
Can you share perf numbers for this patch?
What's the time taken by 3D conv now with MKL-DNN vs. before without MKL-DNN?
Can we profile it?

return (dtype == mshadow::kFloat32 || dtype == mshadow::kBfloat16) &&
       (ndim == 1 || ndim == 2 || ndim == 4);
Contributor

What's the issue with 3D tensors?

Contributor Author

No issue. This change is not needed for this PR; we already enabled MKL-DNN conv with 3D tensors previously.

Contributor

Can you point me to where it's handled? I didn't understand the separate treatment of 3D.

@pengzhao-intel (Contributor)

Thanks for the improvement. Merging now.

We will share the performance data soon.

@pengzhao-intel merged commit 664889a into apache:master Apr 9, 2020
CPU Performance and Quantization automation moved this from Reviewer approved to Done Apr 9, 2020
wuxun-zhang added a commit to wuxun-zhang/incubator-mxnet that referenced this pull request Apr 15, 2020
* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master
@wuxun-zhang deleted the 3d_conv_pool branch Apr 15, 2020 01:54
wuxun-zhang added a commit to wuxun-zhang/incubator-mxnet that referenced this pull request Apr 16, 2020
* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master
wuxun-zhang added a commit to wuxun-zhang/incubator-mxnet that referenced this pull request Apr 16, 2020
* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master
wuxun-zhang added a commit to wuxun-zhang/incubator-mxnet that referenced this pull request Apr 18, 2020
* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master
pengzhao-intel pushed a commit that referenced this pull request Apr 18, 2020
* [MKLDNN] apply MKLDNNRun to quantized_act/transpose (#17689)

* apply MKLDNNRun to quantized_act/transpose ops

* run CI

* [MKL-DNN] Integrate Conv3d and Pool3d/1d (#17884)

* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master

* fix conflicts

* fix CI

* rebase
@ChaiBapchya (Contributor)

@pengzhao-intel @wuxun-zhang Gentle ping. Can we get the perf numbers for this PR's changes?

@wuxun-zhang (Contributor Author)

Performance numbers for the Conv3d op:

Shape                  | w/o MKL-DNN   | w/ MKL-DNN
(3, 3, 16, 224, 224)   | 3.696679 sec  | 0.046571 sec
(3, 3, 128, 128, 128)  | 11.716535 sec | 0.165749 sec

Test script:

import mxnet as mx
import time

data_shape = [(3, 3, 16, 224, 224), (3, 3, 128, 128, 128)]

for shape in data_shape:
    data = mx.random.uniform(shape=shape)
    weight_shape = (32, shape[1], 3, 3, 3)
    weight = mx.nd.ones(shape=weight_shape)

    num_iter = 10
    dry_run = 5  # warm-up iterations excluded from timing
    for i in range(num_iter):
        if i == dry_run:
            tic = time.time()  # start timing once warm-up is done
        out = mx.nd.Convolution(data, weight, kernel=(3, 3, 3), stride=(1, 1, 1),
                                num_filter=weight_shape[0], pad=(0, 0, 0),
                                dilate=(2, 2, 2), no_bias=True, cudnn_off=True,
                                name='conv3d')
        out.asnumpy()  # block until the computation finishes
    print("For shape : {}".format(shape))
    print("Average time cost is %f sec" % ((time.time() - tic) / (num_iter - dry_run)))

@bartekkuncer (Contributor)

Some more performance numbers for Conv3d:

Shape                  | Kernel shape | w/o MKL-DNN   | w/ MKL-DNN
(3, 3, 12, 128, 64)    | (2, 2, 2)    | 0.56526 sec   | 0.00332 sec
(3, 3, 12, 128, 128)   | (2, 2, 2)    | 1.06378 sec   | 0.00453 sec
(3, 3, 128, 128, 128)  | (2, 2, 2)    | 10.53919 sec  | 0.04117 sec
(5, 5, 12, 128, 128)   | (2, 2, 2)    | 2.76165 sec   | 0.00688 sec
(5, 5, 128, 128, 128)  | (2, 2, 2)    | 29.06880 sec  | 0.07062 sec
(3, 3, 12, 128, 64)    | (3, 3, 3)    | 1.48476 sec   | 0.00172 sec
(3, 3, 12, 128, 128)   | (3, 3, 3)    | 2.87235 sec   | 0.00424 sec
(3, 3, 128, 128, 128)  | (3, 3, 3)    | 34.26913 sec  | 0.04124 sec
(5, 5, 12, 128, 128)   | (3, 3, 3)    | 7.77486 sec   | 0.00697 sec
(5, 5, 128, 128, 128)  | (3, 3, 3)    | 141.04758 sec | 0.07259 sec

Test I used:

import mxnet as mx
import time

def perf_conv3d():
    shapes = [(3, 3, 12, 128, 64), (3, 3, 12, 128, 128), (3, 3, 128, 128, 128),
              (5, 5, 16, 128, 128), (5, 5, 128, 128, 128)]
    kernel_shapes = [(2, 2, 2), (3, 3, 3)]
    num_filter = 32

    warmup = 10
    runs = 40

    for shape in shapes:
        for kernel_shape in kernel_shapes:
            a = mx.random.uniform(shape=shape)
            w = mx.random.uniform(shape=(num_filter, shape[1], kernel_shape[0],
                                         kernel_shape[1], kernel_shape[2]))

            tic = 0
            for i in range(runs + warmup):
                if i == warmup:
                    tic = time.time()  # start timing once warm-up is done
                _ = mx.nd.Convolution(data=a, weight=w, kernel=kernel_shape,
                                      num_filter=num_filter, stride=(1, 1, 1),
                                      pad=(0, 0, 0), no_bias=True, cudnn_off=True)
                mx.nd.waitall()  # block until the computation finishes

            toc = time.time()
            # `runs` iterations are timed (i = warmup .. runs + warmup - 1)
            print('conv3d benchmark, shape={}, kernel_shape={} time={} ms/iter'
                  .format(shape, kernel_shape, (toc - tic) * 1000 / runs))


if __name__ == '__main__':
    perf_conv3d()

Sorry for the delay.

ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Apr 22, 2020
* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Apr 30, 2020
* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master
leezu pushed a commit that referenced this pull request May 6, 2020
…nd Fix Sanity pipeline in 1.6.x (#18206)

* [MKL-DNN] Integrate Conv3d and Pool3d/1d (#17884)

* Integrate MKl-DNN conv3d and pool3d/1d

* fix UT & address comments

* clean code

* rebase against latest master

* pylint astroid sanity issue

* astroid and pylint versions only supported in py3

* remove kBFloat16 as its not supported int 1.6

* added missing definition GetPaddingSizeFull

* Remove dilation restriction for conv3d (#17491)

* Remove conv3d dilation restriction

* Remove comment

* fix unix-gpu test for num_outputs and inputs

Co-authored-by: Wuxun Zhang <[email protected]>
Co-authored-by: reminisce <[email protected]>