
Add large tensor nightly tests for MKL-DNN operators #16184

Merged
merged 5 commits into from
Nov 19, 2019

Conversation

wuxun-zhang
Contributor

@wuxun-zhang wuxun-zhang commented Sep 17, 2019

To track the correctness of MKL-DNN operators when switching to int64 tensor size, we added more nightly tests to the original script. These changes also cover more scenarios, including different data types (float32/int64).

@pengzhao-intel @TaoLv @apeforest @ChaiBapchya

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, explain why it must be made
  • Interesting edge cases to note here

@pengzhao-intel pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Sep 17, 2019
@marcoabreu
Contributor

marcoabreu commented Sep 18, 2019

Can't we reuse the already existing large tensor tests? From a front-end perspective, it shouldn't matter to the user which backend is being used, right?

I understand that in the backend MKLDNN only supports float32 as input for now, but how about we hide that fact and instead do some magic in the backend in the meantime? IMO, the user-code should not be backend specific and thus we should use the same tests to enforce that constraint.

I'd like to stay away from backend specific tests as much as possible, so I'd appreciate it if we could rather work towards improving these intermediary layers instead of adding a second set of tests.

@pengzhao-intel
Contributor

Quoting @marcoabreu: "Can't we reuse the already existing large tensor tests? From a front-end perspective, it shouldn't matter to the user which backend is being used, right? […]"

@marcoabreu Good point! I think these tests are backend independent :)
@wuxun-zhang you can try removing MKL-DNN from the file name and running the tests with different backends.
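A hedged sketch of what "run it with different backends" could look like from the front end, assuming MXNet's MXNET_MKLDNN_ENABLED runtime switch; the exact mechanism used for the nightly job may differ:

import os
# Must be set before mxnet is imported; "0" disables the MKL-DNN backend, "1" (default) enables it.
os.environ["MXNET_MKLDNN_ENABLED"] = "0"

from mxnet import nd

# The front-end call is identical either way; only the executing backend changes.
x = nd.ones((2, 3))
print(nd.relu(x).asnumpy())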

@wuxun-zhang
Contributor Author

Sorry for the delayed reply. Maybe I can put these float32 tests into the original script instead of creating a new one. Then we can use that script to evaluate large tensor support in two steps: int64 tensor size + float32 data type, then int64 tensor size + int64 data type. What do you think of this solution? @marcoabreu @pengzhao-intel

@marcoabreu
Contributor

I'll leave the details to you :) As long as it's backend independent and reused across the different backends, I'm fine with it.

Feel free to hit me up once you've made the changes so I can verify them.

@wuxun-zhang
Contributor Author

@marcoabreu Please take another look and see whether these changes resolve your concerns. Thanks.

@wuxun-zhang
Contributor Author

@apeforest @ChaiBapchya Could you please take a look at this PR? Thanks.

nd.waitall()
return mean, stdvar

shape = (MEDIUM_X, MEDIUM_X, SMALL_Y, SMALL_Y)
Contributor

You are changing the test by getting rid of LARGE_X; is that on purpose?

Contributor Author

Thanks, I forgot to change this. At least we need to make sure shape[2]*shape[3] exceeds the index range of int32.
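A small sanity check for this point; the constants are assumptions taken from the verbose log pasted later in this PR, not necessarily the exact values in the merged test:

INT32_MAX = 2**31 - 1
LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

shape = (3, 3, LARGE_X, SMALL_Y)
# The H*W plane alone must need 64-bit indexing for the test to be meaningful.
assert shape[2] * shape[3] > INT32_MAX  # 4.3e9 > 2147483647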

CPU Performance and Quantization automation moved this from In progress to Review in progress Oct 8, 2019
@wuxun-zhang wuxun-zhang force-pushed the add_mkldnn_lts_ut branch 2 times, most recently from f0c3497 to 620bb37 Compare October 15, 2019 05:37
Contributor

@ChaiBapchya ChaiBapchya left a comment

@wuxun-zhang Thanks for this PR. Apologies for the delayed review. Can you rebase onto the latest master? I left a few questions too.
Thanks once again!

res = nd.FullyConnected(a, b, num_hidden=b.shape[0], no_bias=True)
assert np.sum(res[-1].asnumpy() == a.shape[1]) == b.shape[0]

res = nd.FullyConnected(a, b, c, num_hidden=b.shape[0], no_bias=False)
Contributor

what's the intuition behind adding this?

Contributor Author

Just want to cover both the with-bias and without-bias cases; see the sketch below.
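A hedged sketch of the two cases, assuming all-ones operands as in the surrounding test; the constants and the exact assert for the bias case are illustrative assumptions:

import numpy as np
from mxnet import nd

LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

a = nd.ones(shape=(LARGE_X, SMALL_Y))
b = nd.ones(shape=(SMALL_Y, SMALL_Y))
c = nd.ones(shape=(SMALL_Y,))

# w/o bias: every output element is the dot product of two all-ones vectors, i.e. a.shape[1]
res = nd.FullyConnected(a, b, num_hidden=b.shape[0], no_bias=True)
assert np.sum(res[-1].asnumpy() == a.shape[1]) == b.shape[0]

# w/ bias: the all-ones bias adds 1 to every output element
res = nd.FullyConnected(a, b, c, num_hidden=b.shape[0], no_bias=False)
assert np.sum(res[-1].asnumpy() == a.shape[1] + 1) == b.shape[0]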

nd.waitall()
return mean, stdvar

shape = (3, 3, LARGE_X, SMALL_Y)
Contributor

Why do we need to change the shape?

Contributor Author

Just want to cover the MKL-DNN batch-norm op, which only supports ndim=4 for now.
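A minimal sketch of the 4-D shape being exercised, with illustrative constants; the H*W product still exceeds the int32 index range, and running it requires a large-memory host, as the nightly large tensor tests do:

from mxnet import nd

LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

data = nd.ones((3, 3, LARGE_X, SMALL_Y))   # NCHW layout, H*W = 4.3e9 elements
gamma, beta = nd.ones(3), nd.zeros(3)
mov_mean, mov_var = nd.ones(3), nd.ones(3)

# Inference-mode batch-norm on 4-D input is what the MKL-DNN kernel handles.
out = nd.BatchNorm(data, gamma, beta, mov_mean, mov_var, use_global_stats=True)
nd.waitall()
assert out.shape == data.shape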

@@ -952,11 +1001,11 @@ def test_reshape_like():


def test_flatten():
a = create_2d_tensor(rows=LARGE_X, columns=SMALL_Y).reshape((LARGE_X//2, 2, SMALL_Y))
b = nd.flatten(a)
assert b[-1][-1] == (LARGE_X-1)
Contributor

why are we removing these asserts?

Contributor Author

It is related to the different precision of int64 and float32. The function create_2d_tensor generates an NDArray whose elements vary from 0 to LARGE_X-1. With float32, precision is lost when LARGE_X is too large; that is, LARGE_X - 1 or LARGE_X - 2 cannot be represented exactly. This differs from int64, so I think these asserts can be removed.
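A short illustration of the precision point, assuming LARGE_X = 100000000 as in the verbose log below; float32 has a 24-bit significand, so neighbouring integers of this magnitude collapse to the same value:

import numpy as np

LARGE_X = 100000000  # assumed value

print(np.float32(LARGE_X - 1) == np.float32(LARGE_X))      # True: the old equality assert is unreliable
print(np.float32(LARGE_X - 2) == np.float32(LARGE_X - 1))  # True
print(int(np.float32(LARGE_X - 1)))                        # 100000000, not 99999999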

Contributor

@ChaiBapchya ChaiBapchya left a comment

LGTM! It would have been great if those comments had been inlined in the code (for clarity), but thanks for addressing them nonetheless.

@wuxun-zhang
Contributor Author

@ChaiBapchya @marcoabreu Please take another look and see whether your concerns are properly resolved. Thanks.

args_grad=None)
ex.forward(is_train=False)
softmax_out = ex.outputs[0][0].asnumpy()
expected_softmax_out = (1/SMALL_Y)*mx.nd.ones((SMALL_Y)).asnumpy()
Contributor

nit: add space between operators and operands

@@ -782,8 +801,30 @@ def test_activation():
# in future, we could test if mean, var of output
# matches target output's mean, var
def test_batchnorm():
shape = (LARGE_X, SMALL_Y)
def get_ref_mean_var(data, running_mean, running_var, eps, use_global_status=True):
Contributor

Why use ndarray in get_ref_mean_var? Wouldn't it be simpler and more correct to implement it using numpy?

Contributor

+1 for numpy instead of nd, and the function could be named get_np_mean_var rather than ref (I'm guessing ref meant reference, but it's not quite clear).

Contributor Author

Done
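A hedged numpy sketch of what the renamed helper could look like, assuming numpy inputs; the signature mirrors the one shown above, but how the merged test handles eps and the returned stdvar may differ:

import numpy as np

def get_np_mean_var(data, running_mean, running_var, eps, use_global_status=True):
    if use_global_status:
        # Inference mode: batch-norm uses the running (global) statistics.
        mean, var = running_mean, running_var
    else:
        # Training mode: per-channel statistics over the N, H, W axes of NCHW data.
        mean = np.mean(data, axis=(0, 2, 3))
        var = np.var(data, axis=(0, 2, 3))
    stdvar = np.sqrt(var + eps)
    return mean, stdvar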

# Here we removed the value asserts due to the different precision of `int64` and `float32`.
# For `float32`, precision is lost when `LARGE_X` is too large, i.e. `LARGE_X-1`
# and `LARGE_X-2` cannot be represented exactly in this situation.
assert b.shape == (LARGE_X//2, SMALL_Y*2)
Contributor

Can we also test one of the values inside tensor b?

Contributor Author

Done
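One hedged way of checking a value inside b while respecting the float32 rounding discussed above: rebuild the scenario with assumed constants and compare against the float32 representation of the expected index rather than the exact integer (the merged test may do this differently):

import numpy as np
from mxnet import nd

LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

# Row index broadcast across columns, mirroring create_2d_tensor, then reshape and flatten.
a = nd.arange(LARGE_X).reshape((LARGE_X, 1)).broadcast_to((LARGE_X, SMALL_Y))
b = nd.flatten(a.reshape((LARGE_X // 2, 2, SMALL_Y)))

assert b.shape == (LARGE_X // 2, SMALL_Y * 2)
assert b[-1][-1].asscalar() == np.float32(LARGE_X - 1)  # 100000000.0 after float32 rounding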

@apeforest
Contributor

Since the nightly test is currently broken, could you please run all the tests offline and paste your output into this PR? Thanks!

Contributor

@apeforest apeforest left a comment

Please run all the tests with MKL-DNN enabled and paste your output in this PR.

@wuxun-zhang
Contributor Author

Output log of all MKL-DNN operators (with export MKLDNN_VERBOSE=1):

test_large_array_mkldnn.test_FullyConnected ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,mb100000000ic43oc43,559.605
mkldnn_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,,mb100000000ic43oc43,760.226
ok
test_large_array_mkldnn.test_pooling ... mkldnn_verbose,exec,cpu,pooling,simple_nchw:any,forward_inference,src_f32::blocked:abcd:f0 dst_f32::blocked:abcd:f0 ws_undef::undef::f0,alg:pooling_avg_include_padding,mb10000ic200_ih43oh39kh5sh1ph0_iw43ow39kw5sw1pw0,3277.87
mkldnn_verbose,exec,cpu,pooling,simple_nchw:any,forward_inference,src_f32::blocked:abcd:f0 dst_f32::blocked:abcd:f0 ws_undef::undef::f0,alg:pooling_max,mb10000ic200_ih43oh39kh5sh1ph0_iw43ow39kw5sw1pw0,10394.7
ok
test_large_array_mkldnn.test_activation ... mkldnn_verbose,exec,cpu,eltwise,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,alg:eltwise_tanh:0:0,100000000x43,389.182
mkldnn_verbose,exec,cpu,eltwise,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,alg:eltwise_relu:0:0,100000000x43,352.823
mkldnn_verbose,exec,cpu,eltwise,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,alg:eltwise_logistic:0:0,100000000x43,363.188
ok
test_large_array_mkldnn.test_transpose ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:ab:f0 dst_f32::blocked:ba:f0,num:1,100000000x43,4733.55
ok
test_large_array_mkldnn.test_softmax ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,softmax,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,axis:1,43x100000000,917.137
ok
test_large_array_mkldnn.testSoftmaxOutput ... [18:22:25] src/executor/graph_executor.cc:2057: Subgraph backend MKLDNN is activated.
mkldnn_verbose,exec,cpu,softmax,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,axis:1,100000000x43,578.566
[18:22:27] src/executor/graph_executor.cc:2057: Subgraph backend MKLDNN is activated.
ok
test_large_array_mkldnn.test_batchnorm ... [18:26:33] src/operator/nn/./mkldnn/mkldnn_batch_norm-inl.h:204: inference mode...
mkldnn_verbose,exec,cpu,batch_normalization,ncsp_bnorm:any,forward_inference,data_f32::blocked:abcd:f0 diff_undef::undef::f0,flags:3,mb1ic1ih100000000iw43,861.505
ok
test_large_array_mkldnn.test_flatten ... mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,num:1,50000000x2x43,346.131
ok
test_large_array_mkldnn.test_concat ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,concat,simple:any,undef,src_f32::blocked:ab:f0 src_f32::blocked:ab:f0 dst_f32::blocked:ab:f0,num:2,43x200000000,1070.45
ok
Helper function that cleans up memory by releasing it from memory pool ... ok

----------------------------------------------------------------------
Ran 2 tests in 24.734s
OK
test_large_array.test_expand_dims ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,100000000x43,398.925
ok
----------------------------------------------------------------------
Ran 1 test in 0.921s

OK
test_large_array_mkldnn.test_elemwise_add ... mkldnn_verbose,exec,cpu,sum,simple:any,undef,src_f32::blocked:ab:f0 src_f32::blocked:ab:f0 dst_f32::blocked:ab:f0,num:2,100000000x43,517.932
ok
test_large_array_mkldnn.test_slice ... mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x42,0.0600586
ok
Helper function that cleans up memory by releasing it from memory pool ... ok
OK
----------------------------------------------------------------------
Ran 2 tests in 48.266s

OK

@pengzhao-intel
Contributor

@apeforest @marcoabreu please review this PR again.
We'd better close this one soon since our team is moving fast on the new features :)

@marcoabreu marcoabreu merged commit 4f14bf4 into apache:master Nov 19, 2019
CPU Performance and Quantization automation moved this from Review in progress to Done Nov 19, 2019
@wuxun-zhang wuxun-zhang deleted the add_mkldnn_lts_ut branch November 19, 2019 10:19
Contributor

@apeforest apeforest left a comment

LGTM. Sorry for the delay.
