
Add large tensor nightly tests for MKL-DNN operators #16184

Merged
merged 5 commits into from
Nov 19, 2019

Conversation

wuxun-zhang
Contributor

@wuxun-zhang wuxun-zhang commented Sep 17, 2019

To track the correctness of MKL-DNN operators when switching to int64 tensor size, we added more nightly tests to the original script. These changes also cover more scenarios, including different data types (float32/int64).

@pengzhao-intel @TaoLv @apeforest @ChaiBapchya

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, explain why it must be made
  • Interesting edge cases to note here

@pengzhao-intel pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Sep 17, 2019
@marcoabreu
Contributor

marcoabreu commented Sep 18, 2019

Can't we reuse the already existing large tensor tests? From a front-end perspective, it shouldn't matter to the user which backend is being used, right?

I understand that in the backend MKLDNN only supports float32 as input for now, but how about we hide that fact and instead do some magic in the backend in the meantime? IMO, the user-code should not be backend specific and thus we should use the same tests to enforce that constraint.

I'd like to stay away from backend specific tests as much as possible, so I'd appreciate it if we could rather work towards improving these intermediary layers instead of adding a second set of tests.

@pengzhao-intel
Contributor

Quoting @marcoabreu: "Can't we reuse the already existing large tensor tests? From a front-end perspective, it shouldn't matter to the user which backend is being used, right? […]"

@marcoabreu Good point! I think these tests are backend independent :)
@wuxun-zhang you can try removing MKL-DNN from the file name and running the tests with different backends.
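A hedged sketch of what "run it with different backends" could look like from the front end, assuming MXNet's MXNET_MKLDNN_ENABLED runtime switch; the exact mechanism used for the nightly job may differ:

import os
# Must be set before mxnet is imported; "0" disables the MKL-DNN backend, "1" (default) enables it.
os.environ["MXNET_MKLDNN_ENABLED"] = "0"

from mxnet import nd

# The front-end call is identical either way; only the executing backend changes.
x = nd.ones((2, 3))
print(nd.relu(x).asnumpy())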

@wuxun-zhang
Contributor Author

Sorry for the delayed reply. Maybe I can put these float32 tests into the original script instead of creating a new one. Then we can use that script to evaluate large tensor support in two steps: int64 tensor size + float32 data type, then int64 tensor size + int64 data type. What do you think of this solution? @marcoabreu @pengzhao-intel

@marcoabreu
Contributor

I'll leave the details to you :) As long as it's backend independent and reused across the different backends, I'm fine with it.

Feel free to hit me up once you've made the changes so I can verify them.

@wuxun-zhang
Contributor Author

@marcoabreu Please take another look and see whether these changes resolve your concerns. Thanks.

@wuxun-zhang
Contributor Author

@apeforest @ChaiBapchya Could you please take a look at this PR? Thanks.

nd.waitall()
return mean, stdvar

shape = (MEDIUM_X, MEDIUM_X, SMALL_Y, SMALL_Y)
Contributor

You are changing the test by getting rid of LARGE_X; is that on purpose?

Contributor Author

Thanks, I forgot to change this. At least we need to make sure shape[2]*shape[3] exceeds the index range of int32.
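A small sanity check for this point; the constants are assumptions taken from the verbose log pasted later in this PR, not necessarily the exact values in the merged test:

INT32_MAX = 2**31 - 1
LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

shape = (3, 3, LARGE_X, SMALL_Y)
# The H*W plane alone must need 64-bit indexing for the test to be meaningful.
assert shape[2] * shape[3] > INT32_MAX  # 4.3e9 > 2147483647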

CPU Performance and Quantization automation moved this from In progress to Review in progress Oct 8, 2019
@wuxun-zhang wuxun-zhang force-pushed the add_mkldnn_lts_ut branch 2 times, most recently from f0c3497 to 620bb37 Compare October 15, 2019 05:37
Contributor

@ChaiBapchya ChaiBapchya left a comment

@wuxun-zhang Thanks for this PR. Apologies for the delayed review. Can you rebase onto the latest master? I left a few questions too.
Thanks once again!

res = nd.FullyConnected(a, b, num_hidden=b.shape[0], no_bias=True)
assert np.sum(res[-1].asnumpy() == a.shape[1]) == b.shape[0]

res = nd.FullyConnected(a, b, c, num_hidden=b.shape[0], no_bias=False)
Contributor

what's the intuition behind adding this?

Contributor Author

Just want to cover both the with-bias and without-bias cases; see the sketch below.
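A hedged sketch of the two cases, assuming all-ones operands as in the surrounding test; the constants and the exact assert for the bias case are illustrative assumptions:

import numpy as np
from mxnet import nd

LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

a = nd.ones(shape=(LARGE_X, SMALL_Y))
b = nd.ones(shape=(SMALL_Y, SMALL_Y))
c = nd.ones(shape=(SMALL_Y,))

# w/o bias: every output element is the dot product of two all-ones vectors, i.e. a.shape[1]
res = nd.FullyConnected(a, b, num_hidden=b.shape[0], no_bias=True)
assert np.sum(res[-1].asnumpy() == a.shape[1]) == b.shape[0]

# w/ bias: the all-ones bias adds 1 to every output element
res = nd.FullyConnected(a, b, c, num_hidden=b.shape[0], no_bias=False)
assert np.sum(res[-1].asnumpy() == a.shape[1] + 1) == b.shape[0]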

nd.waitall()
return mean, stdvar

shape = (3, 3, LARGE_X, SMALL_Y)
Contributor

Why do we need to change the shape?

Contributor Author

Just want to cover the MKL-DNN batch-norm op, which only supports ndim=4 for now.
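A minimal sketch of the 4-D shape being exercised, with illustrative constants; the H*W product still exceeds the int32 index range, and running it requires a large-memory host, as the nightly large tensor tests do:

from mxnet import nd

LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

data = nd.ones((3, 3, LARGE_X, SMALL_Y))   # NCHW layout, H*W = 4.3e9 elements
gamma, beta = nd.ones(3), nd.zeros(3)
mov_mean, mov_var = nd.ones(3), nd.ones(3)

# Inference-mode batch-norm on 4-D input is what the MKL-DNN kernel handles.
out = nd.BatchNorm(data, gamma, beta, mov_mean, mov_var, use_global_stats=True)
nd.waitall()
assert out.shape == data.shape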

@@ -952,11 +1001,11 @@ def test_reshape_like():


def test_flatten():
a = create_2d_tensor(rows=LARGE_X, columns=SMALL_Y).reshape((LARGE_X//2, 2, SMALL_Y))
b = nd.flatten(a)
assert b[-1][-1] == (LARGE_X-1)
Contributor

why are we removing these asserts?

Contributor Author

It is related to the different precision of int64 and float32. The function create_2d_tensor generates an NDArray whose elements vary from 0 to LARGE_X-1. With float32, precision is lost when LARGE_X is too large; that is, LARGE_X - 1 or LARGE_X - 2 cannot be represented exactly. This differs from int64, so I think these asserts can be removed.
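A short illustration of the precision point, assuming LARGE_X = 100000000 as in the verbose log below; float32 has a 24-bit significand, so neighbouring integers of this magnitude collapse to the same value:

import numpy as np

LARGE_X = 100000000  # assumed value

print(np.float32(LARGE_X - 1) == np.float32(LARGE_X))      # True: the old equality assert is unreliable
print(np.float32(LARGE_X - 2) == np.float32(LARGE_X - 1))  # True
print(int(np.float32(LARGE_X - 1)))                        # 100000000, not 99999999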

Contributor

@ChaiBapchya ChaiBapchya left a comment

LGTM! It would have been great if those comments had been inlined in the code (for clarity), but thanks for addressing them nonetheless.

@wuxun-zhang
Contributor Author

@ChaiBapchya @marcoabreu Please take another look and see whether your concerns are properly resolved. Thanks.

args_grad=None)
ex.forward(is_train=False)
softmax_out = ex.outputs[0][0].asnumpy()
expected_softmax_out = (1/SMALL_Y)*mx.nd.ones((SMALL_Y)).asnumpy()
Contributor

nit: add space between operators and operands

@@ -782,8 +801,30 @@ def test_activation():
# in future, we could test if mean, var of output
# matches target output's mean, var
def test_batchnorm():
shape = (LARGE_X, SMALL_Y)
def get_ref_mean_var(data, running_mean, running_var, eps, use_global_status=True):
Contributor

Why use ndarray in get_ref_mean_var? Wouldn't it be simpler and more correct to implement it using numpy?

Contributor

+1 for numpy instead of nd, and the function could be named get_np_mean_var rather than ref (I'm guessing ref meant reference, but it's not quite clear).

Contributor Author

Done
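A hedged numpy sketch of what the renamed helper could look like, assuming numpy inputs; the signature mirrors the one shown above, but how the merged test handles eps and the returned stdvar may differ:

import numpy as np

def get_np_mean_var(data, running_mean, running_var, eps, use_global_status=True):
    if use_global_status:
        # Inference mode: batch-norm uses the running (global) statistics.
        mean, var = running_mean, running_var
    else:
        # Training mode: per-channel statistics over the N, H, W axes of NCHW data.
        mean = np.mean(data, axis=(0, 2, 3))
        var = np.var(data, axis=(0, 2, 3))
    stdvar = np.sqrt(var + eps)
    return mean, stdvar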

# Here we removed the value asserts due to the different precision of `int64` and `float32`.
# For `float32`, precision is lost when `LARGE_X` is too large, i.e. `LARGE_X-1`
# and `LARGE_X-2` cannot be represented exactly in this situation.
assert b.shape == (LARGE_X//2, SMALL_Y*2)
Contributor

Can we also test one of the values inside tensor b?

Contributor Author

Done
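One hedged way of checking a value inside b while respecting the float32 rounding discussed above: rebuild the scenario with assumed constants and compare against the float32 representation of the expected index rather than the exact integer (the merged test may do this differently):

import numpy as np
from mxnet import nd

LARGE_X, SMALL_Y = 100000000, 43  # assumed constants

# Row index broadcast across columns, mirroring create_2d_tensor, then reshape and flatten.
a = nd.arange(LARGE_X).reshape((LARGE_X, 1)).broadcast_to((LARGE_X, SMALL_Y))
b = nd.flatten(a.reshape((LARGE_X // 2, 2, SMALL_Y)))

assert b.shape == (LARGE_X // 2, SMALL_Y * 2)
assert b[-1][-1].asscalar() == np.float32(LARGE_X - 1)  # 100000000.0 after float32 rounding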

@apeforest
Contributor

Since the nightly test is currently broken, could you please run all the tests offline and paste your output into this PR? Thanks!

Contributor

@apeforest apeforest left a comment

Please run all the tests with MKL-DNN enabled and paste your output in this PR.

@wuxun-zhang
Contributor Author

Output log of all MKL-DNN operators (with export MKLDNN_VERBOSE=1):

test_large_array_mkldnn.test_FullyConnected ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,mb100000000ic43oc43,559.605
mkldnn_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,,mb100000000ic43oc43,760.226
ok
test_large_array_mkldnn.test_pooling ... mkldnn_verbose,exec,cpu,pooling,simple_nchw:any,forward_inference,src_f32::blocked:abcd:f0 dst_f32::blocked:abcd:f0 ws_undef::undef::f0,alg:pooling_avg_include_padding,mb10000ic200_ih43oh39kh5sh1ph0_iw43ow39kw5sw1pw0,3277.87
mkldnn_verbose,exec,cpu,pooling,simple_nchw:any,forward_inference,src_f32::blocked:abcd:f0 dst_f32::blocked:abcd:f0 ws_undef::undef::f0,alg:pooling_max,mb10000ic200_ih43oh39kh5sh1ph0_iw43ow39kw5sw1pw0,10394.7
ok
test_large_array_mkldnn.test_activation ... mkldnn_verbose,exec,cpu,eltwise,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,alg:eltwise_tanh:0:0,100000000x43,389.182
mkldnn_verbose,exec,cpu,eltwise,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,alg:eltwise_relu:0:0,100000000x43,352.823
mkldnn_verbose,exec,cpu,eltwise,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,alg:eltwise_logistic:0:0,100000000x43,363.188
ok
test_large_array_mkldnn.test_transpose ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:ab:f0 dst_f32::blocked:ba:f0,num:1,100000000x43,4733.55
ok
test_large_array_mkldnn.test_softmax ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,softmax,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,axis:1,43x100000000,917.137
ok
test_large_array_mkldnn.testSoftmaxOutput ... [18:22:25] src/executor/graph_executor.cc:2057: Subgraph backend MKLDNN is activated.
mkldnn_verbose,exec,cpu,softmax,jit:avx512_common,forward_inference,data_f32::blocked:ab:f0 diff_undef::undef::f0,axis:1,100000000x43,578.566
[18:22:27] src/executor/graph_executor.cc:2057: Subgraph backend MKLDNN is activated.
ok
test_large_array_mkldnn.test_batchnorm ... [18:26:33] src/operator/nn/./mkldnn/mkldnn_batch_norm-inl.h:204: inference mode...
mkldnn_verbose,exec,cpu,batch_normalization,ncsp_bnorm:any,forward_inference,data_f32::blocked:abcd:f0 diff_undef::undef::f0,flags:3,mb1ic1ih100000000iw43,861.505
ok
test_large_array_mkldnn.test_flatten ... mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,num:1,50000000x2x43,346.131
ok
test_large_array_mkldnn.test_concat ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,concat,simple:any,undef,src_f32::blocked:ab:f0 src_f32::blocked:ab:f0 dst_f32::blocked:ab:f0,num:2,43x200000000,1070.45
ok
Helper function that cleans up memory by releasing it from memory pool ... ok

----------------------------------------------------------------------
Ran 2 tests in 24.734s
OK
test_large_array.test_expand_dims ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,100000000x43,398.925
ok
----------------------------------------------------------------------
Ran 1 test in 0.921s

OK
test_large_array_mkldnn.test_elemwise_add ... mkldnn_verbose,exec,cpu,sum,simple:any,undef,src_f32::blocked:ab:f0 src_f32::blocked:ab:f0 dst_f32::blocked:ab:f0,num:2,100000000x43,517.932
ok
test_large_array_mkldnn.test_slice ... mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x42,0.0600586
ok
Helper function that cleans up memory by releasing it from memory pool ... ok
OK
----------------------------------------------------------------------
Ran 2 tests in 48.266s

OK

@pengzhao-intel
Contributor

@apeforest @marcoabreu please review this PR again.
We'd better close this one soon since our team is moving fast on the new features :)

@marcoabreu marcoabreu merged commit 4f14bf4 into apache:master Nov 19, 2019
CPU Performance and Quantization automation moved this from Review in progress to Done Nov 19, 2019
@wuxun-zhang wuxun-zhang deleted the add_mkldnn_lts_ut branch November 19, 2019 10:19
Contributor

@apeforest apeforest left a comment

LGTM. Sorry for the delay.
