Enable MKL-DNN FullyConnected backward #17318

TaoLv · 2020-01-15T09:23:12Z

Description

Discussed with @rongzha1 offline. This PR incorporates the code changes from #16890, which was previously reverted.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

TaoLv · 2020-02-14T08:23:45Z

This is ready for review now. I also changed the cpp test to address the FullyConnectedOp failure reported before. The main cause is that the tensor shape is relatively large and the values in the tensor range up to 50 in magnitude, which leads to floating-point accumulation error. @rongzha1 @ciyongch @pengzhao-intel

pengzhao-intel · 2020-02-14T10:13:26Z

tests/cpp/include/test_mkldnn.h

@@ -330,7 +330,7 @@ inline void PrintVerifyMsg(const NDArrayAttrs &arr1, const NDArrayAttrs &arr2) {
 */
 inline std::vector<NDArrayAttrs> GetTestInputArrays(
 int types = ArrayTypes::All, bool rand = false,
- std::vector<float> scale = {1}, bool spatial_data_format = false) {
+ std::vector<float> scale = {1}, bool spatial_data_format = false, int max = 50) {


Is the new parameter for better usability?

TaoLv · 2020-02-14

See #17318 (comment).

ciyongch

Nice catch, the failure bothered us a lot in the past:)

ciyongch · 2020-02-14T11:42:34Z

tests/cpp/include/test_mkldnn.h

 const TBlob &blob = arr->data();
 mshadow::default_real_t *data = blob.dptr<mshadow::default_real_t>();
 int size = blob.Size();

 for (int i = 0; i < size; i++)
 if (is_rand) {
- data[i] = (std::rand() % 100) - 50;
+ data[i] = (std::rand() % (max * 2)) - max;


How about changing it to data[i] = std::rand() * 1.0 / RAND_MAX - 0.5; instead? With max = 1, the current code will only generate two values: -1.0 and 0.0.

TaoLv · 2020-02-14

I did it this way because I don't want to affect the other test cases, which still use the default max=50 to generate integers in [-50, 50). For FullyConnectedOp, however, I want to generate relatively small numbers. With the given code, the elements will be -1 and 0. Any suggestions?

ciyongch · 2020-02-16

I have no idea why the range was set to [-50, 50) previously, and I can't think of any specific reason to use this range for the tests (perhaps some upper-bound test?). It would be great if you have any background on it.
But anyway, tensors with only two values (-1 and 0, with 50% zeros) might not be good candidates for the tests.
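
For illustration, the observation above can be checked with a small sketch. `DistinctValues` is a hypothetical helper, not part of the PR; it applies the integer formula from the diff and collects the distinct values it produces for a given `max`.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>

// Hypothetical helper for illustration only (not in test_mkldnn.h).
// Applies the integer formula from the diff, (rand() % (max * 2)) - max,
// and returns the set of distinct values observed over `samples` draws.
inline std::set<int> DistinctValues(int max, int samples = 10000) {
  assert(max > 0 && samples > 0);
  std::set<int> seen;
  for (int i = 0; i < samples; ++i) {
    seen.insert((std::rand() % (max * 2)) - max);
  }
  return seen;
}
```

With `max = 1` the modulo can only yield 0 or 1, so the result is limited to -1 and 0; with the default `max = 50` the values fall in [-50, 50).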

TaoLv · 2020-02-16

Okay, I will change it to generate floating-point numbers in [-max, max) rather than integers. Previously I thought sparsity (say, 50% zeros) was also a way to avoid floating-point accumulation error.

TaoLv · 2020-02-17

@ciyongch I failed to do so. Some test cases do a bitwise correctness check, so it seems we cannot pass the tests with these floating-point numbers.
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17318/17/pipeline/
https://github.com/apache/incubator-mxnet/blob/master/tests/cpp/include/test_mkldnn.h#L605

ciyongch · 2020-02-17

If memcmp is used to check the results, then integers are the only choice. Would it be reasonable to check the results with AssertEqual within a small enough threshold such as 1e-6, so that we can keep floating-point numbers with a better distribution?
Or we can just increase max to fill in more distinct values than only -1 and 0.
What do you think?
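
The difference between the two checking strategies discussed here can be sketched as follows. Both functions are hypothetical stand-ins written for this illustration: `BitwiseEqual` mimics a memcmp-based check, and `AlmostEqual` is an assumed absolute-tolerance check, not the actual `AssertEqual` helper in the MXNet tests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Bitwise check: passes only if both buffers hold identical bytes.
inline bool BitwiseEqual(const std::vector<float>& a,
                         const std::vector<float>& b) {
  if (a.size() != b.size()) return false;
  if (a.empty()) return true;  // avoid calling memcmp on empty buffers
  return std::memcmp(a.data(), b.data(), a.size() * sizeof(float)) == 0;
}

// Tolerance check (assumed semantics): passes if every pair of elements
// differs by at most `atol` in absolute value.
inline bool AlmostEqual(const std::vector<float>& a,
                        const std::vector<float>& b,
                        float atol = 1e-6f) {
  assert(atol >= 0.0f);
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] - b[i]) > atol) return false;
  }
  return true;
}
```

Two results that differ only in the last bit of the mantissa fail the bitwise check but pass the tolerance check, which is why floating-point test data only works with the latter.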

TaoLv · 2020-02-17

I have reverted the changes for floating-point numbers. Changing `memcmp` to `AssertEqual` is out of the scope of this PR, so I will keep it as is.

> Or we can just increase max to filling more different numbers other than only -1 and 0.

I was thinking about including the number 2 in the generated tensor, but found that with the given shapes there is still a chance of getting an error. In the worst case, the intermediate accumulation value will exceed 2^24, so a 1 will be lost when it is added to the accumulator.
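
The 2^24 limit mentioned above can be reproduced with a minimal sketch, assuming IEEE 754 single-precision arithmetic with the default round-to-nearest mode. `AccumulateOnes` is a hypothetical helper, not code from the PR.

```cpp
#include <cassert>

// Hypothetical helper for illustration: adds 1.0f to `start` exactly `n`
// times, keeping the running sum in a float as an accumulating kernel would.
inline float AccumulateOnes(float start, int n) {
  assert(n >= 0);
  float acc = start;
  for (int i = 0; i < n; ++i) {
    acc += 1.0f;
  }
  return acc;
}
```

Under those assumptions, `AccumulateOnes(16777216.0f, 10)` still returns 16777216 (2^24): floats above 2^24 are spaced 2 apart, so each added 1 is rounded away, and the final sum then depends on the order of accumulation.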

ciyongch · 2020-02-18

Ok, let's keep it as is for now.

ciyongch · 2020-02-14T11:46:07Z

tests/cpp/include/test_mkldnn.h

 } else {
- data[i] = i % 100 - 50;
+ data[i] = i % (max * 2) - max;


Same as above: how about changing it to something like data[i] = i * 2.0 / size - 1.0 to generate values in [-1.0, 1.0)?
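
As a sketch of what this suggestion would produce (it illustrates the proposal only, not the code that was eventually merged), the formula maps the index range onto an evenly spaced ramp. `LinearRamp` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: fills a buffer using the suggested
// formula data[i] = i * 2.0 / size - 1.0, giving evenly spaced values that
// start at -1.0 and stay below 1.0 for moderate sizes.
inline std::vector<float> LinearRamp(int size) {
  assert(size > 0);
  std::vector<float> data(static_cast<std::size_t>(size));
  for (int i = 0; i < size; ++i) {
    data[i] = static_cast<float>(i * 2.0 / size - 1.0);
  }
  return data;
}
```

For example, `LinearRamp(4)` yields -1.0, -0.5, 0.0, 0.5. Unlike the integer formula, most of these values are not integers, so they are subject to the rounding concerns raised in the thread above.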

ciyongch

LGTM

* fix mkldnn fc bwd bug due to data inplace
* enable mkldnn fc bwd
* fix cpp tests
* try: fix random seed
* fix cpp test
* loose rtol for fc cpp test
* improve error message
* limit max value for mkldnn tensors
* limit the max value of test tensors
* fix lint
* remove fixed random seed
* address review comments
* Revert "address review comments"

This reverts commit 56d873f.

Co-authored-by: rongzha1 <[email protected]>

rongzha1 and others added 10 commits November 25, 2019 15:17

fix mkldnn fc bwd bug due to data inplace

0b33c8c

enable mkldnn fc bwd

2d74cc4

Merge commit 'refs/pull/16890/head' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

f4e6557

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

db88a23

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

6a98aa4

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

e17f70d

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

9c7596b

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

d015609

fix cpp tests

a103baa

try: fix random seed

caceb9b

TaoLv requested a review from anirudh2290 as a code owner February 9, 2020 03:13

TaoLv added 2 commits February 10, 2020 15:43

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

7bbbc7f

fix cpp test

1bf97fa

TaoLv force-pushed the enable-fc-bwd branch from 1c3eb75 to 2393889 February 11, 2020 03:29

TaoLv added 10 commits February 11, 2020 18:54

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

b374889

loose rtol for fc cpp test

2393889

improve error message

8a01fef

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

979df7a

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

5525f71

limit the max value of test tensors

2200653

fix lint

faa1db8

limit max value for mkldnn tensors

2a3ebb4

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

34549f2

remove fixed random seed

93eb151

TaoLv changed the title ~~[WIP] Enable MKL-DNN FullyConnected backward~~ Enable MKL-DNN FullyConnected backward Feb 14, 2020

rongzha1 approved these changes Feb 14, 2020

pengzhao-intel reviewed Feb 14, 2020

ciyongch reviewed Feb 14, 2020

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

38070d9

TaoLv added 4 commits February 16, 2020 23:21

address review comments

56d873f

Revert "address review comments"

43905c3

This reverts commit 56d873f.

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

edef1c7

Merge branch 'master' of https://github.com/apache/incubator-mxnet into enable-fc-bwd

464204f

ciyongch approved these changes Feb 19, 2020

pengzhao-intel merged commit cf42535 into apache:master Feb 19, 2020

pengzhao-intel added the MKLDNN label Feb 19, 2020

pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Feb 19, 2020
