
Performance improvement for MKL-DNN Quantized FullyConnected #14528

Merged — 3 commits merged into apache:master on Mar 27, 2019

Conversation

ciyongch (Contributor):

Description

This patch mainly improves the performance of the MKL-DNN quantized FullyConnected operator.
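The commit messages below describe this change as caching the bias for the quantized FullyConnected subgraph op. The following C++ sketch only illustrates that idea and is not the PR's actual code; the names (CachedBiasFC, GetScaledBias, cached_scale_) are hypothetical. The point is to re-scale the fp32 bias to int32 once per quantization range and reuse it across forward calls instead of recomputing it on every iteration.

#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper illustrating the bias-caching idea from the commit message.
class CachedBiasFC {
 public:
  // Returns the int32 bias scaled by data_scale * weight_scale, recomputing it
  // only when the combined scale changes (e.g. when min/max inputs are dynamic).
  const std::vector<int32_t>& GetScaledBias(const std::vector<float>& bias_fp32,
                                            float data_scale, float weight_scale) {
    const float bias_scale = data_scale * weight_scale;
    if (!cached_valid_ || bias_scale != cached_scale_) {
      cached_bias_.resize(bias_fp32.size());
      for (size_t i = 0; i < bias_fp32.size(); ++i) {
        cached_bias_[i] = static_cast<int32_t>(std::round(bias_fp32[i] * bias_scale));
      }
      cached_scale_ = bias_scale;
      cached_valid_ = true;
    }
    return cached_bias_;  // cache hit on later calls: no per-iteration re-quantization
  }

 private:
  std::vector<int32_t> cached_bias_;
  float cached_scale_ = 0.0f;
  bool cached_valid_ = false;
};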

@pengzhao-intel @TaoLv

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@pengzhao-intel (Contributor) left a comment:


It's good to use an enum instead of hardcoded numbers in the code.

LGTM.

@@ -48,6 +48,12 @@ enum FullyConnectedOpResource {kTempSpace};
enum FullyConnectedOpOutputs {kOut};
} // fullc

namespace quantized_fullc {
Contributor


quantized_fc?

Contributor Author


just to align with fullc :)
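For reference, a plausible sketch of the added enums follows. Only the namespace name and the kDataMin member are visible in this page's diff fragments, so the remaining members are an assumption based on how quantized FullyConnected inputs and outputs are usually ordered (data/weight/bias min-max pairs plus out/out-min/out-max).

namespace quantized_fullc {
// Min/max scalars appended after the original FullyConnected inputs
// (members beyond kDataMin are assumed, not taken from the diff).
enum QuantizedFCInputMinMax {kDataMin, kDataMax, kWeightMin, kWeightMax, kBiasMin, kBiasMax};
// Quantized outputs: the integer result plus its recorded range.
enum QuantizedFCOutputs {kOut, kOutMin, kOutMax};
}  // namespace quantized_fullc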

@@ -195,6 +195,13 @@ void SgMKLDNNFCOp::Forward(const OpContext &ctx,
}

MKLDNNFCForwardFullFeature(full_param_, ctx, fwd_.get(), new_inputs, new_req, out_data);

if (mkldnn_param.quantized && !mkldnn_param.enable_float_output) {
Contributor


add comments on why

Contributor Author


I think it's straightforward here: OutMin and OutMax are only valid when the op is quantized and not generating fp32 output.
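A hedged sketch of what the guarded block could look like; only the if condition appears in the diff above, so the body (including the cached_min_output_ and cached_max_output_ members) is an assumption used for illustration.

if (mkldnn_param.quantized && !mkldnn_param.enable_float_output) {
  // OutMin/OutMax only exist when the op keeps a quantized (non-fp32) output,
  // so they are written only under this condition.
  float *min_output_ptr = out_data[quantized_fullc::kOutMin].data().dptr<float>();
  float *max_output_ptr = out_data[quantized_fullc::kOutMax].data().dptr<float>();
  *min_output_ptr = cached_min_output_;  // hypothetical cached range members
  *max_output_ptr = cached_max_output_;
}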

@abhinavs95 (Contributor) commented on Mar 26, 2019:

@mxnet-label-bot update [MKLDNN, Performance, pr-awaiting-testing]

@marcoabreu added the MKLDNN, Performance, and pr-awaiting-testing labels on Mar 26, 2019
@pengzhao-intel (Contributor):

@anirudh2290 @ZhennanQin @xinyu-intel to review

@@ -52,15 +47,15 @@ void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs &attrs,
NDArray weight = in_data[fullc::kWeight];

const float min_data =
in_data[num_inputs + quantized_fc_enum::kDataMin].data().dptr<float>()[0];
in_data[num_inputs + quantized_fullc::kDataMin].data().dptr<float>()[0];
Contributor


Quite strange usage. Why not define a whole input set with the original inputs?

Contributor Author


The original inputs might not include bias, which would result in a different index for all these min/max values. This is just to simplify the ordering for the quantized op only.
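A short sketch of the indexing this reply describes; only the min_data line is taken from the diff, while the num_inputs handling and the remaining min/max reads are assumptions for illustration.

// The quantized op's inputs are the original FullyConnected inputs
// (data, weight, and optionally bias) followed by their min/max scalars,
// so the min/max block starts at num_inputs whether or not bias is present.
const size_t num_inputs = param.no_bias ? 2 : 3;
const float min_data =
    in_data[num_inputs + quantized_fullc::kDataMin].data().dptr<float>()[0];
const float max_data =
    in_data[num_inputs + quantized_fullc::kDataMax].data().dptr<float>()[0];
const float min_weight =
    in_data[num_inputs + quantized_fullc::kWeightMin].data().dptr<float>()[0];
const float max_weight =
    in_data[num_inputs + quantized_fullc::kWeightMax].data().dptr<float>()[0];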

@ZhennanQin (Contributor) left a comment:


LGTM.

@pengzhao-intel (Contributor):

Thanks for your contribution. Merging now.

@pengzhao-intel merged commit 5d2a451 into apache:master on Mar 27, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request on Mar 31, 2019:

* Cached bias to Quantized FullyConnected based on Subgraph to improve performance
* retrigger CI
* retrigger CI
ZhennanQin pushed a commit to ZhennanQin/incubator-mxnet that referenced this pull request on Apr 3, 2019:

* Cached bias to Quantized FullyConnected based on Subgraph to improve performance
* retrigger CI
* retrigger CI
nswamy pushed a commit that referenced this pull request on Apr 5, 2019:

* Cached bias to Quantized FullyConnected based on Subgraph to improve performance
* retrigger CI
* retrigger CI
@ciyongch deleted the qfc_perf branch on May 22, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request on Jun 23, 2019:

* Cached bias to Quantized FullyConnected based on Subgraph to improve performance
* retrigger CI
* retrigger CI