
MKLDNN based Quantized FullyConnected Operator and its fusion #14128

Merged
merged 30 commits into apache:master from ciyongch:stateful_inner_product on Mar 8, 2019

Conversation

ciyongch
Contributor

Description

This PR adds an MKL-DNN based quantized FullyConnected operator via the FComputeEx API, and switches the MKL IGEMM based quantized FullyConnected operator to the FCompute API.
The PR also adds a subgraph implementation for both FullyConnected and quantized FullyConnected to enable more operator fusion at the graph level (which makes it easier to extend to other element-wise operator fusions in the future). The following patterns are supported currently: FullyConnected + relu, quantized FullyConnected + requantize, and quantized FullyConnected + requantize + dequantize.

@pengzhao-intel @TaoLv @ZhennanQin @zheng-da

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@ciyongch ciyongch requested a review from szha as a code owner February 12, 2019 08:33
@ciyongch ciyongch changed the title from "Stateful inner product" to "MKLDNN based Quantized FullyConnected Operator and its fusion" on Feb 12, 2019
@pengzhao-intel
Contributor

@KellenSunderland @reminisce for the review :)

@ankkhedia
Contributor

@ciyongch Thanks for the review!

@mxnet-label-bot add [pr-awaiting-review, MKLDNN]

@marcoabreu marcoabreu added the MKLDNN and pr-awaiting-review (PR is waiting for code review) labels on Feb 12, 2019
@@ -245,6 +254,10 @@ def _init_weight(self, name, arr):
"""Abstract method to Initialize weight."""
raise NotImplementedError("Must override it")

def _init_quantized_weight(self, _, arr):
_arr = random.randint(-127, 127, dtype='int32').asnumpy()
Member

Hmm, it seems we need to extend randint to support dtype='int8'.

Contributor Author

Yes, the int8 dtype is not supported by the current randint.
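
To make the workaround in the diff concrete: sample int32 values (which randint does support) and cast down afterwards. The following is a minimal standalone sketch; the PR's actual method lives on the Initializer class.

import mxnet as mx

def init_quantized_weight(arr):
    # randint does not accept dtype='int8', so draw int32 values in the
    # int8 range [-127, 127) and cast down afterwards.
    tmp = mx.nd.random.randint(-127, 127, shape=arr.shape, dtype='int32')
    arr[:] = tmp.astype('int8')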

src/operator/nn/mkldnn/mkldnn_fully_connected-inl.h (outdated, resolved)
this->fwd = std::shared_ptr<mkldnn::inner_product_forward>(
    new mkldnn::inner_product_forward(
        fwd_pd, mkldnn::primitive::at(*this->data),
        mkldnn::primitive::at(*this->weight), *this->out));
}
Member

This needs an else branch.

Contributor Author

Actually, there would be nothing to do in the else clause, so I didn't add one. But I can restructure this piece of code equivalently:

if (bias != nullptr) {
  ....
} else {
  if (this->fwd_ == nullptr) {
    ....
  }
}

@@ -265,8 +313,11 @@ and max thresholds representing the threholds for quantizing the float32 output
.set_attr<nnvm::FInferType>("FInferType", QuantizedFullyConnectedType)
.set_attr<FInferStorageType>("FInferStorageType", QuantizedFullyConnectedStorageType)
.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.set_attr<FComputeEx>("FComputeEx<cpu>",
QuantizedFullyConnectedForward<int8_t>)
.set_attr<FCompute>("FCompute<cpu>", QuantizedFullyConnectedForwardCPU)
Member

Wrap this line in #if MSHADOW_USE_MKL == 1?

Contributor Author

I think the current implementation gives users more information about the quantized FullyConnected dependencies on CPU. If we put the macro in the op's attributes, users would only get a 'not implemented' message.
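
For reference, a hedged sketch contrasting the two options (the registration fragment follows the quoted diff; the surrounding code is assumed):

// Option 1 (reviewer's suggestion): compile-time guard. Without MKL the
// attribute is simply absent, and users only see a generic
// "not implemented" error for the op.
#if MSHADOW_USE_MKL == 1
.set_attr<FCompute>("FCompute<cpu>", QuantizedFullyConnectedForwardCPU)
#endif

// Option 2 (kept by the PR): always register the kernel, and let
// QuantizedFullyConnectedForwardCPU itself report the MKL dependency
// with an explicit error message when MKL is unavailable.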

@@ -0,0 +1,434 @@
/*
Member

filename: mkldnn_fc.cc -> mkldnn_fully_connected.cc?

Contributor Author

There's already a mkldnn_fully_connected.cc in src/operator/nn/mkldnn/, so I chose mkldnn_fc.cc for the subgraph part to distinguish the two.

src/operator/subgraph/mkldnn/mkldnn_fc.cc (outdated, resolved)
bool disable_fc_relu;
};

MXNET_REGISTER_SUBGRAPH_PROPERTY(MKLDNN_FC, SgMKLDNNFCProperty);
Member

It seems we have several property names now; do we have any document that lists and explains them?

Contributor Author

There are some updates to the subgraph API in PR #14113; I suggest updating the property name and rules after it's merged.
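
As a usage note — hypothetically, assuming the property key registered above doubles as the backend name accepted by the partitioning API (the model files below are placeholders):

import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
# Apply SgMKLDNNFCProperty to fuse the FullyConnected patterns.
fused_sym = sym.get_backend_symbol('MKLDNN_FC')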

@pengzhao-intel
Contributor

@ZhennanQin could you help to review this PR?


struct MKLDNNFCParam : public dmlc::Parameter<MKLDNNFCParam> {
  bool quantized;
  bool fuse_requantize;
Contributor

Is fuse_requantize necessary? Why not directly check whether min_calib_range and max_calib_range have values?

Contributor Author

This param was added to pair with fuse_dequantize, but indeed it can be replaced by checking min/max_calib_range.
I will remove it.
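
A minimal sketch of the suggested check, assuming the calibration thresholds are dmlc::optional<float> fields (the common pattern for calibrated quantized ops in MXNet; names here are illustrative):

#include <dmlc/optional.h>

struct CalibParams {
  dmlc::optional<float> min_calib_range;
  dmlc::optional<float> max_calib_range;
};

// Requantize can be fused exactly when both thresholds were provided,
// making a separate fuse_requantize flag redundant.
bool CanFuseRequantize(const CalibParams &p) {
  return p.min_calib_range.has_value() && p.max_calib_range.has_value();
}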

struct MKLDNNFCParam : public dmlc::Parameter<MKLDNNFCParam> {
  bool quantized;
  bool fuse_requantize;
  bool fuse_dequantize;
Contributor

How about renaming fuse_dequantize to float_output? End users may not care about the fusion details, but they do want a name with a straightforward meaning.

Contributor Author

Sure, but this param is only touched by the subgraph pass, not by end users; end users would instead care about the environment variable 'MXNET_DISABLE_MKLDNN_QFC_FUSE_DEQUANTIZE'. Will change this name.

if (full_param.mkldnn_param.with_relu) {
  float scale = 1.0f;
  float alpha = 0.0f;
  float beta = 1.0f;
Contributor

Add const.

Contributor Author

Fixed.
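
For reference, a hedged sketch (MKL-DNN 0.x C++ API) of how these relu parameters typically feed into the primitive via post-ops; the PR's exact wiring may differ:

#include <mkldnn.hpp>

mkldnn::primitive_attr GetFCAttr(bool with_relu) {
  mkldnn::primitive_attr attr;
  if (with_relu) {
    const float scale = 1.0f;  // per the review comment above, these can be const
    const float alpha = 0.0f;  // relu negative slope
    const float beta = 1.0f;   // unused by eltwise_relu
    mkldnn::post_ops ops;
    ops.append_eltwise(scale, mkldnn::algorithm::eltwise_relu, alpha, beta);
    attr.set_post_ops(ops);
  }
  return attr;
}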

@pengzhao-intel
Contributor

@reminisce @zheng-da @anirudh2290 could you help to take a review?

MKLDNNFullyconSignature key(param);
key.AddSign(is_train);
key.AddSign(data);
key.AddSign(weight);
Contributor

@ZhennanQin ZhennanQin Mar 1, 2019

It seems quantized FC will call this function as well, so output_scale should be part of the hashed key. Better to hash the whole mkldnn_param.

Contributor Author

@ciyongch ciyongch Mar 1, 2019

Good point :)
Currently, this function is only called by the normal FullyConnected and quantized FC, not by the subgraph FC, while output_scale and the rest of mkldnn_param are only used by the subgraph FC, so they're actually unused here.
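
To illustrate the signature-caching pattern being discussed — a hedged sketch in the style of MXNet's thread-local primitive caches (class names follow the snippet above; the helper's exact shape in the PR may differ):

static MKLDNNFullyConnectForward &GetFCFwd(const FullyConnectedParam &param,
                                           bool is_train, const NDArray &data,
                                           const NDArray &weight) {
  static thread_local std::unordered_map<MKLDNNFullyconSignature,
                                         MKLDNNFullyConnectForward,
                                         OpHash> fcFwds;
  MKLDNNFullyconSignature key(param);
  key.AddSign(is_train);
  key.AddSign(data);    // folds the array's shape/dtype into the key
  key.AddSign(weight);
  // Per the review: if the quantized subgraph FC shared this cache, output_scale
  // (or the whole mkldnn_param) would also need to be AddSign'ed here.
  auto it = fcFwds.find(key);
  if (it == fcFwds.end()) {
    MKLDNNFullyConnectForward fwd(param, is_train, data, weight);  // hypothetical ctor
    it = fcFwds.emplace(key, fwd).first;
  }
  return it->second;
}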

it = ins_ret.first;
MKLDNNFCFullParam full_param;
full_param.default_param = param;
full_param.mkldnn_param.Init(std::unordered_map<std::string, std::string>());
Contributor

Not correct; we should pass down the real mkldnn_param.

Contributor Author

@ciyongch ciyongch Mar 1, 2019

Since this function is only called by the normal FullyConnected and quantized FC, and mkldnn_param is not used by these two ops, passing down only default_param from the caller is enough.

#pragma omp parallel for num_threads(engine::OpenMP::Get()->GetRecommendedOMPThreadCount())
for (size_t i = 0; i < bias_size; ++i) {
  quantized_bias_ptr[i] = bias_ptr[i] * bias_int32_rescale;
}
Contributor

Recommend using an mkldnn reorder instead.

Contributor Author

Will an mkldnn reorder always be better than this approach? The reorder may introduce its own overhead, and bias is usually not a big array.
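
For clarity, a standalone sketch of the rescaling loop quoted above, assuming the usual scheme where the int32 bias scale is the product of the data and weight scales (names here are illustrative, not the PR's exact ones):

#include <cstdint>
#include <vector>

std::vector<int32_t> QuantizeBias(const std::vector<float> &bias,
                                  float data_scale, float weight_scale) {
  const float bias_int32_rescale = data_scale * weight_scale;
  std::vector<int32_t> quantized(bias.size());
  // Element-wise rescale, parallelized like the OpenMP loop in the PR.
#pragma omp parallel for
  for (int64_t i = 0; i < static_cast<int64_t>(bias.size()); ++i) {
    quantized[i] = static_cast<int32_t>(bias[i] * bias_int32_rescale);
  }
  return quantized;
}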

@@ -72,7 +79,7 @@ bool QuantizedFullyConnectedType(const nnvm::NodeAttrs& attrs,
CHECK_EQ(in_type->size(), num_inputs * 3);
CHECK_EQ(out_type->size(), 3U);

for (size_t i = 0; i < num_inputs; ++i) {
for (size_t i = 1; i < num_inputs; ++i) {
Contributor

Why skip i = 0?

Contributor Author

input[0] will support both INT8 and UINT8 here.

Contributor

Then please add a check that input[0] is INT8 or UINT8.
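
A sketch of the requested check, in the style of the surrounding type-inference code (CHECK, the mshadow type enums, and TYPE_ASSIGN_CHECK are MXNet's existing helpers):

// input[0] (data) may be either signed or unsigned 8-bit:
CHECK(in_type->at(0) == mshadow::kInt8 || in_type->at(0) == mshadow::kUint8)
    << "QuantizedFullyConnected only supports int8/uint8 input, while "
    << in_type->at(0) << " is given.";
// the remaining quantized inputs (weight, bias) stay int8:
for (size_t i = 1; i < num_inputs; ++i) {
  TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8);
}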

@pengzhao-intel
Contributor

@TaoLv @ZhennanQin please take another look. If there are no other comments, please help approve; we'd like to merge this PR soon.

Contributor

@pengzhao-intel pengzhao-intel left a comment

LGTM.

The code has been verified by several internal cases and achieved the expected accuracy and performance.


/*!
* \file mkldnn_fully_connected-inl.h
* \brief
Member

Incomplete doc?

Contributor Author

Do you mean the missing author info here?

Member

Right, along with the missing description.

Contributor Author

Sure, I will add the missing descriptions for all the new files.
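
For instance, the completed header could look like this (the wording is illustrative, not the PR's exact text):

/*!
 * \file mkldnn_fully_connected-inl.h
 * \brief Common functions used by the MKLDNN (quantized) FullyConnected operator
 * \author (added per the review above)
 */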


DMLC_DECLARE_PARAMETER(MKLDNNFCParam) {
  DMLC_DECLARE_FIELD(quantized).set_default(false)
  .describe("enable quantization");
Member

Let's use a consistent standard for user-facing documentation.

Contributor Author

No problem, will update the description.

} catch (mkldnn::error &e) {
  if (e.status == mkldnn_unimplemented &&
      full_param.mkldnn_param.quantized) {
    LOG(ERROR) << "AVX512-BW support or MKLDNN v0.18 is required for INT8 fully_connected.";
Member

What's the difference between LOG(ERROR) and LOG(FATAL)?

Contributor Author

LOG(ERROR) works the same as LOG(INFO), while LOG(FATAL) throws an error with the error info and stops execution.
LOG(ERROR) is used here to give the hint first, and the original error is then rethrown.
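
For reference, a sketch of that hint-then-rethrow pattern (the primitive-creation call is a hypothetical stand-in for whatever raises mkldnn::error here):

try {
  CreateFCPrimitive();  // hypothetical helper
} catch (mkldnn::error &e) {
  if (e.status == mkldnn_unimplemented && full_param.mkldnn_param.quantized) {
    // LOG(ERROR) only prints, like LOG(INFO); execution continues ...
    LOG(ERROR) << "AVX512-BW support or MKLDNN v0.18 is required for INT8 fully_connected.";
  }
  throw;  // ... so the original mkldnn::error still propagates to the caller
}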

Contributor

@ZhennanQin ZhennanQin left a comment

LGTM

@ciyongch
Contributor Author

ciyongch commented Mar 4, 2019

Thanks for your review @szha @TaoLv @ZhennanQin @pengzhao-intel :)
I've updated the code according to your comments; please check whether there are any remaining concerns.

Member

@TaoLv TaoLv left a comment

LGTM now. Thank you for the contribution.

@ciyongch
Contributor Author

ciyongch commented Mar 5, 2019

Rebased to the latest code base.

@ciyongch
Contributor Author

ciyongch commented Mar 5, 2019

@pengzhao-intel @TaoLv
The code is updated to the latest, and all the comments are addressed; please help check and merge if there are no other comments :)

@TaoLv
Member

TaoLv commented Mar 6, 2019

@szha please confirm your concerns are fully addressed. Thanks.

@TaoLv
Member

TaoLv commented Mar 7, 2019

@ciyongch please take a look at the conflicts. I would like to have this PR merged within 24 hours if there are no other comments or conflicts.

@ciyongch
Contributor Author

ciyongch commented Mar 7, 2019

@TaoLv no problem.

@TaoLv
Member

TaoLv commented Mar 8, 2019

Thank you for the contribution @ciyongch. Merging now.

@TaoLv TaoLv merged commit 8668db7 into apache:master Mar 8, 2019
@ciyongch ciyongch deleted the stateful_inner_product branch March 13, 2019 02:26
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
…#14128)

* add MKL-DNN quantized innerproduct

* initial qfc with mkldnn

* Add MKL-DNN quantized_fully_connected

* refactor params order for fullyconnected

* update quantized_fully_connected unittest, force data to uint8 type temporary

* change mkl based quantized fully_connected to FCompute

* add check data type for mkldnn quantized_fc

* add fuse requantize and dequantize for mkldnn quantized fullyconnected

* add env setting for enable/disable fuse requantize/dequantize for quantize fullyconnected

* fix requantize scaling error

* add fallback when input data is int8

* fix mkl quantized fullyconnected index error

* update quantized fc test cases

* add subgraph node for mkldnn fullyconnected

* fix compiling and lint error

* clean and refactor code

* enable quantized_fc for imagenet

* cleanup code

* Fix StorageType error for non-mkldnn path

* fix pylint

* reverse BUILD_TAG for MKL IGEMM ut, remove IGEMM qfc check

* rename variables and refactor codes according to comments

* add subgraph qfc tests and fix shape error

* remove fuse_requantize and change fuse_dequantize to enable_float_output.

* change to use mxnet::Tuple and update tests

* update description in file header

* update input0 type check for quantized FullyConnected

* fix conflit of mkl/test_subgraph.py

* retrigger CI

* retrigger CI due to hang
nswamy pushed a commit that referenced this pull request Apr 5, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
…#14128)

Labels
MKLDNN, pr-awaiting-review (PR is waiting for code review)