This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MKLDNN] Support channel wise quantization for FullyConnected #17187

Merged
merged 9 commits into apache:master on Jan 3, 2020

Conversation

ciyongch (Contributor)

Description

Add channel-wise quantization support for FullyConnected, which brings accuracy benefits for some models such as BERT SQuAD.
The default quantization granularity stays the same as today, i.e. tensor-wise; users can switch to channel-wise quantization by setting the argument quantize_granularity="channel-wise" when calling quantize_model().
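For illustration only, a minimal usage sketch (not code from this patch): sym, arg_params, aux_params, and calib_iter are placeholders for a user's own FP32 model and calibration data; the remaining arguments follow the quantize_model() signature touched by this PR.

from mxnet import cpu
from mxnet.contrib.quantization import quantize_model

# Placeholders: sym/arg_params/aux_params come from the user's FP32 model,
# calib_iter is a DataIter providing calibration batches.
qsym, qarg_params, qaux_params = quantize_model(
    sym, arg_params, aux_params,
    ctx=cpu(),
    calib_mode='naive',
    calib_data=calib_iter,
    num_calib_examples=100,
    quantized_dtype='int8',
    quantize_mode='smart',
    quantize_granularity='channel-wise')  # new argument; the default remains 'tensor-wise'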

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@pengzhao-intel @TaoLv

pengzhao-intel (Contributor) left a comment

@ZhennanQin @xinyu-intel to review as well

python/mxnet/contrib/quantization.py
@@ -459,7 +464,8 @@ def quantize_model(sym, arg_params, aux_params,
data_names=('data',), label_names=('softmax_label',),
ctx=cpu(), excluded_sym_names=None, excluded_op_names=None, calib_mode='entropy',
calib_data=None, num_calib_examples=None,
quantized_dtype='int8', quantize_mode='smart', logger=None):
quantized_dtype='int8', quantize_mode='smart',
quantize_granularity='channel-wise', logger=None):
Contributor

Is the default value "channel-wise"?

Contributor Author

Good catch, I forgot to change it back to tensor-wise as the default value after validation :) Will change the default value to tensor-wise.

CPU Performance and Quantization automation moved this from In progress to Reviewer approved Dec 30, 2019
xinyu-intel (Contributor) left a comment

Also add quantize_granularity to the quantize_model_mkldnn API and the quantize_net_v2 API, and keep the original interface of the quantize_net API. GluonNLP will use the quantize_net_v2 API.
By the way, will previously quantized models still work with this patch?

ciyongch (Contributor Author)

Thanks @xinyu-intel, the new patch has been updated to include those changes so the broader user-level quantization APIs are supported. This patch is backward compatible, so there's no need to worry about existing quantized models.
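For reference, a rough sketch of the Gluon-level path mentioned above, assuming net is a hybridized, parameter-initialized HybridBlock and calib_iter is a calibration DataIter (both placeholders; the exact quantize_net_v2 signature in quantization.py may differ slightly):

from mxnet import cpu
from mxnet.contrib.quantization import quantize_net_v2

# Channel-wise granularity through the Gluon API that GluonNLP will call.
qnet = quantize_net_v2(net,
                       quantized_dtype='auto',
                       quantize_mode='full',
                       quantize_granularity='channel-wise',  # new in this PR
                       calib_data=calib_iter,
                       calib_mode='naive',
                       ctx=cpu())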

const char *quantize_mode, uint32_t* out_num_calib_names,
const char ***out_calib_names);
const char *quantize_mode, const char *quantize_granularity,
uint32_t* out_num_calib_names, const char ***out_calib_names);
Member

Break backward compatibility?

Contributor Author

This API is called by _quantize_symbol() only, and the new argument quantize_granularity carries the default value of tensor-wise from the Python level down to the C++ level.
For users doing quantization with quantization.py, set quantize_granularity to channel-wise explicitly to enable channel-wise quantization for FullyConnected; otherwise tensor-wise quantization is applied, which is the same behavior as before (the default).

@@ -978,7 +1004,7 @@ def __exit__(self, exc_type, exc_value, traceback):
net.collect_params().reset_ctx(ctx)
return net

def quantize_net(network, quantized_dtype='auto', quantize_mode='full',
def quantize_net(network, quantized_dtype='auto', quantize_mode='full', quantize_granularity='tensor-wise',
Member

How about keeping this API as is and encouraging users to call quantize_net_v2, which contains the new parameters? quantize_net will call quantize_net_v2 with default parameters. @xinyu-intel

Contributor

It's better to keep it.

Contributor Author

Looks like the only difference between quantize_net_v2 and quantize_net so far is the new param LayerOutputCollector, and quantize_granularity will be another new param introduced by this PR.
As there's no other difference, how about just combining these two APIs into one?
Otherwise, there could be v3, v4 versions in the future whenever a new param is needed.

Member

I don't know if there is a better way to handle this kind of change. Since quantize_net is a user interface and has been released, adding new parameters to it might break user workloads. One convention is to add a v2 implementation, deprecate the original one, and replace it in the next major release. The v2 implementation is not released yet, so I think it's safe to change it to add quantize_granularity. If it were already released, yes, we might need a v3 implementation.

Contributor Author

That makes sense, so let's keep the quantize_net API as is and only add the new params in the v2 version.
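To illustrate the agreement above, a minimal sketch (not the actual patch code) of how the released quantize_net API can stay unchanged by delegating to quantize_net_v2 with backward-compatible defaults:

def quantize_net(network, quantized_dtype='auto', quantize_mode='full', **kwargs):
    # Keep the released signature untouched and forward to the v2 implementation,
    # which owns the new parameters (quantize_granularity, LayerOutputCollector)
    # with backward-compatible defaults.
    return quantize_net_v2(network,
                           quantized_dtype=quantized_dtype,
                           quantize_mode=quantize_mode,
                           quantize_granularity='tensor-wise',
                           LayerOutputCollector=None,
                           **kwargs)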

return true;
.set_attr<FAvoidQuantizeInput>("FAvoidQuantizeInput", [](
const NodeAttrs &attrs, const size_t index, const std::string quantize_granularity) {
return (index == 0) ? false : true;
Member

return (index != 0); ?

Contributor Author

good.

return false;
.set_attr<FAvoidQuantizeInput>("FAvoidQuantizeInput", [](
const NodeAttrs &attrs, const size_t index, const std::string quantize_granularity) {
return (index == 0) ? true : false;
Member

return (index == 0); ?

Contributor Author

good.

// False True/False False
if (channel_wise && !support_channelwise_scale) {
LOG(FATAL)
<< "Currently, channel-wise quantization requires fuse requantize or dequantize.";
Member

Is it something that users may encounter? If so, what kind of action is suggested?

Contributor Author

This is not the common case, but it would happen when MXNET_DISABLE_MKLDNN_QFC_FLOAT_OUTPUT or MXNET_DISABLE_MKLDNN_QFC_FUSE_ALL is set to true, which means the quantized FullyConnected will not be fused with either requantize or dequantize.
So this message gives the user a hint to enable fusion in order to use this feature.

Member

Thanks for the explanation. How about suggesting in the error message to check these two env variables and set correct values?

Contributor Author

No problem :)
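To make the constraint from this thread concrete, a small user-side check (a sketch only; it assumes the two variables disable the fusion when set to 1 or true, as described above, and the actual error text in the C++ code may differ):

import os

# Channel-wise quantization needs quantized FullyConnected to be fused with
# requantize or dequantize, so these switches must not disable that fusion.
for var in ('MXNET_DISABLE_MKLDNN_QFC_FLOAT_OUTPUT', 'MXNET_DISABLE_MKLDNN_QFC_FUSE_ALL'):
    if os.environ.get(var, '0').lower() in ('1', 'true'):
        raise RuntimeError('%s is set; unset it before using quantize_granularity="channel-wise"' % var)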

@@ -767,7 +773,7 @@ def test_pos_conv_bn_sum_act():
"softrelu": True,
"relu6": False,
"leakyrelu": True,
"gelu": True}
"gelu": False}
Member

Why change it to false?

Contributor Author

Convolution with post sum and activation only supports "relu" currently; the UT previously didn't catch such a failure.

ciyongch (Contributor Author) commented Jan 3, 2020

@TaoLv @ZhennanQin @xinyu-intel please help to review the latest changes :)

ZhennanQin (Contributor) left a comment

LGTM

TaoLv (Member) commented Jan 3, 2020

Thank you for the contribution. Merging now~

@TaoLv TaoLv merged commit 89fe1f6 into apache:master Jan 3, 2020
CPU Performance and Quantization automation moved this from Reviewer approved to Done Jan 3, 2020