This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MKLDNN] Support channel wise quantization for FullyConnected #17187

Merged
merged 9 commits into apache:master on Jan 3, 2020

Conversation

ciyongch (Contributor)

Description

Add channel-wise quantization support for FullyConnected, which brings accuracy benefits for some models such as BERT SQuAD.
The default quantization granularity stays the same as today, i.e. tensor-wise; users can switch to channel-wise quantization by setting the argument quantize_granularity="channel-wise" when calling quantize_model().
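For illustration only, a minimal usage sketch (not code from this patch): sym, arg_params, aux_params, and calib_iter are placeholders for a user's own FP32 model and calibration data; the remaining arguments follow the quantize_model() signature touched by this PR.

from mxnet import cpu
from mxnet.contrib.quantization import quantize_model

# Placeholders: sym/arg_params/aux_params come from the user's FP32 model,
# calib_iter is a DataIter providing calibration batches.
qsym, qarg_params, qaux_params = quantize_model(
    sym, arg_params, aux_params,
    ctx=cpu(),
    calib_mode='naive',
    calib_data=calib_iter,
    num_calib_examples=100,
    quantized_dtype='int8',
    quantize_mode='smart',
    quantize_granularity='channel-wise')  # new argument; the default remains 'tensor-wise'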

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@pengzhao-intel @TaoLv

pengzhao-intel (Contributor) left a comment

@ZhennanQin @xinyu-intel to review as well

python/mxnet/contrib/quantization.py
@@ -459,7 +464,8 @@ def quantize_model(sym, arg_params, aux_params,
data_names=('data',), label_names=('softmax_label',),
ctx=cpu(), excluded_sym_names=None, excluded_op_names=None, calib_mode='entropy',
calib_data=None, num_calib_examples=None,
quantized_dtype='int8', quantize_mode='smart', logger=None):
quantized_dtype='int8', quantize_mode='smart',
quantize_granularity='channel-wise', logger=None):
Contributor

Is the default value "channel-wise"?

Contributor Author

Good catch, I forgot to change it back to tensor-wise as the default value after validation :) Will change the default value to tensor-wise.

CPU Performance and Quantization automation moved this from In progress to Reviewer approved Dec 30, 2019
xinyu-intel (Contributor) left a comment

Also add quantize_granularity to the quantize_model_mkldnn API and the quantize_net_v2 API, and keep the original interface of the quantize_net API. GluonNLP will use the quantize_net_v2 API.
By the way, will previously quantized models still work with this patch?

ciyongch (Contributor Author)

Thanks @xinyu-intel, the new patch has been updated to include those changes so the broader user-level quantization APIs are supported. This patch is backward compatible, so there's no need to worry about existing quantized models.
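For reference, a rough sketch of the Gluon-level path mentioned above, assuming net is a hybridized, parameter-initialized HybridBlock and calib_iter is a calibration DataIter (both placeholders; the exact quantize_net_v2 signature in quantization.py may differ slightly):

from mxnet import cpu
from mxnet.contrib.quantization import quantize_net_v2

# Channel-wise granularity through the Gluon API that GluonNLP will call.
qnet = quantize_net_v2(net,
                       quantized_dtype='auto',
                       quantize_mode='full',
                       quantize_granularity='channel-wise',  # new in this PR
                       calib_data=calib_iter,
                       calib_mode='naive',
                       ctx=cpu())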

const char *quantize_mode, uint32_t* out_num_calib_names,
const char ***out_calib_names);
const char *quantize_mode, const char *quantize_granularity,
uint32_t* out_num_calib_names, const char ***out_calib_names);
Member

Break backward compatibility?

Contributor Author

This API is called by _quantize_symbol() only, and the new argument quantize_granularity carries the default value of tensor-wise from the Python level down to the C++ level.
For users doing quantization with quantization.py, set quantize_granularity to channel-wise explicitly to enable channel-wise quantization for FullyConnected; otherwise tensor-wise quantization is applied, which is the same behavior as before (the default).

@@ -978,7 +1004,7 @@ def __exit__(self, exc_type, exc_value, traceback):
net.collect_params().reset_ctx(ctx)
return net

def quantize_net(network, quantized_dtype='auto', quantize_mode='full',
def quantize_net(network, quantized_dtype='auto', quantize_mode='full', quantize_granularity='tensor-wise',
Member

How about keeping this API as is and encouraging users to call quantize_net_v2, which contains the new parameters? quantize_net will call quantize_net_v2 with default parameters. @xinyu-intel

Contributor

It's better to keep it.

Contributor Author

Looks like the only difference between quantize_net_v2 and quantize_net so far is the new param LayerOutputCollector, and quantize_granularity will be another new param introduced by this PR.
As there's no other difference, how about just combining these two APIs into one?
Otherwise, there could be v3, v4 versions in the future whenever a new param is needed.

Member

I don't know if there is a better way to handle this kind of change. Since quantize_net is a user interface and has been released, adding new parameters to it might break user workloads. One convention is to add a v2 implementation, deprecate the original one, and replace it in the next major release. The v2 implementation is not released yet, so I think it's safe to change it to add quantize_granularity. If it were already released, yes, we might need a v3 implementation.

Contributor Author

That makes sense, so let's keep the quantize_net API as is and only add the new params in the v2 version.
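To illustrate the agreement above, a minimal sketch (not the actual patch code) of how the released quantize_net API can stay unchanged by delegating to quantize_net_v2 with backward-compatible defaults:

def quantize_net(network, quantized_dtype='auto', quantize_mode='full', **kwargs):
    # Keep the released signature untouched and forward to the v2 implementation,
    # which owns the new parameters (quantize_granularity, LayerOutputCollector)
    # with backward-compatible defaults.
    return quantize_net_v2(network,
                           quantized_dtype=quantized_dtype,
                           quantize_mode=quantize_mode,
                           quantize_granularity='tensor-wise',
                           LayerOutputCollector=None,
                           **kwargs)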

return true;
.set_attr<FAvoidQuantizeInput>("FAvoidQuantizeInput", [](
const NodeAttrs &attrs, const size_t index, const std::string quantize_granularity) {
return (index == 0) ? false : true;
Member

return (index != 0); ?

Contributor Author

good.

return false;
.set_attr<FAvoidQuantizeInput>("FAvoidQuantizeInput", [](
const NodeAttrs &attrs, const size_t index, const std::string quantize_granularity) {
return (index == 0) ? true : false;
Member

return (index == 0); ?

Contributor Author

good.

// False True/False False
if (channel_wise && !support_channelwise_scale) {
LOG(FATAL)
<< "Currently, channel-wise quantization requires fuse requantize or dequantize.";
Member

Is it something that users may encounter? If so, what kind of action is suggested?

Contributor Author

This is not the common case, but it would happen when MXNET_DISABLE_MKLDNN_QFC_FLOAT_OUTPUT or MXNET_DISABLE_MKLDNN_QFC_FUSE_ALL is set to true, which means the quantized FullyConnected will not be fused with either requantize or dequantize.
So this message gives the user a hint to enable fusion in order to use this feature.

Member

Thanks for the explanation. How about suggesting in the error message to check these two env variables and set correct values?

Contributor Author

No problem :)
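To make the constraint from this thread concrete, a small user-side check (a sketch only; it assumes the two variables disable the fusion when set to 1 or true, as described above, and the actual error text in the C++ code may differ):

import os

# Channel-wise quantization needs quantized FullyConnected to be fused with
# requantize or dequantize, so these switches must not disable that fusion.
for var in ('MXNET_DISABLE_MKLDNN_QFC_FLOAT_OUTPUT', 'MXNET_DISABLE_MKLDNN_QFC_FUSE_ALL'):
    if os.environ.get(var, '0').lower() in ('1', 'true'):
        raise RuntimeError('%s is set; unset it before using quantize_granularity="channel-wise"' % var)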

@@ -767,7 +773,7 @@ def test_pos_conv_bn_sum_act():
"softrelu": True,
"relu6": False,
"leakyrelu": True,
"gelu": True}
"gelu": False}
Member

Why change it to false?

Contributor Author

Convolution with post sum and activation only supports "relu" currently; the UT previously didn't catch such a failure.

ciyongch (Contributor Author) commented Jan 3, 2020

@TaoLv @ZhennanQin @xinyu-intel please help to review the latest changes :)

ZhennanQin (Contributor) left a comment

LGTM

TaoLv (Member) commented Jan 3, 2020

Thank you for the contribution. Merging now~

@TaoLv TaoLv merged commit 89fe1f6 into apache:master Jan 3, 2020
CPU Performance and Quantization automation moved this from Reviewer approved to Done Jan 3, 2020