
[MKLDNN] add quantized sum #14614

Merged

merged 32 commits into apache:master, Apr 30, 2019

Conversation

@rongzha1 (Contributor) commented Apr 4, 2019

Description

Add a quantized sum implementation for MKLDNN, supporting int8 and uint8 inputs.
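For context, a minimal standalone sketch of the scale arithmetic this thread keeps referring to (names are illustrative; the PR's actual GetScale helper appears later in the diff):

#include <algorithm>
#include <cmath>

// Per-tensor scale: quantized units per float unit. The range is 127 for
// int8 (kInt8Range) or 255 for uint8 (kUint8Range). This mirrors the PR's
// GetScale(data, min, max) helper.
static float ScaleSketch(float range, float fmin, float fmax) {
  return range / std::max(std::fabs(fmin), std::fabs(fmax));
}

// The MKLDNN sum primitive computes C = s0 * A + s1 * B; given a calibrated
// output scale, s_i = out_scale / in_scale_i maps both inputs into C's range.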

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [done] Changes are complete (i.e. I finished coding on this PR)
  • [done] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [done] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [done] Feature1, tests, (and when applicable, API doc)
    Test case added in tests/python/quantization/test_quantization.py: test_quantized_sum()

@pengzhao-intel (Contributor):

@anirudh2290 @ZhennanQin @TaoLv @ciyongch to review :)

@pengzhao-intel pengzhao-intel changed the title add quantized sum [MKLDNN] add quantized sum Apr 4, 2019
@pengzhao-intel (Contributor):

@mxnet-label-bot Add [Quantization, MKLDNN]

@marcoabreu marcoabreu added the MKLDNN and Quantization labels Apr 4, 2019
}

NNVM_REGISTER_OP(_contrib_quantized_sum)
.describe(R"code(Adds arguments element-wise.
Member:

Please change the document.

Contributor Author:

Done

src/operator/quantization/mkldnn/mkldnn_quantized_sum.cc (outdated; conversation resolved)
// dataA && dataB are uint8
if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) {
output_data_range = kInt8Range;
output_data_type = mkldnn::memory::s8;
Member:

indent.

Contributor Author:

OK

float B_scale = GetScale(in_data[quantized_sum_enum::kDataB], dataB_min, dataB_max);
// rescaled_mem is for reorder mkldnn memory
std::shared_ptr<mkldnn::memory> rescaled_mem;
// output default set as int32
Member:

Int32 by default. Do we have any other choice?

Contributor Author:

When fused with the requantize op, the output is int8/uint8.
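A condensed sketch of that output-type dispatch, stitched together from the snippets reviewed below (the s32 fallback reflects the "output default set as int32" comment; treat it as an illustration):

if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kInt8) {
  // requantize fused: calibrated signed output
  output_data_type = mkldnn::memory::s8;
} else if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kUint8) {
  // requantize fused: calibrated unsigned output
  output_data_type = mkldnn::memory::u8;
} else {
  // no requantize fusion: default int32 output
  output_data_type = mkldnn::memory::s32;
}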

auto s8_pd = (dataA_int8 == true)
? dataA_mem->get_primitive_desc()
: dataB_mem->get_primitive_desc();
rescaled_mem = std::make_shared<mkldnn::memory>(s8_pd);
Member:

Will allocate memory here?

Contributor Author:

The reorder (line 134) is done inside this if() block, so memory needs to be allocated first.

Member:

Conventionally, we don't want to allocate memory implicitly inside the MKL-DNN API. Besides, it seems this allocation will happen on every iteration, which is a performance problem.

Contributor Author:

MKLDNN sum doesn't support mixing int8 and uint8 inputs, so they need to be reordered to the same data type first.
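For context, a rough sketch of such a rescaling reorder with the MKLDNN 0.x API used in this PR (variable names follow the snippets in this thread; u8_reorder_scale and rescaled_mem appear later; an illustration, not the PR's exact code):

// Rescale the uint8 input into int8 memory so both inputs share one type.
mkldnn::primitive_attr reorder_attr;
reorder_attr.set_output_scales(0, std::vector<float>{u8_reorder_scale});
auto reorder_pd = mkldnn::reorder::primitive_desc(
    dataB_mem->get_primitive_desc(), rescaled_mem->get_primitive_desc(), reorder_attr);
MKLDNNStream::Get()->RegisterPrim(
    mkldnn::reorder(reorder_pd, *dataB_mem, *rescaled_mem));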

Contributor Author:

Changed them to TmpMemMgr::Get()->Alloc.
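i.e., roughly the following, where TmpMemMgr is MXNet's per-call scratch pool for MKLDNN memory (sketch only):

// Borrow scratch MKLDNN memory from the operator's temporary-space pool
// instead of heap-allocating a fresh buffer on every forward call.
mkldnn::memory* rescaled_mem = TmpMemMgr::Get()->Alloc(s8_pd);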

output_min = 0 - output_max;
}

std::vector<float> scales;
Member:

How many scales do we have? Is it possible to reserve space for them?

Contributor Author:

Two: scale 0 is for dataA, scale 1 is for dataB. OK, will reserve space first.

Member:

Suggest:

// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);
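Those two scales feed straight into the sum primitive; roughly, with the 0.x API and out_mem standing in for the output memory (sketch):

std::vector<mkldnn::primitive::at> in_prims;
in_prims.push_back(*dataA_mem);
in_prims.push_back(*dataB_mem);
auto sum_pd = mkldnn::sum::primitive_desc(output_desc, scales, in_pds);
MKLDNNStream::Get()->RegisterPrim(mkldnn::sum(sum_pd, in_prims, *out_mem));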


auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) ? true : false;
Member:

const?

Contributor Author:

OK. Added const for the constant variables.

src/operator/quantization/mkldnn/mkldnn_quantized_sum.cc (outdated; conversation resolved)
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
Contributor:

Please add doc info in the header of the new files, including Copyright/brief/author...

Contributor Author:

OK


auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
const bool dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8)
Contributor:

is_dataA_int8 could be better for understanding.

Contributor Author:

Done


DMLC_REGISTER_PARAMETER(RequantizeSumParam);

static float GetScale(const NDArray& data, float min, float max) {
Contributor:

inline func?

Contributor Author:

Changed

if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8) {
output_data_range = kInt8Range;
output_data_type = mkldnn::memory::s8;
} else if (out_data[quantized_sum_enum::kDataA].dtype() == mshadow::kUint8) {
Contributor:

L74 & L77: should it be kOut rather than kDataA?

Contributor Author:

Done

out_data_scale = output_data_range/MaxAbs(output_min, output_max);
} else {
output_max = dataA_absmax + dataB_absmax;
output_min = 0 - output_max;
Contributor:

output_min = -output_max;

Contributor Author:

OK

if (in_type->at(i) == mshadow::kInt8) {
TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8);
} else {
TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kUint8);
Contributor:

CHECK(in_type->at(i) == mshadow::kInt8 || in_type->at(i) == mshadow::kUint8);

@@ -21,7 +21,7 @@

#include "mkldnn_conv_property.h"
#include "mkldnn_fc_property.h"
#include "mkldnn_conv_post_quantize_property.h"
#include "mkldnn_post_quantize_property.h"
Contributor:

did you remove the conv part?

Contributor Author:

Merged them into one file.


@with_seed()
def test_quantized_sum():
def check_quantized_sum(data_shape, qtype):
Contributor:

Please also add a test case in test_subgraph.py.

@pengzhao-intel (Contributor):

Please retrigger the CI

@pengzhao-intel (Contributor):

@TaoLv @ciyongch @ZhennanQin please help review again :)

@TaoLv (Member) left a comment:

Why is the operator not called quantized_elemwise_add? sum is another operator, which is used to accumulate the elements of an array.

#include <utility>
#include <vector>
#include <string>
#include "../../tensor/elemwise_unary_op.h"
Member:

Make sure these headers are used.

Contributor Author:

Removed the unnecessary header files.


struct RequantizeSumParam : public dmlc::Parameter<RequantizeSumParam> {
dmlc::optional<float> min_calib_range; // min float value calculated from calibration dataset
dmlc::optional<float> max_calib_range; // max float value calculated from calibration dataset
Member:

Remove comments. I think these two parameters are already described in L43 and L48.

Contributor Author:

done

auto dataA_mem = in_data[quantized_sum_enum::kDataA].GetMKLDNNData();
auto dataB_mem = in_data[quantized_sum_enum::kDataB].GetMKLDNNData();
const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8)
? true : false;
Member:

const bool is_dataA_int8 = (in_data[quantized_sum_enum::kDataA].dtype() == mshadow::kInt8);

Contributor Author:

OK

} else if (out_data[quantized_sum_enum::kOut].dtype() == mshadow::kUint8) {
output_data_range = kUint8Range;
output_data_type = mkldnn::memory::u8;
}
Member:

add else clause.
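e.g., a guarding else along these lines (error text illustrative):

} else {
  // Reaching here means type inference produced an unexpected output dtype.
  LOG(FATAL) << "Unsupported output data type for quantized elemwise_add: "
             << out_data[quantized_sum_enum::kOut].dtype();
}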

Contributor Author:

OK

output_min = 0 - output_max;
}

std::vector<float> scales;
Member:

Suggest:

// scale 0 is for data A, scale 1 is for data B
std::vector<float> scales(2);

if (is_dataA_int8 == true) {
u8_reorder_scale = out_data_scale/B_scale;
scales.push_back(out_data_scale/A_scale);
scales.push_back(1);
Member:

scales[0] = out_data_scale / A_scale;
scales[1] = 1.0f;

Contributor Author:

done

}
mkldnn::memory::format i_fmt = static_cast<mkldnn::memory::format>(
in_pds[quantized_sum_enum::kDataA].desc().data.format);
auto output_desc = memory::desc(i_dims, output_data_type, i_fmt);
Member:

mkldnn::memory::desc

Contributor Author:

done

NNVM_REGISTER_OP(_contrib_quantized_sum)
.set_attr<FInferStorageType>("FInferStorageType", SumStorageType)
.set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedSumForward)
.set_attr<FResourceRequest>("FResourceRequest", [](const NodeAttrs& n) {
Member:

Need resource?

Contributor Author:

removed

}

NNVM_REGISTER_OP(_contrib_quantized_sum)
.describe(R"code(elem_add operator for input dataA and input dataB data type of int8,
Member:

elem_add?

Contributor Author:

Changed to elemwise_add.

@rongzha1 (Contributor Author):

> Why the operator is not called quantized_elemwise_add? sum is another operator which is used to accumulate elements of an array.

It has been changed from quantized_sum to quantized_elemwise_add.

@pengzhao-intel (Contributor) left a comment:

LGTM

@pengzhao-intel (Contributor):

@TaoLv @ciyongch @ZhennanQin please help review the change again.

@TaoLv (Member) left a comment:

Some minor comments. Please fix the parameter indents after the function names are changed.

}

static void MKLDNNQuantizedElemwiseAddForward(const nnvm::NodeAttrs& attrs, const OpContext& ctx,
const std::vector<NDArray>& in_data,
Member:

please fix indent.

// A, B, A_min, A_max, B_min, B_max
CHECK_EQ(in_data.size(), 6U);
// C, C_min, C_max
CHECK_EQ(out_data.size(), 3U);
Member:

Please add descriptive messages for these two checks.

Member:

I meant the error message shown if the check fails.
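i.e., something like (wording illustrative):

// Stream a message into the check so failures are self-explanatory.
CHECK_EQ(in_data.size(), 6U)
    << "should be A, B, A_min, A_max, B_min, B_max";
CHECK_EQ(out_data.size(), 3U)
    << "should be C, C_min, C_max";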

if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) {
output_min = params.min_calib_range.value();
output_max = params.max_calib_range.value();
out_data_scale = output_data_range/MaxAbs(output_min, output_max);
Member:

Add spaces before and after /.

float u8_reorder_scale = 0;
if (params.max_calib_range.has_value() && params.min_calib_range.has_value()) {
if (is_dataA_int8 == true) {
u8_reorder_scale = out_data_scale/B_scale;
Member:

ditto

namespace op {

static bool ElemwiseAddShape(const nnvm::NodeAttrs& attrs, mxnet::ShapeVector* in_shape,
mxnet::ShapeVector* out_shape) {
Member:

Please fix indent.


fp32_rslt = output.asnumpy()
int8_rslt = qoutput.asnumpy()*max_val/0x7fffffff
assert_almost_equal(int8_rslt, int8_rslt, atol = 1)
Member:

why choose atol=1?

// A, B, A_min, A_max, B_min, B_max
CHECK_EQ(in_data.size(), 6U);
// C, C_min, C_max
CHECK_EQ(out_data.size(), 3U);
Member:

I meant the error message shown if the check fails.

scales[0] = out_data_scale / A_scale;
scales[1] = out_data_scale / B_scale;
} else {
scales[0] = dataA_absmax*output_data_range / ((dataA_absmax + dataB_absmax)*dataA_range);
Member:

nit: please also add spaces around operation *.

.set_attr<FCompute>("FCompute<cpu>", QuantizedElemwiseAddForward)
.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "4th input")
Member:

Does "4th input" refer to the order of the parameter list when users call this operator? It seems not to align with the order in FListInputNames.
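For reference, the input order would be pinned down by an FListInputNames registration along these lines (a sketch; the names mirror the add_argument list and the A/B/min/max order used in the checks above):

.set_attr<nnvm::FListInputNames>("FListInputNames",
  [](const NodeAttrs& attrs) {
    // Order here defines what "first", "second", ... mean to callers.
    return std::vector<std::string>{
        "lhs", "rhs", "lhs_min", "lhs_max", "rhs_min", "rhs_max"};
  })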

@TaoLv (Member) commented Apr 24, 2019

CI has not passed yet. Please take a look. Thank you. @rongzha1

@szha szha added this to Review in progress in CPU Performance and Quantization Apr 25, 2019
@szha szha moved this from Review in progress to In progress in CPU Performance and Quantization Apr 25, 2019
@pengzhao-intel pengzhao-intel moved this from In progress to Review in progress in CPU Performance and Quantization Apr 26, 2019
.set_attr<FNeedRequantize>("FNeedRequantize", [](const NodeAttrs& attrs) { return true; })
.add_argument("lhs", "NDArray-or-Symbol", "first input")
.add_argument("rhs", "NDArray-or-Symbol", "second input")
.add_argument("lhs_min", "NDArray-or-Symbol", "second input")
Member:

should be third?

.set_attr<FInferStorageType>("FInferStorageType", ElemwiseAddStorageType)
.set_attr<FComputeEx>("FComputeEx<cpu>", MKLDNNQuantizedElemwiseAddForward)
.set_attr<bool>("TIsMKLDNN", true)
.set_attr_parser(ParamParser<RequantizeElemwiseAddParam>)
Member:

It's quantize in the operator name but requantize in the param name. Is it intentional?

Contributor Author:

Yes, it's intentional: this is for fusion with requantize.

@ZhennanQin (Contributor) left a comment:

Overall LGTM. Just a minor comment.

}
// C
int dtype = mshadow::kInt32;
#if MXNET_USE_MKLDNN == 1
Contributor:

This isn't a feature of MKLDNN. Consider removing this macro.

Contributor Author:

OK. Done

@TaoLv (Member) left a comment:

Thanks for addressing the comments. Now it's approved.

CPU Performance and Quantization automation moved this from Review in progress to Reviewer approved Apr 26, 2019
@pengzhao-intel (Contributor):

Finally, the CI pass. Thanks for the contribution :)

Merging now.

@pengzhao-intel pengzhao-intel merged commit 84c1635 into apache:master Apr 30, 2019
CPU Performance and Quantization automation moved this from Reviewer approved to Done Apr 30, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
* add quantized sum

* fix gpu compiler error and cpu testcase fail

* add default forward function for quantized_sum

* skip quantized_sum for gpu ctx

* fix comments

* fix indentation and comments

* retrigger CI

* alloc memory through TmpMemMgr

*  fix comments Apr.12

* change sum to elemwise_add

* change Sum to ElemwiseAdd

* fix indents

* retrigger CI

* trigger CI

* fix indentation and typo

* trigger CI

* fix typo

* fix typo

* remove USE_MKLDNN macro for requantize params

* rename param same as its op

* trigger CI

* trigger CI

* trigger CI
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* (same commit list as above)
Labels
MKLDNN, Quantization

7 participants