This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Mxnet-1397] Support symbolic api for requantize and dequantize #14749

Merged
merged 7 commits into apache:master on Apr 24, 2019

Conversation

shoubhik
Contributor

Description

If I have a pre-quantized model from another framework, e.g. TensorFlow or PyTorch, and want to use MXNet as the inference engine, I would like to be able to set the int8 weights, scales, and shifts manually instead of having MXNet convert a model for me. I would like to do so in a nn.HybridBlock. For example, I can create a quantized convolution network as below:

# Imports assumed for this example; _infer_weight_shape is an internal
# helper defined in mxnet.gluon.nn.conv_layers.
from mxnet.gluon import nn
from mxnet.gluon.nn import Activation
from mxnet.gluon.nn.conv_layers import _infer_weight_shape

class QuantizedConv(nn.HybridBlock):
    def __init__(self, 
                 kernel_size, 
                 strides, 
                 num_filter, 
                 kernel_shape,
                 activation=None,
                 dilation=(1, 1),
                 padding=(0, 0),
                 groups=1,
                 use_bias=True,
                 layout='NCHW',
                 adj=None,
                 in_channels=0,
                 op_name='Convolution',
                 weight_initializer=None,
                 weight_min_initializer=None,
                 weight_max_initializer=None,
                 bias_initializer='zeros',
                 weight_shape=None,
                 **kwargs):
        super(QuantizedConv, self).__init__(**kwargs)
        self._kwargs = {
                'kernel': kernel_size, 'stride': strides, 'dilate': dilation,
                'pad': padding, 'num_filter': num_filter, 'num_group': groups,
                'no_bias': not use_bias, 'layout': layout}
        if adj is not None:
            self._kwargs['adj'] = adj
        dshape = [0]*(len(kernel_size) + 2)
        dshape[layout.find('N')] = 1
        dshape[layout.find('C')] = in_channels
        if weight_shape is None:
            self.wshapes = _infer_weight_shape(op_name, dshape, self._kwargs)
        else:
            self.wshapes = weight_shape
        
        self.weight = self.params.get('weight', shape=kernel_shape,
                                          init=weight_initializer,
                                          allow_deferred_init=True)
        self.weight_min = self.params.get('weight_min', shape=(1,),
                                         init=weight_min_initializer,
                                         allow_deferred_init=True)
        self.weight_max = self.params.get('weight_max', shape=(1,),
                                         init=weight_max_initializer,
                                         allow_deferred_init=True)
        
        if use_bias:
            self.bias = self.params.get('bias', shape=self.wshapes[2],
                                            init=bias_initializer,
                                            allow_deferred_init=True)
            self.bias_min = self.params.get('bias_min', shape=(1,),
                                            init=bias_initializer,
                                            allow_deferred_init=True)
            self.bias_max = self.params.get('bias_max', shape=(1,),
                                            init=bias_initializer,
                                            allow_deferred_init=True)
        else:
            self.bias = None
            self.bias_min = None
            self.bias_max = None

        self.kernel_size = kernel_size
        self.strides = strides
        self.num_filter = num_filter
        # for string representation
        self.layout = layout
        
        if activation is not None:
            self.act = Activation(activation, prefix=activation+'_')
        else:
            self.act = None

        
    def hybrid_forward(self, F, x, weight, weight_min, weight_max, bias=None,
                       bias_min=None, bias_max=None):
        q_inputs, q_inputs_min, q_inputs_max = F.contrib.quantize(
            data=x, 
            min_range=x.min(), 
            max_range=x.max(),
            out_type='uint8')
        q_output, q_output_min, q_output_max = F.contrib.quantized_conv(
                       data=q_inputs, 
                       weight=weight.astype(dtype='int8'), 
                       bias=bias.astype(dtype='int8'), 
                       min_data=q_inputs_min, 
                       max_data=q_inputs_max, 
                       min_weight=weight_min,
                       max_weight=weight_max,
                       min_bias=bias_min,
                       max_bias=bias_max, 
                       kernel=self.kernel_size,
                       stride=self.strides,
                       num_filter=self.num_filter, 
                       layout=self.layout,
                       cudnn_off=True, 
                       cudnn_tune='off',
                       name='fwd')
        q_8_out, q_8_out_min, q_8_out_max = F.contrib.requantize(
            data=q_output.astype('int32'), 
            min_range=q_output_min, 
            max_range=q_output_max,
            name='fwd', 
        )
        act = F.contrib.dequantize(
            data=q_8_out, 
            min_range=q_8_out_min, 
            max_range=q_8_out_max,
            name='fwd')
        if self.act is not None:
            act = self.act(act)
        return act
    
    def __repr__(self):
        s = '{name}({mapping}, kernel_size={kernel}, stride={stride}'
        len_kernel_size = len(self._kwargs['kernel'])
        if self._kwargs['pad'] != (0,) * len_kernel_size:
            s += ', padding={pad}'
        if self._kwargs['dilate'] != (1,) * len_kernel_size:
            s += ', dilation={dilate}'
        if hasattr(self, 'out_pad') and self.out_pad != (0,) * len_kernel_size:
            s += ', output_padding={out_pad}'.format(out_pad=self.out_pad)
        if self._kwargs['num_group'] != 1:
            s += ', groups={num_group}'
        if self.bias is None:
            s += ', bias=False'
        if self.act:
            s += ', {}'.format(self.act)
        s += ')'
        shape = self.weight.shape
        return s.format(name=self.__class__.__name__,
                        mapping='{0} -> {1}'.format(shape[1] if shape[1] else None, shape[0]),
                        **self._kwargs)

I can then later set the weights and ranges from my saved model. Currently F.contrib.requantize(....) and F.contrib.dequantize(....) fail when F is a symbol, i.e., after calling hybridize() on the network. For a more detailed error example please look at the JIRA. In this CR I am enabling requantize and dequantize to be called from the symbolic API.
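
For example, with this change both operators can be driven entirely through the symbol API. Below is a minimal sketch (assuming MXNet 1.x with this PR applied; the data and range values are made up for illustration, mirroring the updated unit tests):

import mxnet as mx
import numpy as np

qdata = mx.nd.array(np.random.randint(-1000, 1000, size=(4, 4)), dtype='int32')
min_range = mx.nd.array([-1010.0])
max_range = mx.nd.array([1010.0])

sym_data = mx.sym.Variable('data')
sym_min_range = mx.sym.Variable('min_range')
sym_max_range = mx.sym.Variable('max_range')

# requantize: int32 -> int8, with the output range derived from the input range.
requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range)
executor = requant.bind(ctx=mx.current_context(),
                        args={'data': qdata, 'min_range': min_range,
                              'max_range': max_range})
q8, q8_min, q8_max = executor.forward()

# dequantize: int8 -> float32, using the ranges produced above.
dequant = mx.sym.contrib.dequantize(mx.sym.Variable('q8'),
                                    mx.sym.Variable('q8_min'),
                                    mx.sym.Variable('q8_max'),
                                    out_type='float32')
out = dequant.bind(ctx=mx.current_context(),
                   args={'q8': q8, 'q8_min': q8_min, 'q8_max': q8_max})
fp32 = out.forward()[0]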

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Enable mx.sym.contrib.dequantize to be called directly, with tests (and, where applicable, API doc)
  • Enable mx.sym.contrib.requantize to be called directly, with tests (and, where applicable, API doc)

Comments

  • This change should be backward compatible.
  • Don't see any edge cases.

sym_min_range = mx.sym.Variable('min_range')
sym_max_range = mx.sym.Variable('max_range')
dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
                                    sym_max_range, out_type='float32')
Contributor

nit: indent?

Contributor Author

done

sym_max_range = mx.sym.Variable('max_range')
if min_calib_range is None or max_calib_range is None:
    requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range)
    out = requant.bind(ctx=mx.cpu(), args={'data':qdata, 'min_range':min_range,
Contributor

use ctx=mx.current_context() so this test can cover both CPU and GPU computation?
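
For illustration, the suggested change would look like this (a sketch; requant, qdata, min_range, and max_range come from the test quoted above):

# Before: the test only ever runs on CPU.
out = requant.bind(ctx=mx.cpu(), args={'data': qdata, 'min_range': min_range,
                                       'max_range': max_range})

# After: mx.current_context() picks up whichever device the test harness sets,
# so the same test exercises both the CPU and GPU implementations.
out = requant.bind(ctx=mx.current_context(),
                   args={'data': qdata, 'min_range': min_range,
                         'max_range': max_range})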

Contributor Author

done

sym_max_range = mx.sym.Variable('max_range')
dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
                                    sym_max_range, out_type='float32')
out = dequant.bind(ctx=mx.cpu(), args={'data':qdata, 'min_range':min_range,
Contributor

use ctx=mx.current_context() so this test can cover both CPU and GPU computation?

Contributor Author

done

dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
                                    sym_max_range, out_type='float32')
out = dequant.bind(ctx=mx.cpu(), args={'data':qdata, 'min_range':min_range,
                                       'max_range':max_range})
Contributor

indent?

Contributor Author

done

else:
    requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range,
                                        min_calib_range, max_calib_range)
    out = requant.bind(ctx=mx.cpu(), args={'data':qdata, 'min_range':min_range,
Contributor

use ctx=mx.current_context() so this test can cover both CPU and GPU computation?

Contributor Author

done

Contributor
@apeforest apeforest left a comment

Thanks a lot for your contribution! LGTM overall. I left a few small comments. Please resolve the conflicts with master and update the PR.

@pengzhao-intel
Contributor

@@ -84,6 +84,10 @@ by keep zero centered for the quantized value:
.set_attr_parser(ParamParser<DequantizeParam>)
.set_num_inputs(3)
.set_num_outputs(1)
.set_attr<nnvm::FListInputNames>("FListInputNames",
  [](const NodeAttrs& attrs) {
    return std::vector<std::string>{"data", "min_range", "max_range"};
Member

If these names are exposed to front-end users, I hope they can align with the other quantization operators. In quantized convolution and quantized FC, I see they are min_data and max_data.

Contributor Author

Since the names are documented well in most of the quantized ops, I think it should be OK. In quantized conv and FC especially, there are many quantized parameters, so I think min_data and max_data make those APIs easier to understand.
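
For reference, the registered input names are what the symbol front end reports once FListInputNames is in place. A quick way to check them (a sketch, assuming this PR is applied):

import mxnet as mx

# Inputs left unspecified are auto-created from the registered input names,
# so the names surface directly in the front end.
dequant = mx.sym.contrib.dequantize(name='dq', out_type='float32')
print(dequant.list_arguments())  # expected: ['dq_data', 'dq_min_range', 'dq_max_range']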

@@ -61,6 +61,10 @@ inference accuracy.
.set_attr_parser(ParamParser<RequantizeParam>)
.set_num_inputs(3)
.set_num_outputs(3)
.set_attr<nnvm::FListInputNames>("FListInputNames",
  [](const NodeAttrs& attrs) {
    return std::vector<std::string>{"data", "min_range", "max_range"};
Member

ditto

Contributor Author

as above.

Contributor
@apeforest apeforest left a comment

LGTM

Member
@TaoLv TaoLv left a comment

Thank you for the contribution @shoubhik. LGTM.

@apeforest apeforest merged commit 8604c3c into apache:master Apr 24, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
[Mxnet-1397] Support symbolic api for requantize and dequantize (apache#14749)

* Adding support for symbolic API for requantize and dequantize

* Adding name to contributors list

* Removing redundant code

* Addressing indentation and using current_context() instead of cpu()

* merge from master

* merge from master