fix ssd quantization script error #13843

ciyongch · 2019-01-11T02:15:19Z

Description

This PR addresses a "symbol not found" error during the SSD quantization process.
Some op outputs are not named in the form {opname}_output, so no calibrated min/max values are recorded for those variables in the quantized params file, which causes the error.
Leaving calib_layer as None (the default) is a safe way to do the calibration.
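A minimal pure-Python sketch of the failure mode described above. It assumes that `calib_layer` is a predicate on a layer-output name, with `None` meaning no filtering (our reading of the `quantize_model` documentation), and that the script previously filtered on the `_output` suffix. The output names are hypothetical; `flatten10_0` is borrowed from the error log in this thread.

```python
# Hypothetical layer-output names; not read from the real SSD symbol.
OUTPUT_NAMES = ['conv1_1_output', 'relu4_3_output', 'flatten10_0']


def calibrated_names(names, calib_layer=None):
    """Return the output names whose min/max statistics would be collected.

    calib_layer is a predicate on the output name; None disables filtering,
    which is the behaviour this PR falls back to.
    """
    if calib_layer is None:
        return list(names)
    return [name for name in names if calib_layer(name)]


def ends_with_output(name):
    """Assumed shape of the old filter: only {opname}_output names pass."""
    return name.endswith('_output')


with_filter = calibrated_names(OUTPUT_NAMES, ends_with_output)
without_filter = calibrated_names(OUTPUT_NAMES)
print(with_filter)  # ['conv1_1_output', 'relu4_3_output']
# Names silently skipped by the suffix filter get no calibrated min/max.
print(sorted(set(without_filter) - set(with_filter)))  # ['flatten10_0']
```

With the filter, any output that does not follow the naming pattern is dropped without warning; with `None`, nothing is dropped.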

@pengzhao-intel @TaoLv @ZhennanQin @xinyu-intel @apeforest

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

TaoLv · 2019-01-11T02:23:34Z

Thank you for the quick fix @ciyongch. Do you mind pasting the error log here?

xinyu-intel

small changes

xinyu-intel · 2019-01-11T02:31:03Z

example/ssd/quantization.py

@@ -128,6 +127,8 @@ def save_params(fname, arg_params, aux_params, logger=None):
 if exclude_first_conv:
 excluded_sym_names += ['conv1_1']

+ excluded_sym_names += ['multibox_loc_pred', 'concat0', 'concat1']


Add this at line 124.
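The suggestion above keeps a few SSD-specific nodes out of INT8 quantization. The sketch below mirrors the list-building logic from the diff; `build_excluded_sym_names`, `split_nodes`, and the node names in the usage are hypothetical. In the real script the resulting list is what gets passed as `excluded_sym_names` to `mxnet.contrib.quantization.quantize_model`; that call is not reproduced here.

```python
def build_excluded_sym_names(exclude_first_conv):
    """Collect symbol names that should stay in FP32 (hypothetical helper)."""
    excluded_sym_names = []
    if exclude_first_conv:
        # Optionally keep the first convolution in FP32.
        excluded_sym_names += ['conv1_1']
    # SSD-specific nodes suggested in the review comment above.
    excluded_sym_names += ['multibox_loc_pred', 'concat0', 'concat1']
    return excluded_sym_names


def split_nodes(node_names, excluded_sym_names):
    """Partition node names into (to_quantize, kept_fp32), preserving order."""
    excluded = set(excluded_sym_names)
    to_quantize = [n for n in node_names if n not in excluded]
    kept_fp32 = [n for n in node_names if n in excluded]
    return to_quantize, kept_fp32


nodes = ['conv1_1', 'conv1_2', 'multibox_loc_pred', 'concat0', 'concat1']
print(split_nodes(nodes, build_excluded_sym_names(exclude_first_conv=True)))
# (['conv1_2'], ['conv1_1', 'multibox_loc_pred', 'concat0', 'concat1'])
```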

xinyu-intel · 2019-01-11T02:35:08Z

Would you mind also helping me fix a typo in the SSD README?

data/
|---val.rec
|---val.lxt
|---val.idx
model/
|---ssd_vgg16_reduced_300-0000.params  <-----
|---ssd_vgg16_reduced_300-symbol.json

ciyongch · 2019-01-11T02:43:48Z

@TaoLv, the error log is below:

$ python evaluate.py --cpu --num-batch 10 --batch-size 224 --deploy --prefix=./model/cqssd_
[17:17:21] src/io/iter_image_det_recordio.cc:283: ImageDetRecordIOParser: /home/ubuntu/mxnet/example/ssd/data/val.rec, use 32 threads for decoding..
[17:17:21] src/io/iter_image_det_recordio.cc:340: ImageDetRecordIOParser: /home/ubuntu/mxnet/example/ssd/data/val.rec, label padding width: 254
[17:17:23] src/operator/subgraph/mkldnn/mkldnn_conv_property.cc:138: Start to execute MKLDNN Convolution optimization pass.
Traceback (most recent call last):
  File "evaluate.py", line 108, in <module>
    voc07_metric=args.use_voc07_metric)
  File "/home/ubuntu/mxnet/example/ssd/evaluate/evaluate_net.py", line 107, in evaluate_net
    mod.set_params(args, auxs, allow_missing=False, force_init=True)
  File "/home/ubuntu/mxnet/python/mxnet/module/module.py", line 350, in set_params
    allow_extra=allow_extra)
  File "/home/ubuntu/mxnet/python/mxnet/module/module.py", line 309, in init_params
    _impl(desc, arr, arg_params)
  File "/home/ubuntu/mxnet/python/mxnet/module/module.py", line 300, in _impl
    raise RuntimeError("%s is not presented" % name)
RuntimeError: flatten10_0_max is not presented
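The traceback shows `Module.set_params(..., allow_missing=False)` stopping at the first absent parameter. As a debugging aid, a pre-check like the one below lists every missing name at once. The helper name and the parameter names are hypothetical (only `flatten10_0_max` appears in the log; `flatten10_0_min` is assumed by analogy). In practice the expected names could come from the symbol's `list_arguments()` and `list_auxiliary_states()`, and the provided ones from the loaded params dicts.

```python
def missing_params(expected_names, arg_params, aux_params):
    """Return expected parameter names absent from both params dicts, sorted."""
    provided = set(arg_params) | set(aux_params)
    return sorted(name for name in expected_names if name not in provided)


expected = ['conv1_1_weight', 'conv1_1_bias', 'flatten10_0_min', 'flatten10_0_max']
args = {'conv1_1_weight': None, 'conv1_1_bias': None}  # array values elided for the sketch
auxs = {}
print(missing_params(expected, args, auxs))  # ['flatten10_0_max', 'flatten10_0_min']
```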


ciyongch · 2019-01-11T02:50:52Z

no problem @xinyu-intel

xinyu-intel · 2019-01-11T04:56:46Z

Thanks:)

pengzhao-intel · 2019-01-11T12:30:40Z

example/ssd/quantization.py

 label_names=(label_name,),
- calib_quantize_op = True,
+ calib_quantize_op=True,
 logger=logger)
 sym_name = '%s-symbol.json' % ('./model/cqssd_vgg16_reduced_300')
 param_name = '%s-%04d.params' % ('./model/cqssd_vgg16_reduced_300', epoch)


Why not remove 0000 here instead of adding 0000 to the README? Does it have any special meaning?

It's an MXNet convention: the number is the epoch, so parameter files saved at different epochs can coexist.

Thanks for the explanation. I agree the change makes sense :)
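The `-0000` suffix is the zero-padded epoch number produced by the script's own `'%s-%04d.params' % (prefix, epoch)` pattern. MXNet's `mx.model.save_checkpoint` and `mx.model.load_checkpoint` use the same `prefix-symbol.json` / `prefix-NNNN.params` layout. A small sketch of the convention follows; `checkpoint_files` is a hypothetical helper, not an MXNet API.

```python
def checkpoint_files(prefix, epoch):
    """Return (symbol_file, params_file) for a checkpoint prefix and epoch."""
    symbol_file = '%s-symbol.json' % prefix
    params_file = '%s-%04d.params' % (prefix, epoch)  # epoch zero-padded to 4 digits
    return symbol_file, params_file


print(checkpoint_files('./model/ssd_vgg16_reduced_300', 0))
# ('./model/ssd_vgg16_reduced_300-symbol.json', './model/ssd_vgg16_reduced_300-0000.params')
print(checkpoint_files('./model/cqssd_vgg16_reduced_300', 12)[1])
# ./model/cqssd_vgg16_reduced_300-0012.params
```

Because only the params file carries the epoch, one symbol file can be shared by several saved epochs.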

pengzhao-intel · 2019-01-11T12:41:43Z

example/quantization/README.md

@@ -218,7 +218,7 @@ data/
 |---val.lxt
 |---val.idx
 model/
-|---ssd_vgg16_reduced_300.params
+|---ssd_vgg16_reduced_300-0000.params


Please add a note that users need to rename the downloaded files, e.g. ssd-val-fc19a535.idx to val.idx.
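A sketch of the rename step requested above. It assumes the downloaded files follow an `ssd-val-<hex hash>.<ext>` pattern like the `ssd-val-fc19a535.idx` example; the pattern and the helper names are assumptions, so check the actual file names before running it.

```python
import os
import re

# Assumed pattern: 'ssd-val-<hex hash>.<ext>' -> 'val.<ext>'
_PATTERN = re.compile(r'^ssd-val-[0-9a-f]+\.(\w+)$')


def target_name(name):
    """Map 'ssd-val-<hash>.<ext>' to 'val.<ext>'; return None for other names."""
    match = _PATTERN.match(name)
    return None if match is None else 'val.' + match.group(1)


def rename_val_files(data_dir):
    """Rename matching files in data_dir; return the (old, new) pairs renamed."""
    renamed = []
    for name in sorted(os.listdir(data_dir)):
        new_name = target_name(name)
        if new_name is None:
            continue
        os.rename(os.path.join(data_dir, name), os.path.join(data_dir, new_name))
        renamed.append((name, new_name))
    return renamed
```

Usage would be `rename_val_files('./data')`, run once after the download finishes.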

Another suggestion is to move or copy the SSD quantization instructions into ssd/README as well.

TaoLv

LGTM

TaoLv · 2019-01-13T12:22:26Z

@reminisce @ZhennanQin Please take a look.

pengzhao-intel

LGTM for a hotfix.

Regarding the other changes I mentioned, I think they can be left to another PR.

ciyongch · 2019-01-14T00:45:48Z


@pengzhao-intel Good suggestion to update the README as you mentioned above; I'm happy to include that change in this PR :)


ciyongch · 2019-01-14T05:14:58Z

@TaoLv @reminisce This PR only fixes the SSD quantization script in the example and updates the README; please help merge it if there are no other comments :)

TaoLv

Minor comments.

TaoLv · 2019-01-14T05:17:35Z

example/quantization/README.md

@@ -34,7 +34,7 @@ The following models have been tested on Linux systems.
 |[Inception V3](#7)|[Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|76.49%/93.10% |76.38%/93% |
 |[ResNet152-V2](#8)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|76.76%/93.03%|76.48%/92.96%|
 |[Inception-BN](#9)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.09%/90.60%|72.00%/90.53%|
-| [SSD-VGG](#10) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | VOC2007/2012 | 0.83 mAP | 0.82 mAP |
+| [SSD-VGG](#10) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | VOC2007/2012 | 0.8366 mAP | 0.8364 mAP |


TaoLv · 2019-01-14T05:17:48Z

example/quantization/README.md

@@ -210,40 +210,7 @@ python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-quant

 <h3 id='10'>SSD-VGG</h3>


No problem. Does any other description in the current README need to be updated?

TaoLv · 2019-01-15T07:27:06Z

Thank you for the fix @ciyongch. Merging it now.

* fix ssd quantization script error
* update readme for ssd
* move quantized SSD instructions from quantization/README.md to ssd/README.md
* update ssd readme and accuracy
* update readme for SSD-vGG16

TaoLv · 2019-01-15T08:55:19Z

@lifu-wang You may be interested. We will have nightly build with this fix soon.


fix ssd quantization script error

8df9e35

ciyongch requested a review from szha as a code owner January 11, 2019 02:15

TaoLv added the Quantization and Example labels Jan 11, 2019

xinyu-intel reviewed Jan 11, 2019


update readme for ssd

84be0b9

xinyu-intel approved these changes Jan 11, 2019


pengzhao-intel reviewed Jan 11, 2019


TaoLv approved these changes Jan 13, 2019


pengzhao-intel approved these changes Jan 13, 2019


ciyongch added 2 commits January 14, 2019 09:27

move quantized SSD instructions from quantization/README.md to ssd/README.md

65fb13f

update ssd readme and accuracy

482fae6

TaoLv approved these changes Jan 14, 2019


update readme for SSD-vGG16

9be6603

TaoLv added the pr-awaiting-review label Jan 14, 2019

TaoLv merged commit 4fe5461 into apache:master Jan 15, 2019

TaoLv mentioned this pull request Jan 15, 2019

[v1.4.x] fix ssd quantization script error (#13843) #13882

Merged


ciyongch deleted the udpate_ssd_script branch March 13, 2019 02:26
