This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-560][WIP] Add temperature parameter in Softmax and SoftmaxOutput operator #11356

Closed
wants to merge 42 commits

Conversation

apeforest
Contributor

@apeforest apeforest commented Jun 21, 2018

Description

This PR addresses the request for a native temperature parameter in the softmax functions. See the linked issue for more detailed discussion.

I have added the temperature parameter to both the softmax and SoftmaxOutput operators. The default temperature is 1.0, in which case both operators behave exactly as they did before the parameter was introduced.
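
For reference, the temperature-scaled softmax computed by the operator is the standard formulation

out_i = exp(in_i / temperature) / sum_j exp(in_j / temperature)

which reduces to the ordinary softmax when temperature is 1.0.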

Verified the change using the following code in Python:

import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.softmax(data=data, temperature=10)

x = mx.nd.array([ 1,  2,  3])

ex = net.bind(mx.cpu(), args={'data': x})
ex.forward()

The expected output is:

[
[ 0.30060961  0.33222499  0.3671654 ]
 <NDArray 3 @cpu(0)>]
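
The expected values can be cross-checked with a plain NumPy computation of the temperature-scaled softmax (a verification sketch, not part of this PR):

import numpy as np

x = np.array([1., 2., 3.])
t = 10.0
e = np.exp(x / t - np.max(x / t))  # subtract the max for numerical stability
print(e / e.sum())                 # -> [0.30060961 0.33222499 0.3671654 ]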

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Added a temperature parameter to the softmax operator, with tests and API doc updates
  • Added a temperature parameter to the SoftmaxOutput operator, with tests and API doc updates

Comments

  • In the implementation of the softmax function, I added a check for the default case where temperature equals 1.0 so as to skip the unnecessary divide-by-1 operation (see the sketch below). In an offline experiment on the softmax function alone, compiled with g++ -O3, this optimization reduced single-core runtime by around 25%.
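
A minimal NumPy sketch of that special-casing, purely for illustration (the actual change is in the C++ softmax kernel, not in Python):

import numpy as np

def softmax_with_temperature(x, temperature=1.0, axis=-1):
    # Skip the division entirely in the common temperature == 1.0 case.
    z = x if temperature == 1.0 else x / temperature
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)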

@apeforest
Contributor Author

apeforest commented Jun 21, 2018

@szha @eric-haibin-lin @sandeep-krishnamurthy [WIP] I have only added the temperature parameter to softmax, not to SoftmaxOutput yet, since the latter depends on an operator change in another repo (MShadow). Please review the change to the softmax function. Thanks!

@eric-haibin-lin
Member

@slitsey
Member

@eric-haibin-lin eric-haibin-lin left a comment

pls add unit tests in test_operator.py

@apeforest
Contributor Author

@eric-haibin-lin Will do once I send out the PR for merge. Since this is my first contribution, I would like to get some initial review on my code change to make sure I am following the proper style and conventions of this community.

@eric-haibin-lin
Member

Yeah because my initial review always involves inspecting tests :)
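
For reference, a unit test along the requested lines for test_operator.py might look like the following sketch (helper names come from mxnet.test_utils, and it assumes the new temperature argument is also exposed through the ndarray API; this is not the test that was eventually merged):

import numpy as np
import mxnet as mx
from mxnet.test_utils import assert_almost_equal

def np_softmax(x, t):
    e = np.exp(x / t - np.max(x / t, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def test_softmax_with_temperature():
    for t in [0.1, 1.0, 10.0]:
        data = np.random.uniform(-2, 2, size=(2, 3, 4))
        out = mx.nd.softmax(mx.nd.array(data), temperature=t).asnumpy()
        assert_almost_equal(out, np_softmax(data, t), rtol=1e-3, atol=1e-4)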

__syncthreads();
cuda::Reduce1D<red::sum, x_bits>(smem);
__syncthreads();
DType ssum = smem[0];
__syncthreads();

for (index_t i = x; i < M; i += x_size) {
out[base + i*sa] = OP::Map(in[base + i*sa] - smax, ssum);
if (temperature == 1.0) {
Member

@rahul003 rahul003 Jun 21, 2018

I'd suggest removing the if condition and merging these two cases, since it's perfectly okay to use the second equation for temperature=1. The if condition within this kernel causes branching and affects performance.

Contributor Author

I added this if condition out of performance concern. I'd assume (correct me if I'm wrong) that in 90% of cases the temperature passed to softmax is 1. Adding a divide-by-1.0 operation to expf((in[base + i*sa] - smax)/t) will slow down this computation, and I am not aware of any compiler that can optimize it away.

Member

The division (even if not optimized away) should be better than causing a branch. Especially on the GPU, branch divergence can have a significant impact on performance, so it's better to avoid it.

Contributor Author

I agree with you about the overhead of branching. However, there is a trade-off between performance and complexity here, and it really depends on how critical this piece of computation is to the overall performance of the network. In this case, I would assume softmax is called very often, since it typically sits in the last layer of a neural network, but I am not expert enough to judge the overall performance impact.

Member

@rahul003 rahul003 Jun 23, 2018

Just noticed the comment in your post. It looks like you've tested it and found it faster this way. Please verify that across a range of scenarios, and then we can go with it.
I'm a bit surprised, because that has not been my experience with a different kernel.
And by the way, IIRC the softmax operator already has some performance issues and needs to be improved.

Contributor

@apeforest Could you provide data on which version is faster? We have a speed-test helper ready for you to use: https://github.com/apache/incubator-mxnet/blob/a7952f0b3218363a9520aa606f43db94a34c55b8/python/mxnet/test_utils.py#L1133
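
A rough way to compare the two code paths without relying on that helper, assuming the new temperature argument is exposed through mx.nd.softmax (a sketch only; numbers will vary by hardware and this is not a benchmark from the PR):

import time
import mxnet as mx

x = mx.nd.random.uniform(shape=(128, 1000))

def time_softmax(temperature, repeat=1000):
    mx.nd.waitall()
    start = time.time()
    for _ in range(repeat):
        y = mx.nd.softmax(x, temperature=temperature)
    mx.nd.waitall()  # force all asynchronous kernels to finish before stopping the clock
    return time.time() - start

print('temperature=1.0 (division skipped): %.4f s' % time_softmax(1.0))
print('temperature=2.0 (general path):     %.4f s' % time_softmax(2.0))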

@@ -127,7 +137,7 @@ inline void SoftmaxGrad(Stream<cpu> *s, DType *out, DType *ograd,
#ifdef __CUDACC__
template<int x_bits, typename OP, typename DType, int ndim>
__global__ void softmax_compute_kernel(DType *in, DType *out, index_t M, int axis,
-                                       Shape<ndim> sshape, Shape<ndim> stride) {
+                                       Shape<ndim> sshape, Shape<ndim> stride, float temperature) {
Member

Please try to use const for variables which don't change

Contributor Author

This is pass-by-value so it does not make any difference.

Member

@rahul003 rahul003 Jun 21, 2018

Just a matter of convention/style/readability https://google.github.io/styleguide/cppguide.html#Use_of_const

Contributor Author

Thanks for your suggestion, but I cannot find a convention of using const for pass-by-value parameters in the Google style guide:

If a function guarantees that it will not modify an argument passed by reference or by pointer, the corresponding function parameter should be a reference-to-const (const T&) or pointer-to-const (const T*), respectively.

In fact, adding an unnecessary const declaration puts a restriction on the caller of the library function.

Contributor

Adding a const qualifier could be a safety net: it explicitly tells the compiler about your assumption that you are not changing the input value at all inside the function.
Say you have a function like this:

int foo(int a) {
  return a;
}

Here your assumption is that the original input should be returned without any changes, and that is essential for the correct behavior of this function.
Now if someone happens to change the function to:

int foo(int a) {
  a++;      // <- new code that happens to change the value of a, and will affect correctness
  return a;
}

the compiler will not complain, because there is no const qualifier here. So an extra const qualifier is not strictly necessary, but it can be helpful.

Contributor Author

I have added the const qualifier following your suggestion.

marcoabreu and others added 16 commits June 22, 2018 01:31
* Initial commit

* Add coverage generation to all python tests

* Restrict package and add branch coverage

* Add sanity for test

* Revert "Add sanity for test"

This reverts commit 5d86bd7.

* Delete reference to unexistant file

* Add bot configuration

* Enable coverage for GCC

* Revert "Enable coverage for GCC"

This reverts commit ae5ecff.
…pache#11333)

* fix param init bug and remove memcpy/memset

* fix bug for bidirection size

* add 100 times loop for test_gru_bidirectional robust checking

* add test_loop_gru_bidirectional

* record number of passed

* remove 1000 times case
…bute 'value'' on distributed processing applications (apache#11332)

* add scope to NameManager

* add AttrScope scope

* adding test

* update NameManager

* Trigger build

* Trigger build

* Add attribute checks for register module
…apache#11381)

* flaky test disable test_ImageRecordIter_seed_augmentation temporarily

* test deconv relax
…he#11355)

* Support dense weight and sparse grad AdagradUpdate

* Simplify AdagradStorageType

* Add test
* Resolve conflicts

* Export module Test Framework

* refactoring export to work with pretrained models

* comments added

* 1. Refactored export module.
2. Refactored test framework to support ONNX backened tests.
2. Added Operator support:
   - Convolution2D
   - BatchNorm
   - Add

* Added Arithmetic operators:
- Add, Sub, Mul, Div, Sum

* Added operator support:
- sigmoid, relu, pad( constant, edge, reflect), tanh
- enabled corresponding ONNX backend tests.

* Enabled ONNX tests: test_conv, test_basic_conv

Added Operators :
Ceil, Floor

* Added support for:
MaxPool, AvgPool, GlobalMaxPool, GlobalAvgPool, matmul

* adding more operators

* Added Operator support:
ArgMax, ArgMin, maximum, minimum

* Enabled more BASIC_MODEL tests

* Added power operator tests

* Added support for reshape. ONNX only supports 0, -1  special values. Added only for these.
Fixed logic error with convert_string_to_list()

* some tests enabled

* enabling squeezenet

* LRN Op support

* mul_scalar modified to take scalar input

* cleaning some code

* Resolving conlicts on rebase

* Resolving rebase conflicts

* id mapping updated for all operators

* save onnx models added, some code cleanup

* enabled more tests

* conv pad calc fixed

* reshape op fix

* Added support for elu, leakyRelu, prelu

* Cleanup
- Removed run_node, not needed anymore.
- Used correct get_metadata api

* valueinfoproto fix, googlenet test added

* Removed redundant code.
- run_node
- Using correct get_metadata_api

* dilation added

* Lint fixes

* lint fixes

* some fixes to make export work with onx1.2.1

* enabled more tests

* mxnet_export_test file added

* duplicate file deleted

* reduce ops added

* some small fixes

* some lint fixes

* Add tests for inception_v1 and inception_v2

* Add CI runs for export module

* docstring added

* lint fixes, pooling attr fix

* fix

* fix global_pool

* CI  run fix

* code cleanup

* lint fix

* some code cleanup

* pad in pooling added

* slicechannel notimplementederror raised

* Added required license comments

* Lint fixes

* lint fix

* lint fix

* lint fix

* lint fix

* Correct license statement

* Adding onnx a runtime dependency

* Fix import module error for string_types

* Making ONNX runtime dependency

* fixing some comments

* addressing some comments

* params rename

* lint fixes

* fixes

* spatial disabled, path fixed

* fixing some comments

* Added support for remaining act_type(softsign, sigmoid, softrelu) in Activation operator

* changing import

* adding some comments

* Add squeeze op

* Refactored logic to handle extra node(output label node) for saved mxnet model
Added comments

* minor fix for squeeze operator.
Also, added error handling

* identity operator added

* scalar ops added

* Renamed onnx support folders to mark it public folders
Changed underline files public or private as per usage

Resolved conflicts with the latest

* Added support L2Normalization op
Added some error checking

* added comments and warning

* added comments and warning

* doc API ref added
* Add test result publishing to windows

* Fix names of files

* Fix syntax of xcopy on Windows
…) (apache#11391)

* Dont fail during artifact storage

* Update Jenkinsfile

* Update Jenkinsfile
* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* trigger

* Run 100000 times

* Update test_gluon_trainer.py

* run 10K times

* test_trainer_reset_kv didn't fail for 10K time . 2nd Trigger.

* test_trainer_reset_kv didn't fail for 10K times. 3rd Trigger.

* remove for loop
* fix recordfile dataset with multi worker

* fix another test

* fix
…pache#11259)

Use float64 computations as the reference numpy implementation operates in double and not float.
f64(f32(f64(.))) % f64(f32(f64(.))) is not the same as f64(.) % f64(.) due to limited precission.

fixes apache#9853
* implementation of histogram operator

* address code reviews and code re-design

* add exception for invalid inputs

* address code reviews

* add symbol and symbolic forward check for histogram
* Added two tutorials on learning rate schedules; basic and advanced.

* Correcting notebook skip line.

* Corrected cosine graph

* Changes based on @KellenSunderland feedback.
* refactor copyfrom

* add boilerplate

* rename to MKLDNNCopy

* write to temp memory

* reorder mkldnn / views

* return memory from GetMKLDNNData

* add kaddto to unit test

* move orig output before creatingnewmem

* coerce memory if shape does not fit

* use MKLDNNCopy in commit

* uncomment addto test

* switch order of mkldnnsum params

* improving logging

* wait to read after copying arr

* remove extra white spaces

* remove extra white space

* remove unused var

* reorder output

* do not write to views

* remove shape check in test

* use input pdesc

* remove unused var

* fix merge

* put inplace in separate loop

* use two mem

* use sum_pd when calling CreateMKLDNNData

* reorder sum shapes if needed

* comment out getsumpd

* use MKLDNNCopy helper to reshape mem

* remove getsumpd

* use output mem for createmem

* remove todo

* waittoread output

* do not attempt to shape output

* use correct arr as input

* revert commit change to ps-lite

* revert change to tvm

* fix lint

* add comment to test

* reduce calls to get_primitive_desc

* skip tests that reorder2default

* push_back to inputs

* skip if view/mkldnn

* add noop test

* pass input ptr for write in place

* allow empty
anbrjohn and others added 16 commits June 26, 2018 11:46
* Fix bi-lstm-crf to update crf weights

* Use self.params.get to declare params
…e#10391)

* dtype for data, working fp16

* test dtype fp16 gluon

* add gluon fine tuning code

* data iter caltech

* caltech iter

* working finetuning for fp16, but is it using  pretrained params

* benchmark fp16

* add wip tutorials

* working notebook fp16

* changes to symbolic examples

* changes to symbolic examples

* add fp16 notebook

* remove extra files

* remove output of notebook

* update md file

* remove from faq

* dtype for data, working fp16

* test dtype fp16 gluon

* add gluon fine tuning code

* data iter caltech

* caltech iter

* working finetuning for fp16, but is it using  pretrained params

* benchmark fp16

* add wip tutorials

* working notebook fp16

* changes to symbolic examples

* changes to symbolic examples

* add fp16 notebook

* remove extra files

* remove output of notebook

* update md file

* remove from faq

* WIP address feedback

* gluon example

* add top5 back

* clean up gluon example

* address feedback

* address comments

* move tutorial to faq

* Add training curves

* formatting

* update image

* trigger ci
…1340)

* Added scripts for broken link checker job

* Corrected the order in which scripts are copied to mxnet-site repo

* Added the echo statements and trying to install the utilities in docker using sudo

* Trying npm installation without sudo

* Creating docker file npm installation

* Creating ubuntu_blc docker image in JenkinsfileForBLC

* Copying the url_list.txt from s3 bucket

* Trying to copy url_list.txt from s3 bucket

* Trying to copy file from aws outside container

* Updated the scripts with the right relative path to download the url_list.txt

* Removed the exit 1 status that can cause to stop the job before saving the file to S3

* Added the README.md file

* Addressed the review comments

* added the right parameters to docker_run

* Removed the reference to ubuntu_ccache.sh file which no longer exists

* Added the license header to the files

* Trying the logic to mark the build as failed when regression is detected

* Corrected the path in aws cp command at the end of the job.

* Adding the finally block to stage

* Addressed the review comments
…NN=1' issue (apache#11090)

* Define build target for mkldnn lib build to fix 'make clean USE_MKMLDNN=1' issue

* fix create install dir and other minor issues

* Fix GPU MKLDNN and cpp-package build failure

* Fix issue to only link with full MKL when BLAS is mkl

* simplify logic by removing MKLDNN_ROOT support and some renaming

* retrigger Jenkins

* retrigger Jenkins

* retrigger Jenkins

* retrigger Jenkins

* retrigger Jenkins
* fix url

* fix url

* fix url

* Update compare_layers.py

* Update compare_layers.py

* Update test_converter.py

* Update compare_layers.py

* Update compare_layers.py

* Update compare_layers.py
@apeforest
Contributor Author

@eric-haibin-lin unit test added this time :)

@haojin2
Contributor

haojin2 commented Jun 28, 2018

@apeforest please do the rebase properly; you're including a lot of already-committed changes.
Please refer to https://cwiki.apache.org/confluence/display/MXNET/MXNet+Development+Guide for a guide on how to rebase properly.

@apeforest
Contributor Author

The commits got messed up because my earlier PR was checked out from a different branch. I will create a new PR with the changes and send it out for review. Sorry for the inconvenience.

@eric-haibin-lin
Member

Moved to #11466

@apeforest apeforest deleted the feature/enhance_operator branch July 2, 2018 18:32