This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-560][WIP] Add temperature parameter in Softmax and SoftmaxOutput operator #11356

Closed
wants to merge 42 commits

Conversation

apeforest
Contributor

@apeforest apeforest commented Jun 21, 2018

Description

This PR addresses the request for a native temperature parameter in the softmax functions. See the linked issue for more detailed discussion.

I have added the temperature parameter to both the softmax and SoftmaxOutput operators. The default temperature is 1.0, in which case both operators behave exactly as they did before the parameter was introduced.
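
For reference, the temperature-scaled softmax computed by the operator is the standard formulation

out_i = exp(in_i / temperature) / sum_j exp(in_j / temperature)

which reduces to the ordinary softmax when temperature is 1.0.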

Verified the change using the following code in Python:

import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.softmax(data=data, temperature=10)

x = mx.nd.array([ 1,  2,  3])

ex = net.bind(mx.cpu(), args={'data': x})
ex.forward()

The expected output is:

[
[ 0.30060961  0.33222499  0.3671654 ]
 <NDArray 3 @cpu(0)>]
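
The expected values can be cross-checked with a plain NumPy computation of the temperature-scaled softmax (a verification sketch, not part of this PR):

import numpy as np

x = np.array([1., 2., 3.])
t = 10.0
e = np.exp(x / t - np.max(x / t))  # subtract the max for numerical stability
print(e / e.sum())                 # -> [0.30060961 0.33222499 0.3671654 ]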

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Added a temperature parameter to the softmax operator, with tests and API doc updates
  • Added a temperature parameter to the SoftmaxOutput operator, with tests and API doc updates

Comments

  • In the implementation of the softmax function, I added a check for the default case where temperature equals 1.0 so as to skip the unnecessary divide-by-1 operation (see the sketch below). In an offline experiment on the softmax function alone, compiled with g++ -O3, this optimization reduced single-core runtime by around 25%.
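
A minimal NumPy sketch of that special-casing, purely for illustration (the actual change is in the C++ softmax kernel, not in Python):

import numpy as np

def softmax_with_temperature(x, temperature=1.0, axis=-1):
    # Skip the division entirely in the common temperature == 1.0 case.
    z = x if temperature == 1.0 else x / temperature
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)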

@apeforest
Contributor Author

apeforest commented Jun 21, 2018

@szha @eric-haibin-lin @sandeep-krishnamurthy [WIP] I have only added the temperature parameter to softmax, not to SoftmaxOutput yet, since the latter depends on an operator change in another repo (MShadow). Please review the change to the softmax function. Thanks!

@eric-haibin-lin
Member

@slitsey
Member

@eric-haibin-lin eric-haibin-lin left a comment

pls add unit tests in test_operator.py

@apeforest
Contributor Author

@eric-haibin-lin Will do once I send out the PR for merge. Since this is my first contribution, I would like to get some initial review on my code change to make sure I am following the proper style and conventions of this community.

@eric-haibin-lin
Member

Yeah because my initial review always involves inspecting tests :)
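
For reference, a unit test along the requested lines for test_operator.py might look like the following sketch (helper names come from mxnet.test_utils, and it assumes the new temperature argument is also exposed through the ndarray API; this is not the test that was eventually merged):

import numpy as np
import mxnet as mx
from mxnet.test_utils import assert_almost_equal

def np_softmax(x, t):
    e = np.exp(x / t - np.max(x / t, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def test_softmax_with_temperature():
    for t in [0.1, 1.0, 10.0]:
        data = np.random.uniform(-2, 2, size=(2, 3, 4))
        out = mx.nd.softmax(mx.nd.array(data), temperature=t).asnumpy()
        assert_almost_equal(out, np_softmax(data, t), rtol=1e-3, atol=1e-4)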

__syncthreads();
cuda::Reduce1D<red::sum, x_bits>(smem);
__syncthreads();
DType ssum = smem[0];
__syncthreads();

for (index_t i = x; i < M; i += x_size) {
out[base + i*sa] = OP::Map(in[base + i*sa] - smax, ssum);
if (temperature == 1.0) {
Member

@rahul003 rahul003 Jun 21, 2018

I'd suggest removing the if condition and merging these two cases, since it's perfectly okay to use the second equation for temperature=1. The if condition within this kernel causes branching and affects performance.

Contributor Author

I added this if condition out of performance concern. I'd assume (correct me if I'm wrong) that in 90% of cases the temperature passed to softmax is 1. Adding a divide-by-1.0 operation to expf((in[base + i*sa] - smax)/t) will slow down this computation, and I am not aware of any compiler that can optimize it away.

Member

The division (even if not optimized away) should be better than causing a branch. Especially on the GPU, branch divergence can have a significant impact on performance, so it's better to avoid it.

Contributor Author

I agree with you about the overhead of branching. However, there is a trade-off between performance and complexity here, and it really depends on how critical this piece of computation is to the overall performance of the network. In this case, I would assume softmax is called very often, since it typically sits in the last layer of a neural network, but I am not expert enough to judge the overall performance impact.

Member

@rahul003 rahul003 Jun 23, 2018

Just noticed the comment in your post. It looks like you've tested it and found it faster this way. Please verify that across a range of scenarios, and then we can go with it.
I'm a bit surprised, because that has not been my experience with a different kernel.
And by the way, IIRC the softmax operator already has some performance issues and needs to be improved.

Contributor

@apeforest Could you provide data on which version is faster? We have a speed-test helper ready for you to use: https://github.com/apache/incubator-mxnet/blob/a7952f0b3218363a9520aa606f43db94a34c55b8/python/mxnet/test_utils.py#L1133
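
A rough way to compare the two code paths without relying on that helper, assuming the new temperature argument is exposed through mx.nd.softmax (a sketch only; numbers will vary by hardware and this is not a benchmark from the PR):

import time
import mxnet as mx

x = mx.nd.random.uniform(shape=(128, 1000))

def time_softmax(temperature, repeat=1000):
    mx.nd.waitall()
    start = time.time()
    for _ in range(repeat):
        y = mx.nd.softmax(x, temperature=temperature)
    mx.nd.waitall()  # force all asynchronous kernels to finish before stopping the clock
    return time.time() - start

print('temperature=1.0 (division skipped): %.4f s' % time_softmax(1.0))
print('temperature=2.0 (general path):     %.4f s' % time_softmax(2.0))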

@@ -127,7 +137,7 @@ inline void SoftmaxGrad(Stream<cpu> *s, DType *out, DType *ograd,
#ifdef __CUDACC__
template<int x_bits, typename OP, typename DType, int ndim>
__global__ void softmax_compute_kernel(DType *in, DType *out, index_t M, int axis,
-                                       Shape<ndim> sshape, Shape<ndim> stride) {
+                                       Shape<ndim> sshape, Shape<ndim> stride, float temperature) {
Member

Please try to use const for variables which don't change

Contributor Author

This is pass-by-value so it does not make any difference.

Member

@rahul003 rahul003 Jun 21, 2018

Just a matter of convention/style/readability https://google.github.io/styleguide/cppguide.html#Use_of_const

Contributor Author

Thanks for your suggestion, but I cannot find a convention of using const for pass-by-value parameters in the Google style guide:

If a function guarantees that it will not modify an argument passed by reference or by pointer, the corresponding function parameter should be a reference-to-const (const T&) or pointer-to-const (const T*), respectively.

In fact, adding an unnecessary const declaration puts a restriction on the caller of the library function.

Contributor

Adding a const qualifier could be a safety net: it explicitly tells the compiler about your assumption that you are not changing the input value at all inside the function.
Say you have a function like this:

int foo(int a) {
  return a;
}

Here your assumption is that the original input should be returned without any changes, and that is essential for the correct behavior of this function.
Now if someone happens to change the function to:

int foo(int a) {
  a++;      // <- new code that happens to change the value of a, and will affect correctness
  return a;
}

the compiler will not complain, because there is no const qualifier here. So an extra const qualifier is not strictly necessary, but it can be helpful.

Contributor Author

I have added the const qualifier following your suggestion.

marcoabreu and others added 16 commits June 22, 2018 01:31
* Initial commit

* Add coverage generation to all python tests

* Restrict package and add branch coverage

* Add sanity for test

* Revert "Add sanity for test"

This reverts commit 5d86bd7.

* Delete reference to unexistant file

* Add bot configuration

* Enable coverage for GCC

* Revert "Enable coverage for GCC"

This reverts commit ae5ecff.
…pache#11333)

* fix param init bug and remove memcpy/memset

* fix bug for bidirection size

* add 100 times loop for test_gru_bidirectional robust checking

* add test_loop_gru_bidirectional

* record number of passed

* remove 1000 times case
…bute 'value'' on distributed processing applications (apache#11332)

* add scope to NameManager

* add AttrScope scope

* adding test

* update NameManager

* Trigger build

* Trigger build

* Add attribute checks for register module
…apache#11381)

* flaky test disable test_ImageRecordIter_seed_augmentation temporarily

* test deconv relax
…he#11355)

* Support dense weight and sparse grad AdagradUpdate

* Simplify AdagradStorageType

* Add test
* Resolve conflicts

* Export module Test Framework

* refactoring export to work with pretrained models

* comments added

* 1. Refactored export module.
2. Refactored test framework to support ONNX backened tests.
2. Added Operator support:
   - Convolution2D
   - BatchNorm
   - Add

* Added Arithmetic operators:
- Add, Sub, Mul, Div, Sum

* Added operator support:
- sigmoid, relu, pad( constant, edge, reflect), tanh
- enabled corresponding ONNX backend tests.

* Enabled ONNX tests: test_conv, test_basic_conv

Added Operators :
Ceil, Floor

* Added support for:
MaxPool, AvgPool, GlobalMaxPool, GlobalAvgPool, matmul

* adding more operators

* Added Operator support:
ArgMax, ArgMin, maximum, minimum

* Enabled more BASIC_MODEL tests

* Added power operator tests

* Added support for reshape. ONNX only supports 0, -1  special values. Added only for these.
Fixed logic error with convert_string_to_list()

* some tests enabled

* enabling squeezenet

* LRN Op support

* mul_scalar modified to take scalar input

* cleaning some code

* Resolving conlicts on rebase

* Resolving rebase conflicts

* id mapping updated for all operators

* save onnx models added, some code cleanup

* enabled more tests

* conv pad calc fixed

* reshape op fix

* Added support for elu, leakyRelu, prelu

* Cleanup
- Removed run_node, not needed anymore.
- Used correct get_metadata api

* valueinfoproto fix, googlenet test added

* Removed redundant code.
- run_node
- Using correct get_metadata_api

* dilation added

* Lint fixes

* lint fixes

* some fixes to make export work with onx1.2.1

* enabled more tests

* mxnet_export_test file added

* duplicate file deleted

* reduce ops added

* some small fixes

* some lint fixes

* Add tests for inception_v1 and inception_v2

* Add CI runs for export module

* docstring added

* lint fixes, pooling attr fix

* fix

* fix global_pool

* CI  run fix

* code cleanup

* lint fix

* some code cleanup

* pad in pooling added

* slicechannel notimplementederror raised

* Added required license comments

* Lint fixes

* lint fix

* lint fix

* lint fix

* lint fix

* Correct license statement

* Adding onnx a runtime dependency

* Fix import module error for string_types

* Making ONNX runtime dependency

* fixing some comments

* addressing some comments

* params rename

* lint fixes

* fixes

* spatial disabled, path fixed

* fixing some comments

* Added support for remaining act_type(softsign, sigmoid, softrelu) in Activation operator

* changing import

* adding some comments

* Add squeeze op

* Refactored logic to handle extra node(output label node) for saved mxnet model
Added comments

* minor fix for squeeze operator.
Also, added error handling

* identity operator added

* scalar ops added

* Renamed onnx support folders to mark it public folders
Changed underline files public or private as per usage

Resolved conflicts with the latest

* Added support L2Normalization op
Added some error checking

* added comments and warning

* added comments and warning

* doc API ref added
* Add test result publishing to windows

* Fix names of files

* Fix syntax of xcopy on Windows
…) (apache#11391)

* Dont fail during artifact storage

* Update Jenkinsfile

* Update Jenkinsfile
* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* Update test_gluon_trainer.py

* trigger

* Run 100000 times

* Update test_gluon_trainer.py

* run 10K times

* test_trainer_reset_kv didn't fail for 10K time . 2nd Trigger.

* test_trainer_reset_kv didn't fail for 10K times. 3rd Trigger.

* remove for loop
* fix recordfile dataset with multi worker

* fix another test

* fix
…pache#11259)

Use float64 computations as the reference numpy implementation operates in double and not float.
f64(f32(f64(.))) % f64(f32(f64(.))) is not the same as f64(.) % f64(.) due to limited precission.

fixes apache#9853
* implementation of histogram operator

* address code reviews and code re-design

* add exception for invalid inputs

* address code reviews

* add symbol and symbolic forward check for histogram
* Added two tutorials on learning rate schedules; basic and advanced.

* Correcting notebook skip line.

* Corrected cosine graph

* Changes based on @KellenSunderland feedback.
* refactor copyfrom

* add boilerplate

* rename to MKLDNNCopy

* write to temp memory

* reorder mkldnn / views

* return memory from GetMKLDNNData

* add kaddto to unit test

* move orig output before creatingnewmem

* coerce memory if shape does not fit

* use MKLDNNCopy in commit

* uncomment addto test

* switch order of mkldnnsum params

* improving logging

* wait to read after copying arr

* remove extra white spaces

* remove extra white space

* remove unused var

* reorder output

* do not write to views

* remove shape check in test

* use input pdesc

* remove unused var

* fix merge

* put inplace in separate loop

* use two mem

* use sum_pd when calling CreateMKLDNNData

* reorder sum shapes if needed

* comment out getsumpd

* use MKLDNNCopy helper to reshape mem

* remove getsumpd

* use output mem for createmem

* remove todo

* waittoread output

* do not attempt to shape output

* use correct arr as input

* revert commit change to ps-lite

* revert change to tvm

* fix lint

* add comment to test

* reduce calls to get_primitive_desc

* skip tests that reorder2default

* push_back to inputs

* skip if view/mkldnn

* add noop test

* pass input ptr for write in place

* allow empty
anbrjohn and others added 16 commits June 26, 2018 11:46
* Fix bi-lstm-crf to update crf weights

* Use self.params.get to declare params
…e#10391)

* dtype for data, working fp16

* test dtype fp16 gluon

* add gluon fine tuning code

* data iter caltech

* caltech iter

* working finetuning for fp16, but is it using  pretrained params

* benchmark fp16

* add wip tutorials

* working notebook fp16

* changes to symbolic examples

* changes to symbolic examples

* add fp16 notebook

* remove extra files

* remove output of notebook

* update md file

* remove from faq

* dtype for data, working fp16

* test dtype fp16 gluon

* add gluon fine tuning code

* data iter caltech

* caltech iter

* working finetuning for fp16, but is it using  pretrained params

* benchmark fp16

* add wip tutorials

* working notebook fp16

* changes to symbolic examples

* changes to symbolic examples

* add fp16 notebook

* remove extra files

* remove output of notebook

* update md file

* remove from faq

* WIP address feedback

* gluon example

* add top5 back

* clean up gluon example

* address feedback

* address comments

* move tutorial to faq

* Add training curves

* formatting

* update image

* trigger ci
…1340)

* Added scripts for broken link checker job

* Corrected the order in which scripts are copied to mxnet-site repo

* Added the echo statements and trying to install the utilities in docker using sudo

* Trying npm installation without sudo

* Creating docker file npm installation

* Creating ubuntu_blc docker image in JenkinsfileForBLC

* Copying the url_list.txt from s3 bucket

* Trying to copy url_list.txt from s3 bucket

* Trying to copy file from aws outside container

* Updated the scripts with the right relative path to download the url_list.txt

* Removed the exit 1 status that can cause to stop the job before saving the file to S3

* Added the README.md file

* Addressed the review comments

* added the right parameters to docker_run

* Removed the reference to ubuntu_ccache.sh file which no longer exists

* Added the license header to the files

* Trying the logic to mark the build as failed when regression is detected

* Corrected the path in aws cp command at the end of the job.

* Adding the finally block to stage

* Addressed the review comments
…NN=1' issue (apache#11090)

* Define build target for mkldnn lib build to fix 'make clean USE_MKMLDNN=1' issue

* fix create install dir and other minor issues

* Fix GPU MKLDNN and cpp-package build failure

* Fix issue to only link with full MKL when BLAS is mkl

* simplify logic by removing MKLDNN_ROOT support and some renaming

* retrigger Jenkins

* retrigger Jenkins

* retrigger Jenkins

* retrigger Jenkins

* retrigger Jenkins
* fix url

* fix url

* fix url

* Update compare_layers.py

* Update compare_layers.py

* Update test_converter.py

* Update compare_layers.py

* Update compare_layers.py

* Update compare_layers.py
@apeforest
Contributor Author

@eric-haibin-lin unit test added this time :)

@haojin2
Contributor

haojin2 commented Jun 28, 2018

@apeforest please do the rebase properly; you're including a lot of already-committed changes.
Please refer to https://cwiki.apache.org/confluence/display/MXNET/MXNet+Development+Guide for a guide on how to rebase properly.

@apeforest
Contributor Author

The commits got messed up because my earlier PR was checked out from a different branch. I will create a new PR with the changes and send it out for review. Sorry for the inconvenience.

@eric-haibin-lin
Member

Moved to #11466

@apeforest apeforest deleted the feature/enhance_operator branch July 2, 2018 18:32