Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix batchnorm problem with sparse matrices when fix_gamma=True #11656

Merged
merged 1 commit into from
Jul 13, 2018

Conversation

haojin2
Copy link
Contributor

@haojin2 haojin2 commented Jul 11, 2018

Description

As title.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Fix fallback logic for batchnorm operator
  • Block usage of fix_gamma=True when input is sparse
  • Unit tests

Comments

Resurrection of #11631.

@haojin2
Copy link
Contributor Author

haojin2 commented Jul 11, 2018

@marcoabreu Seems like there's no build triggered for this: http:https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/activity/?branch=PR-11656. Can you take a look?

@anirudh2290
Copy link
Member

Link to the issue: #11655

@haojin2
Copy link
Contributor Author

haojin2 commented Jul 11, 2018

@anirudh2290 That's a duplicate issue of #11654.

@haojin2 haojin2 closed this Jul 12, 2018
@haojin2 haojin2 reopened this Jul 12, 2018
Copy link
Contributor

@KellenSunderland KellenSunderland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -89,6 +89,9 @@ the output. It is often used during inference.
Both ``gamma`` and ``beta`` are learnable parameters. But if ``fix_gamma`` is true,
then set ``gamma`` to 1 and its gradient to 0.

There's no sparse support for this operator, and will exhibit problematic behavior if used with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "There's no sparse support for this operator, and will" -> "There's no sparse support for this operator. It will"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -233,7 +233,7 @@ def check_batchnorm_training(stype):
mx.nd.array(beta).tostype(stype)]
mean_std = [mx.nd.array(rolling_mean).tostype(stype), mx.nd.array(rolling_std).tostype(stype)]

test = mx.symbol.BatchNorm(data, fix_gamma=True)
test = mx.symbol.BatchNorm(data, fix_gamma=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate why you are modifying a test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had the conversation in Lai's PR yesterday, and I've created the related issue #11647

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this mkl test is to test that we can fallback to sparse matrices correctly when using MKLDNN under the legal cases for sparse matrices. The logics of exclusion of illegal cases for sparse matrices(fix_gamma=True), which is shared by both USE_MKLDNN=1 and USE_MKLDNN=0 situtaions, were already tested by the test_batchnorm_fallback test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, sorry, got a lot of PRs swirling around my head. Let me rephrase to ensure I got it right: This means that the test case with gamma=True is basically invalid until that issue has been resolved and that's why you are changing it, correct?
If yes, we're good to go. Sorry for the caused inconvenience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you got it, that's why we're only testing the necessary legal case here. Please merge this once you feel good to do so, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to request the review from another committer to assess the backend changes if you don't mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I think @anirudh2290 already approved the previous #11631

@haojin2 haojin2 force-pushed the fix_batchnorm_sparse branch 2 times, most recently from 97dec9a to 7613011 Compare July 12, 2018 21:12
Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good to me.

return MKLDNNStorageType(attrs, dev_mask, true, dispatch_mode,
in_attrs, out_attrs);
}
if ((common::ContainsStorageType(*in_attrs, kRowSparseStorage) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it to !common::ContainsOnlyStorage(kDefault), in case we add more stype in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@marcoabreu marcoabreu merged commit 5b4d528 into apache:master Jul 13, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
@haojin2 haojin2 added the Sparse label Aug 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants