Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Improve FC perf when no_bias=False #15033

Merged
merged 4 commits into from
May 27, 2019
Merged

Conversation

anirudh2290
Copy link
Member

@anirudh2290 anirudh2290 commented May 22, 2019

Description

Improve FullyConnected op performance when no_bias is set to False. Fixes: #14938

Perf Results:

Before: 

GPU version: 
Flatten = T, Bias = T, Parameters count = 50000560 . Forward time: 0.4266066551208496, Backward time: 0.2575654983520508, Total time: 0.6841721534729004 
Flatten = T, Bias = F, Parameters count = 50000510 . Forward time: 0.42148637771606445, Backward time: 0.2561371326446533, Total time: 0.6776235103607178 
Flatten = F, Bias = T, Parameters count = 5005060 . Forward time: 0.39539504051208496, Backward time: 1.700193166732788, Total time: 2.095588207244873 
Flatten = F, Bias = F, Parameters count = 5005010 . Forward time: 0.3380405902862549, Backward time: 0.44127631187438965, Total time: 0.7793169021606445 

CPU version: 
Flatten = T, Bias = T, Parameters count = 50000560 . Forward time: 3.4699082374572754, Backward time: 7.634012699127197, Total time: 11.103920936584473 
Flatten = T, Bias = F, Parameters count = 50000510 . Forward time: 3.4408316612243652, Backward time: 6.942991733551025, Total time: 10.38382339477539 
Flatten = F, Bias = T, Parameters count = 5005060 . Forward time: 6.389216899871826, Backward time: 11.683183908462524, Total time: 18.07240080833435 
Flatten = F, Bias = F, Parameters count = 5005010 . Forward time: 5.604649543762207, Backward time: 7.6611716747283936, Total time: 13.2658212184906 


After: 

GPU version: 
Flatten = T, Bias = T, Parameters count = 50000560 . Forward time: 0.42462778091430664, Backward time: 0.25749874114990234, Total time: 0.682126522064209 
Flatten = T, Bias = F, Parameters count = 50000510 . Forward time: 0.42075681686401367, Backward time: 0.2557032108306885, Total time: 0.6764600276947021 
Flatten = F, Bias = T, Parameters count = 5005060 . Forward time: 0.3945605754852295, Backward time: 0.4475700855255127, Total time: 0.8421306610107422 
Flatten = F, Bias = F, Parameters count = 5005010 . Forward time: 0.3443477153778076, Backward time: 0.44979381561279297, Total time: 0.7941415309906006 

CPU version: 
Flatten = T, Bias = T, Parameters count = 50000560 . Forward time: 3.4988842010498047, Backward time: 7.586740255355835, Total time: 11.08562445640564 
Flatten = T, Bias = F, Parameters count = 50000510 . Forward time: 3.421649694442749, Backward time: 6.978037118911743, Total time: 10.399686813354492 
Flatten = F, Bias = T, Parameters count = 5005060 . Forward time: 6.294604539871216, Backward time: 7.7125022411346436, Total time: 14.00710678100586 
Flatten = F, Bias = F, Parameters count = 5005010 . Forward time: 5.617329120635986, Backward time: 7.549039125442505, Total time: 13.166368246078491 

@NRauschmayr @Ishitori

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@Ishitori
Copy link
Contributor

Seems like the build failed, but not because of your changes @anirudh2290. Can you force the build once again, please?

@anirudh2290
Copy link
Member Author

@Ishitori I am digging more into it, since I am suspecting it is related to this PR.

@karan6181
Copy link
Contributor

@mxnet-label-bot add [Operator, C++, pr-awaiting-review]

@marcoabreu marcoabreu added C++ Related to C++ Operator pr-awaiting-review PR is waiting for code review labels May 23, 2019
@anirudh2290
Copy link
Member Author

cc @Vikas89 @mseth10 @apeforest for review

@@ -316,11 +316,9 @@ NNVM_REGISTER_OP(_backward_FullyConnected)
const FullyConnectedParam& params = nnvm::get<FullyConnectedParam>(attrs.parsed);
return params.no_bias ? 2 : 3;
})
#if MXNET_USE_MKLDNN == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we need additional temp space now.

Copy link
Contributor

@mseth10 mseth10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pengzhao-intel
Copy link
Contributor

pengzhao-intel commented May 24, 2019

@ciyongch could you help take a look?

Copy link
Contributor

@ciyongch ciyongch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anirudh2290
Copy link
Member Author

@ciyongch @pengzhao-intel on a side note, https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/fully_connected.cc#L222 . The backward still doesnt use MKLDNN. Do you know why and is there a plan to fix this ?

@ciyongch
Copy link
Contributor

@anirudh2290 Currently, we're focusing on inference pass, and facing the big code refactor due to API change with the coming MKL-DNN v1.0. We'll take a look and see whether to fix it in the new MKL-DNN v1.0 or in the current version.

@anirudh2290 anirudh2290 merged commit 6cf964a into apache:master May 27, 2019
@anirudh2290 anirudh2290 deleted the FC_fix branch May 28, 2019 00:12
roywei added a commit to roywei/incubator-mxnet that referenced this pull request May 29, 2019
anirudh2290 pushed a commit that referenced this pull request May 30, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Improve FC perf when no_bias=False

* Add Issue number in comment

* Correct req
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C++ Related to C++ Operator pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issue when use_bias=True
8 participants