This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Mkldnn fullyConnect bwd bug fix #16890

Merged
merged 2 commits into from
Nov 25, 2019

rongzha1
Contributor

Description

Fix an MKL-DNN FullyConnected (FC) backward bug caused by in-place data.
When the data is in place, running bwd_data first overwrites the src data, which makes the subsequent bwd_weight calculation incorrect.
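
The hazard described above can be illustrated with a minimal, framework-free sketch. It is not MXNet or MKL-DNN code: `fc_backward`, `matmul`, `transpose`, and the `weights_first` flag are hypothetical names used only for illustration, and it assumes the usual FC gradients (`dx = dy · W`, `dW = dyᵀ · x`). When the input-gradient buffer aliases the input data, computing the data gradient first corrupts the `x` that the weight gradient still needs to read.

```python
def matmul(a, b):
    """Plain nested-list matrix product a (n x k) times b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def fc_backward(x, w, dy, in_grad, weights_first):
    """Hypothetical FC backward pass for y = x * W^T.

    x: (N, K) input, w: (M, K) weights, dy: (N, M) output gradient.
    in_grad receives dx and MAY be the same object as x (in-place case).
    Returns dW; dx is written into in_grad.
    """
    def bwd_data():
        dx = matmul(dy, w)            # (N, K)
        for i, row in enumerate(dx):
            in_grad[i][:] = row       # overwrites x when in_grad is x

    def bwd_weights():
        return matmul(transpose(dy), x)  # (M, K), reads x

    if weights_first:
        dw = bwd_weights()            # reads x before it is overwritten
        bwd_data()
    else:
        bwd_data()                    # clobbers x in the in-place case
        dw = bwd_weights()            # now reads dx instead of x
    return dw


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[2.0, 0.0], [0.0, 2.0]]
dy = [[1.0, 1.0], [1.0, 1.0]]

safe_buf = [row[:] for row in x]
unsafe_buf = [row[:] for row in x]
dw_safe = fc_backward(safe_buf, w, dy, safe_buf, weights_first=True)
dw_unsafe = fc_backward(unsafe_buf, w, dy, unsafe_buf, weights_first=False)
print(dw_safe, dw_unsafe)
```

In this sketch, computing the weight gradient before the data gradient is one way to avoid the problem; whether the PR takes exactly that approach should be checked against its diff.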

@PatricZhao @TaoLv @xinyu-intel

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [done] Changes are complete (i.e. I finished coding on this PR)
  • [done] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [done] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@xinyu-intel
Contributor

Enable the MKL-DNN inner product backward path; otherwise this change may not take effect.

@TaoLv
Member

TaoLv commented Nov 22, 2019

To be more accurate, the MKL-DNN FC backward path was not enabled before, so this bug does not exist in previous MXNet versions. This PR tries to enable that path.

@TaoLv
Member

TaoLv commented Nov 22, 2019

FullyConnectedGradComputeExCPU and BackwardFCStorageType should be changed accordingly.

@ptrendx
Member

ptrendx commented Nov 22, 2019

@TaoLv @PatricZhao Does this impact 1.6?

@TaoLv
Member

TaoLv commented Nov 23, 2019

@ptrendx No, it should not affect the 1.6.0 release.

@pengzhao-intel pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Nov 23, 2019
CPU Performance and Quantization automation moved this from In progress to Reviewer approved Nov 25, 2019
Contributor

@pengzhao-intel pengzhao-intel left a comment

LGTM

@pengzhao-intel pengzhao-intel merged commit 436967b into apache:master Nov 25, 2019
CPU Performance and Quantization automation moved this from Reviewer approved to Done Nov 25, 2019
TaoLv added a commit to TaoLv/incubator-mxnet that referenced this pull request Nov 26, 2019
@TaoLv
Member

TaoLv commented Nov 26, 2019

I'm going to revert this PR, as it causes the flaky test issue reported here: #16895 (comment). @rongzha1 Could you please re-submit once the issue is fixed? Thanks.

wkcn pushed a commit that referenced this pull request Nov 26, 2019
* Revert "Mkldnn fullyConnect bwd bug fix (#16890)"

This reverts commit 436967b.

* ci
@TaoLv
Member

TaoLv commented Jan 15, 2020

@rongzha1 Any update on this PR and the flaky CPP test?

@rongzha1
Contributor Author

I cannot reproduce it on my local machine and have not yet found the root cause.
