This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

LSTM and GRU layers without DNNL enabled give wrong gradients #17898

Open
zixuanweeei opened this issue Mar 24, 2020 · 1 comment

@zixuanweeei
Contributor

Description

Currently, we have two implementations of RNN layers on the CPU backend:

- the DNNL-based fused implementation, and
- the native CPU implementation.

Both of them can be invoked from mx.sym.RNN, mx.rnn.FusedRNNCell, and mx.gluon.rnn.LSTM/GRU/RNN. The DNNL fusion provides a more efficient forward and backward pass, while the native one serves as a fallback for devices or environments that cannot use the DNNL library.

Recently, we found that some problems lead to wrong gradient calculations in the native implementation. We are tracking the issue here, and it will be fixed as soon as possible.
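A minimal reproduction sketch (not from the original report; the shapes, hyperparameters, and tolerance are illustrative assumptions) that exercises this path: it compares the input gradient of a fused multi-layer bidirectional LSTM, as computed by autograd, against a central-difference estimate. To hit the native kernels it assumes an MXNet build without DNNL (e.g. compiled with USE_MKLDNN=0).

```python
import mxnet as mx
import numpy as np

T, N, C, H = 5, 2, 4, 3  # seq_len, batch, input size, hidden size
layer = mx.gluon.rnn.LSTM(H, num_layers=3, bidirectional=True)
layer.initialize(mx.init.Xavier(), ctx=mx.cpu())

x = mx.nd.random.uniform(-1, 1, shape=(T, N, C))
x.attach_grad()

def loss(inp):
    # default layout is 'TNC'; reduce to a scalar so there is a single gradient
    return layer(inp).sum()

# Gradient from the operator's backward pass.
with mx.autograd.record():
    out = loss(x)
out.backward()
analytic = x.grad.asnumpy()

# Central-difference estimate of the same input gradient.
eps = 1e-3
numeric = np.zeros_like(analytic)
base = x.asnumpy()
for i in range(base.size):
    xp, xm = base.copy(), base.copy()
    xp.flat[i] += eps
    xm.flat[i] -= eps
    fp = loss(mx.nd.array(xp)).asscalar()
    fm = loss(mx.nd.array(xm)).asscalar()
    numeric.flat[i] = (fp - fm) / (2 * eps)

print("max abs difference:", np.abs(analytic - numeric).max())
```

If the native kernels compute gradients correctly, the two estimates should agree to within finite-difference tolerance; a large mismatch is the kind of discrepancy this issue describes.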

@pengzhao-intel
Contributor

@bgawrych please take a look to check whether you can reproduce this issue and whether there is a test case that catches it, thanks.

pengzhao-intel pushed a commit that referenced this issue Jun 3, 2020
…8316)

* [Large Tensor] Backport of Fixed RNN op (#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.7.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)

* Fix input gradient calculation for bidirectional LSTM

For a bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect.
The wrong results were caused by overwriting the y-derivative (dy) tensor with the
computed x-derivative (dx) tensor before the right-to-left layer could use dy for its own
gradient calculation.
The proposed fix uses additional space to avoid the overwrite (see the first toy sketch after this commit message).

* Fix gradient calculation for GRU

For a GRU with number of layers > 2, the i2h_weight gradient for the middle layers
(all except the first and last) was incorrect.
The wrong calculations were caused by assigning the output pointer to the
input instead of computing a new input pointer (see the second toy sketch after this commit message).

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <[email protected]>
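To make the bidirectional-LSTM fix above easier to follow, here is a toy NumPy sketch of the overwrite hazard it describes. It is illustrative only (the names backward_l2r, backward_r2l, and workspace are made up, and this is not the actual C++ implementation in MXNet): if the first direction's backward pass writes its dx into the buffer that still holds dy, the right-to-left direction reads corrupted values, while a separate workspace keeps dy intact.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 4, 3
dy = rng.standard_normal((T, H))      # gradient flowing in from the layer above
W_l2r = rng.standard_normal((H, H))   # stand-ins for the per-direction weights
W_r2l = rng.standard_normal((H, H))

def backward_l2r(g):
    return g @ W_l2r                  # toy "input gradient" of the forward direction

def backward_r2l(g):
    return g[::-1] @ W_r2l            # toy "input gradient" of the reverse direction

# Correct: keep dy intact until both directions have used it.
dx_ref = backward_l2r(dy) + backward_r2l(dy)

# Buggy pattern: dx of the first direction is written into the dy buffer,
# so the second direction no longer sees the original dy.
buf = dy.copy()
buf[:] = backward_l2r(buf)            # dy is overwritten here
dx_bug = buf + backward_r2l(buf)      # right-to-left now consumes dx instead of dy

# Fixed pattern: use separate space for the first direction's dx.
workspace = np.empty_like(dy)
workspace[:] = backward_l2r(dy)
dx_fix = workspace + backward_r2l(dy)

print("buggy matches reference:", np.allclose(dx_bug, dx_ref))  # False
print("fixed matches reference:", np.allclose(dx_fix, dx_ref))  # True
```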
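The GRU fix can be pictured the same way. In this second toy sketch (again illustrative; i2h_weight_grad, y_all, and dgates are made-up names, not the real kernel code), a middle layer's i2h_weight gradient must be built from the previous layer's output, i.e. that layer's input, rather than from the layer's own output, which is what reusing the output pointer amounted to.

```python
import numpy as np

rng = np.random.default_rng(1)
L, T, N, H = 3, 4, 2, 3                     # layers, time steps, batch, hidden size

# Flat buffer of per-layer outputs: slot l stores the output of layer l,
# so the input of layer l (for l > 0) lives in slot l - 1.
y_all = rng.standard_normal((L, T, N, H))
dgates = rng.standard_normal((L, T, N, H))  # toy per-layer gate gradients

def i2h_weight_grad(layer_input, dgate):
    # dW_i2h accumulates dgate_t^T @ x_t over time (toy single-gate version).
    return sum(dgate[t].T @ layer_input[t] for t in range(T))

l = 1                                                # a "middle" layer
correct = i2h_weight_grad(y_all[l - 1], dgates[l])   # input = previous layer's output
buggy   = i2h_weight_grad(y_all[l],     dgates[l])   # bug: reused the output pointer

print("gradients differ:", not np.allclose(correct, buggy))  # True
```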
pengzhao-intel pushed a commit that referenced this issue Jun 3, 2020
* [v1.x] [Large Tensor] Backport of Fixed RNN op (#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)

* Fix input gradient calculation for bidirectional LSTM

For a bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect.
The wrong results were caused by overwriting the y-derivative (dy) tensor with the
computed x-derivative (dx) tensor before the right-to-left layer could use dy for its own
gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For a GRU with number of layers > 2, the i2h_weight gradient for the middle layers
(all except the first and last) was incorrect.
The wrong calculations were caused by assigning the output pointer to the
input instead of computing a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <[email protected]>
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this issue Aug 15, 2020, mirroring the v1.x backport above (apache#17632, apache#18317).