This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

LSTM and GRU layers without DNNL enabled give wrong gradients #17898

Open
zixuanweeei opened this issue Mar 24, 2020 · 1 comment

@zixuanweeei
Contributor

Description

Currently, we have two implementations of RNN layers on the CPU backend:

- the DNNL-based fused implementation, and
- the native CPU implementation.

Both of them can be invoked from mx.sym.RNN, mx.rnn.FusedRNNCell, and mx.gluon.rnn.LSTM/GRU/RNN. The DNNL fusion provides a more efficient forward and backward pass, while the native one serves as a fallback for devices or environments that cannot use the DNNL library.

Recently, we found that some problems lead to wrong gradient calculations in the native implementation. We are tracking the issue here, and it will be fixed as soon as possible.
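A minimal reproduction sketch (not from the original report; the shapes, hyperparameters, and tolerance are illustrative assumptions) that exercises this path: it compares the input gradient of a fused multi-layer bidirectional LSTM, as computed by autograd, against a central-difference estimate. To hit the native kernels it assumes an MXNet build without DNNL (e.g. compiled with USE_MKLDNN=0).

```python
import mxnet as mx
import numpy as np

T, N, C, H = 5, 2, 4, 3  # seq_len, batch, input size, hidden size
layer = mx.gluon.rnn.LSTM(H, num_layers=3, bidirectional=True)
layer.initialize(mx.init.Xavier(), ctx=mx.cpu())

x = mx.nd.random.uniform(-1, 1, shape=(T, N, C))
x.attach_grad()

def loss(inp):
    # default layout is 'TNC'; reduce to a scalar so there is a single gradient
    return layer(inp).sum()

# Gradient from the operator's backward pass.
with mx.autograd.record():
    out = loss(x)
out.backward()
analytic = x.grad.asnumpy()

# Central-difference estimate of the same input gradient.
eps = 1e-3
numeric = np.zeros_like(analytic)
base = x.asnumpy()
for i in range(base.size):
    xp, xm = base.copy(), base.copy()
    xp.flat[i] += eps
    xm.flat[i] -= eps
    fp = loss(mx.nd.array(xp)).asscalar()
    fm = loss(mx.nd.array(xm)).asscalar()
    numeric.flat[i] = (fp - fm) / (2 * eps)

print("max abs difference:", np.abs(analytic - numeric).max())
```

If the native kernels compute gradients correctly, the two estimates should agree to within finite-difference tolerance; a large mismatch is the kind of discrepancy this issue describes.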

@pengzhao-intel
Contributor

@bgawrych please take a look to check whether you can reproduce this issue and whether there is a test case that catches it, thanks.

pengzhao-intel pushed a commit that referenced this issue Jun 3, 2020
…8316)

* [Large Tensor] Backport of Fixed RNN op (#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.7.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)

* Fix input gradient calculation for bidirectional LSTM

For a bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect.
The wrong results were caused by overwriting the y-derivative (dy) tensor with the
computed x-derivative (dx) tensor before the right-to-left layer could use dy for its own
gradient calculation.
The proposed fix uses additional space to avoid the overwrite (see the first toy sketch after this commit message).

* Fix gradient calculation for GRU

For a GRU with number of layers > 2, the i2h_weight gradient for the middle layers
(all except the first and last) was incorrect.
The wrong calculations were caused by assigning the output pointer to the
input instead of computing a new input pointer (see the second toy sketch after this commit message).

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <[email protected]>
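To make the bidirectional-LSTM fix above easier to follow, here is a toy NumPy sketch of the overwrite hazard it describes. It is illustrative only (the names backward_l2r, backward_r2l, and workspace are made up, and this is not the actual C++ implementation in MXNet): if the first direction's backward pass writes its dx into the buffer that still holds dy, the right-to-left direction reads corrupted values, while a separate workspace keeps dy intact.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 4, 3
dy = rng.standard_normal((T, H))      # gradient flowing in from the layer above
W_l2r = rng.standard_normal((H, H))   # stand-ins for the per-direction weights
W_r2l = rng.standard_normal((H, H))

def backward_l2r(g):
    return g @ W_l2r                  # toy "input gradient" of the forward direction

def backward_r2l(g):
    return g[::-1] @ W_r2l            # toy "input gradient" of the reverse direction

# Correct: keep dy intact until both directions have used it.
dx_ref = backward_l2r(dy) + backward_r2l(dy)

# Buggy pattern: dx of the first direction is written into the dy buffer,
# so the second direction no longer sees the original dy.
buf = dy.copy()
buf[:] = backward_l2r(buf)            # dy is overwritten here
dx_bug = buf + backward_r2l(buf)      # right-to-left now consumes dx instead of dy

# Fixed pattern: use separate space for the first direction's dx.
workspace = np.empty_like(dy)
workspace[:] = backward_l2r(dy)
dx_fix = workspace + backward_r2l(dy)

print("buggy matches reference:", np.allclose(dx_bug, dx_ref))  # False
print("fixed matches reference:", np.allclose(dx_fix, dx_ref))  # True
```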
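The GRU fix can be pictured the same way. In this second toy sketch (again illustrative; i2h_weight_grad, y_all, and dgates are made-up names, not the real kernel code), a middle layer's i2h_weight gradient must be built from the previous layer's output, i.e. that layer's input, rather than from the layer's own output, which is what reusing the output pointer amounted to.

```python
import numpy as np

rng = np.random.default_rng(1)
L, T, N, H = 3, 4, 2, 3                     # layers, time steps, batch, hidden size

# Flat buffer of per-layer outputs: slot l stores the output of layer l,
# so the input of layer l (for l > 0) lives in slot l - 1.
y_all = rng.standard_normal((L, T, N, H))
dgates = rng.standard_normal((L, T, N, H))  # toy per-layer gate gradients

def i2h_weight_grad(layer_input, dgate):
    # dW_i2h accumulates dgate_t^T @ x_t over time (toy single-gate version).
    return sum(dgate[t].T @ layer_input[t] for t in range(T))

l = 1                                                # a "middle" layer
correct = i2h_weight_grad(y_all[l - 1], dgates[l])   # input = previous layer's output
buggy   = i2h_weight_grad(y_all[l],     dgates[l])   # bug: reused the output pointer

print("gradients differ:", not np.allclose(correct, buggy))  # True
```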
pengzhao-intel pushed a commit that referenced this issue Jun 3, 2020
* [v1.x] [Large Tensor] Backport of Fixed RNN op (#17632)

* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments

* [v1.x] Backport of Fix LSTM and GRU layers gradient calculations (#18203)

* Fix input gradient calculation for bidirectional LSTM

For a bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect.
The wrong results were caused by overwriting the y-derivative (dy) tensor with the
computed x-derivative (dx) tensor before the right-to-left layer could use dy for its own
gradient calculation.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For a GRU with number of layers > 2, the i2h_weight gradient for the middle layers
(all except the first and last) was incorrect.
The wrong calculations were caused by assigning the output pointer to the
input instead of computing a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers

Co-authored-by: Connor Goggins <[email protected]>
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this issue Aug 15, 2020, mirroring the v1.x backport above (apache#17632, apache#18317).