
[MXNET-107] Add Fused Vanilla RNN and dropout for CPU #11399

Merged: 1 commit into apache:master on Jun 26, 2018

Conversation

@lihaofd (Contributor) commented Jun 26, 2018

Description

This PR adds a fused vanilla RNN (tanh/relu) operator and dropout support for GRU/LSTM/vRNN on CPU.
@pengzhao-intel, @TaoLv

Feature changes

New features

  • Single-layer/multi-layer and unidirectional/bidirectional vanilla RNN (tanh/relu), including both forward and backward computation.
  • Dropout support for GRU/LSTM/vRNN (see the usage sketch below).
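
A minimal usage sketch (not taken from the PR itself) of how the fused operator can be invoked through the symbolic `mx.sym.RNN` API; the shapes, layer count, and dropout value are illustrative:

```python
import mxnet as mx

# Illustrative sizes only; any supported configuration should work.
hidden_size, num_layers = 800, 2

data = mx.sym.Variable('data')          # shape (seq_length, batch_size, input_size)
params = mx.sym.Variable('rnn_params')  # flattened weights and biases for all layers
state = mx.sym.Variable('state')        # initial hidden state

# Fused vanilla RNN with relu activation; mode='rnn_tanh' selects the tanh
# variant, and p sets the dropout applied between stacked layers.
out = mx.sym.RNN(data=data, parameters=params, state=state,
                 state_size=hidden_size, num_layers=num_layers,
                 mode='rnn_relu', p=0.5, bidirectional=False,
                 name='fused_rnn')
```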

Unit-test changes

  • Create new test cases in tests/python/unittest/test_operator.py.
  • Update the test case in example/rnn/bucketing/cudnn_rnn_bucketing.py.
  • Check consistency with the original RNNCell implementation (see the consistency-check sketch after this list).
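
A hedged sketch (not the exact test code) of how consistency between the fused kernel and an explicitly stacked set of `RNNCell`s can be checked; cell prefixes and sizes here are illustrative:

```python
import mxnet as mx

T, N, I, H = 5, 4, 8, 8   # illustrative seq_len, batch, input and hidden sizes

# Fused two-layer tanh RNN versus an explicitly stacked pair of RNNCells.
fused = mx.rnn.FusedRNNCell(H, num_layers=2, mode='rnn_tanh', prefix='rnn_')
stack = mx.rnn.SequentialRNNCell()
stack.add(mx.rnn.RNNCell(H, activation='tanh', prefix='rnn_l0_'))
stack.add(mx.rnn.RNNCell(H, activation='tanh', prefix='rnn_l1_'))

data = mx.sym.Variable('data')
fused_out, _ = fused.unroll(T, inputs=data, merge_outputs=True, layout='NTC')
stack_out, _ = stack.unroll(T, inputs=data, merge_outputs=True, layout='NTC')

# Bind both symbols with the same (suitably packed/unpacked) weights, run
# forward and backward on random data of shape (N, T, I), and assert that
# outputs and input gradients agree within a small tolerance.
```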

Performance

We tested the performance of FusedRNN and the non-fused RNNCell on a local Skylake-8180 machine (2 sockets, 56 cores), using MKL as the BLAS library.
The test input size comes from the DS2 default parameters (seq_length = 300, batch_size = 20, input_size = 800, hidden_size = 800); a rough timing sketch using these shapes follows.
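
The following is an illustrative timing sketch only (the numbers in the tables below come from the DS2-style benchmark, not from this snippet), driving the fused operator directly through `mx.nd.RNN` with the same shapes:

```python
import time
import mxnet as mx

seq_length, batch_size, input_size, hidden_size = 300, 20, 800, 800
num_layers, mode = 1, 'rnn_tanh'

# Parameter count for a single-layer unidirectional vanilla RNN:
# input-to-hidden and hidden-to-hidden weights plus two bias vectors.
num_params = (input_size * hidden_size + hidden_size * hidden_size
              + 2 * hidden_size)

x = mx.nd.random.uniform(shape=(seq_length, batch_size, input_size))
params = mx.nd.random.uniform(shape=(num_params,))
state = mx.nd.zeros((num_layers, batch_size, hidden_size))

def run_once():
    out = mx.nd.RNN(data=x, parameters=params, state=state,
                    state_size=hidden_size, num_layers=num_layers, mode=mode)
    out.wait_to_read()

run_once()                      # warm-up
start = time.time()
for _ in range(20):
    run_once()
print('inference samples/sec: %.2f'
      % (20 * batch_size / (time.time() - start)))
```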

Layer = 1, bidirectional = False

| API | Inference (fwd, samples/sec) | Training (fwd + bwd, samples/sec) |
| --- | --- | --- |
| rnn.RNNCell - NoFusedRNN (Tanh, CPU) | 492.61 | 198.02 |
| this PR - FusedRNN (Tanh, CPU) | 952.38 | 318.98 |
| speedup | 1.93x | 1.61x |

| API | Inference (fwd, samples/sec) | Training (fwd + bwd, samples/sec) |
| --- | --- | --- |
| rnn.RNNCell - NoFusedRNN (Relu, CPU) | 277.78 | 104.17 |
| this PR - FusedRNN (Relu, CPU) | 740.74 | 177 |
| speedup | 2.67x | 1.7x |

Layer = 5, bidirectional = True

| API | Inference (fwd, samples/sec) | Training (fwd + bwd, samples/sec) |
| --- | --- | --- |
| rnn.RNNCell - NoFusedRNN (Tanh, CPU) | 38.91 | 22.73 |
| rnn.RNNCell (Tanh, cuda) | 47.85 | 26.95 |
| rnn.RNNCell (Tanh, cudnn) | 208.33 | 81.63 |
| this PR - FusedRNN (Tanh, CPU) | 104.17 | 34.01 |
| speedup - this PR / RNNCell (Tanh, CPU) | 267.7% | 149.7% |
| speedup - this PR / RNNCell (Tanh, cuda) | 217.7% | 126.2% |
| speedup - this PR / RNNCell (Tanh, cudnn) | 50% | 41.7% |

| API | Inference (fwd, samples/sec) | Training (fwd + bwd, samples/sec) |
| --- | --- | --- |
| rnn.RNNCell - NoFusedRNN (Relu, CPU) | 40.73 | 22.6 |
| rnn.RNNCell (Relu, cuda) | 52.91 | 26.81 |
| rnn.RNNCell (Relu, cudnn) | 206.83 | 82.64 |
| this PR - FusedRNN (Relu, CPU) | 134.23 | 35.97 |
| speedup - this PR / RNNCell (Relu, CPU) | 329.5% | 159.2% |
| speedup - this PR / RNNCell (Relu, cuda) | 253.7% | 134.2% |
| speedup - this PR / RNNCell (Relu, cudnn) | 64.9% | 43.5% |

Convergence Curves

We tested the convergence of FusedGRU/LSTM (dropout = 0.5) on CPU (Skylake-8180, 2 sockets, 56 cores) and GPU (P100) using example/rnn/bucketing/cudnn_rnn_bucketing.py.
Test settings: layers = 3, batch_size = 32, num-embed = 800, num-hidden = 800, num-epochs = 20. A construction sketch follows the curves below.
[Figure: GRU convergence curve, dropout = 0.5]
[Figure: LSTM convergence curve, dropout = 0.5]
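
A hedged sketch of how a fused LSTM or GRU stack with dropout = 0.5 can be built, in the spirit of the bucketing example; the prefixes and unroll length here are illustrative rather than copied from that script:

```python
import mxnet as mx

num_hidden, num_layers, dropout = 800, 3, 0.5

# Fused multi-layer cells with dropout applied between stacked layers.
lstm_stack = mx.rnn.FusedRNNCell(num_hidden, num_layers=num_layers,
                                 mode='lstm', dropout=dropout, prefix='lstm_')
gru_stack = mx.rnn.FusedRNNCell(num_hidden, num_layers=num_layers,
                                mode='gru', dropout=dropout, prefix='gru_')

# Unroll over one bucket length and feed embedded tokens (length illustrative).
seq_len = 60
data = mx.sym.Variable('data')
outputs, _ = lstm_stack.unroll(seq_len, inputs=data,
                               merge_outputs=True, layout='NTC')
```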

@szha: resolves #10870, #10872

@lihaofd lihaofd requested a review from szha as a code owner June 26, 2018 00:41
@szha szha self-assigned this Jun 26, 2018
@TaoLv (Member) commented Jun 26, 2018

Please remove [WIP] from the title and add the JIRA number to it. https://issues.apache.org/jira/browse/MXNET-107

@lihaofd lihaofd changed the title [WIP] Add Fused Vanilla RNN and dropout [MXNET-107] Add Fused Vanilla RNN and dropout Jun 26, 2018
@lihaofd lihaofd changed the title [MXNET-107] Add Fused Vanilla RNN and dropout [MXNET-107] Add Fused Vanilla RNN and dropout for CPU Jun 26, 2018
@piiswrong piiswrong merged commit 0538ad9 into apache:master Jun 26, 2018
@szha szha added this to To do in gluon.rnn improvements via automation Jun 26, 2018
@szha szha moved this from To do to Done in gluon.rnn improvements Jun 26, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
Development

Successfully merging this pull request may close these issues.

RNN operator should support rnn_tanh and rnn_relu mode on CPU