This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

make gluon rnn layers hybrid blocks #11482

Merged
szha merged 6 commits into apache:master from gluon_hybrid_rnn on Aug 4, 2018

Conversation

szha
Member

@szha szha commented Jun 29, 2018

Description

make gluon rnn layers hybrid blocks
resolves #10873

Changes

  • make gluon rnn layers hybrid blocks (see the usage sketch below)
  • add reverse shape inference for reshape when the output shape is complete and only one input dimension is missing
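
A minimal usage sketch of what this change enables (illustrative only; the layer size and input shape below are arbitrary), assuming the standard Gluon API:

    import mxnet as mx
    from mxnet import gluon

    # With the rnn layers as HybridBlocks, they can be hybridized like other layers.
    layer = gluon.rnn.LSTM(hidden_size=100, num_layers=2)
    layer.initialize()
    layer.hybridize()
    out = layer(mx.nd.ones((5, 3, 10)))  # layout 'TNC': (seq_len, batch, input_size)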

@szha szha requested a review from piiswrong June 29, 2018 06:34
@szha szha added this to To do in gluon.rnn improvements via automation Jun 29, 2018
@szha szha moved this from To do to In progress in gluon.rnn improvements Jun 29, 2018
    def hybrid_forward(self, F, inputs, states=None, **kwargs):
        if F is ndarray:
            batch_size = inputs.shape[self._layout.find('N')]
            if self._input_size == 0:
Contributor

implement infer_shape instead?

Member Author

this is not possible due to the inverse shape inference in concat.

Member Author

Or did you mean overriding the block's infer_shape? I'm taking a look.

                states = self.begin_state(batch_size, ctx=inputs.context)
            else:
                states = self.begin_state(0, func=symbol.zeros)
        if isinstance(states, (ndarray.NDArray, symbol.Symbol)):
Contributor

(ndarray.NDArray, symbol.Symbol) -> tensor_types

        return outputs, new_states

    def _forward_kernel(self, inputs, states):
    def __call__(self, inputs, *states):
Contributor

implement infer_shape instead?

@szha
Member Author

szha commented Jun 29, 2018

@piiswrong it turns out I cannot do that only in infer_shape. The reason is that sometimes the block is used as a child block of other blocks, in which case infer_shape is called from the parent, thus bypassing the code path in the RNN layer's infer_shape.

@piiswrong
Contributor

I see. Then could you do it without overriding __call__?

@szha
Member Author

szha commented Jun 30, 2018

I can override forward, but it would be pretty much equivalent. The reason I cannot do this in hybrid_forward is that, when given a partial shape, hybrid_forward would only be invoked with symbols as part of the infer_shape pass.
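
To illustrate the constraint, here is a minimal sketch with a hypothetical layer (not the code in this PR): once a HybridBlock is hybridized, the graph-building/infer_shape pass invokes hybrid_forward with Symbol inputs that carry no concrete .shape, so logic that needs the eager input shape has to live in an overridden __call__ or forward.

    import mxnet as mx
    from mxnet import gluon

    class ShapeProbe(gluon.HybridBlock):
        """Hypothetical block showing where concrete input shapes are visible."""

        def forward(self, x):
            if isinstance(x, mx.nd.NDArray):
                # Eager entry point: the concrete shape is still available here,
                # which is where deferred parameter shapes could be completed.
                print('concrete input shape:', x.shape)
            return super(ShapeProbe, self).forward(x)

        def hybrid_forward(self, F, x):
            # After hybridize(), F is mx.sym during graph construction and x is
            # a Symbol, so x.shape is not available in this method.
            return F.relu(x)

    probe = ShapeProbe()
    probe.hybridize()
    probe(mx.nd.ones((3, 2, 7)))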

@szha szha changed the title from "make gluon rnn layers hybrid blocks" to "[WIP] make gluon rnn layers hybrid blocks" on Jul 2, 2018
@szha szha requested a review from anirudh2290 as a code owner July 9, 2018 21:25
@szha szha force-pushed the gluon_hybrid_rnn branch 2 times, most recently from de75d33 to 0970b45 on July 19, 2018 17:05
@szha szha changed the title from "[WIP] make gluon rnn layers hybrid blocks" to "make gluon rnn layers hybrid blocks" on Jul 26, 2018
@szha szha force-pushed the gluon_hybrid_rnn branch 4 times, most recently from 03ea58d to a7a3112 on July 27, 2018 17:22
  return (*out_attrs)[0].ndim();
  if ((*out_attrs)[0].ndim()) {
    return ReverseReshapeInferShape(&(*in_attrs)[0], oshape);
  }
Contributor

@haojin2 haojin2 Jul 29, 2018

Should this be

    if ((*out_attrs)[0].ndim()) {
      return ReverseReshapeInferShape(&(*in_attrs)[0], oshape);
    }
    return false;

instead?
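
For context, a short Python sketch of the reverse inference idea (illustrative only, not the C++ implementation being reviewed): when the reshape output shape is fully known and exactly one input dimension is unknown (0), the missing dimension is the ratio of the element counts.

    from functools import reduce

    def reverse_reshape_infer(in_shape, out_shape):
        """Illustrative only: fill one unknown (0) input dim from a complete output shape."""
        known = reduce(lambda a, b: a * b, [d for d in in_shape if d != 0], 1)
        total = reduce(lambda a, b: a * b, out_shape, 1)
        assert total % known == 0, 'output elements must divide evenly by the known dims'
        return tuple(d if d != 0 else total // known for d in in_shape)

    print(reverse_reshape_infer((0, 2, 7), (5, 14)))  # -> (5, 2, 7)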

@szha szha force-pushed the gluon_hybrid_rnn branch 2 times, most recently from 8a62ec4 to 340dc9d on August 1, 2018 02:32
@Roshrini
Member

Roshrini commented Aug 2, 2018

@szha today is the code freeze date mentioned for the MXNet 1.3 release. Could you please check the status of this PR? Thanks!

@haojin2
Contributor

haojin2 commented Aug 2, 2018

@Roshrini We've found the root cause of the bug; the temporary workaround is the one being built and tested now.

@szha
Member Author

szha commented Aug 2, 2018

@Roshrini this PR uncovers a weird bug that should probably be addressed before the release.

@szha szha force-pushed the gluon_hybrid_rnn branch 2 times, most recently from 6a80987 to 9dde5e7 on August 3, 2018 00:20
@@ -502,7 +498,6 @@ def test_cell_fill_shape():
@assert_raises_cudnn_disabled()
def test_layer_fill_shape():
    layer = gluon.rnn.LSTM(10)
    layer.hybridize()
Member

Removing this hybridize call should be enough. Is there any reason we are removing all hybridize calls?

Member Author

The same error on test_norm still occurs if I only remove this one call, so I was trying out all possible cases.

@Roshrini
Member

Roshrini commented Aug 3, 2018

@szha thanks for the update.

@@ -320,5 +375,41 @@ NNVM_REGISTER_OP(_backward_Concat)
#endif
.set_attr<FCompute>("FCompute<cpu>", ConcatGradCompute<cpu>);


NNVM_REGISTER_OP(_rnn_param_concat)
Contributor

Please write comments for this operator. How is it different from the normal concat?

Member Author

This is a custom concat op with a specialized infer_shape, which handles the case where the first one or two inputs may have an unknown shape that can be inferred from the output shape.
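
The same idea in a few lines of Python (an illustrative sketch of the single-unknown case, not the operator's C++ code): along the concatenation axis, an unknown input dimension is recoverable as the output dimension minus the sum of the known input dimensions.

    def reverse_concat_infer(out_dim, in_dims):
        """Illustrative only: recover one unknown (0) concat-axis dim from the output dim."""
        assert in_dims.count(0) == 1, 'exactly one input dim may be unknown here'
        inferred = out_dim - sum(d for d in in_dims if d != 0)
        return [d if d != 0 else inferred for d in in_dims]

    # e.g. output dim 120 on the concat axis with the first input's dim unknown:
    print(reverse_concat_infer(120, [0, 40, 40]))  # -> [40, 40, 40]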

@@ -25,7 +25,6 @@
from common import assert_raises_cudnn_disabled


@assert_raises_cudnn_disabled()
Contributor

why do you remove this?

Member Author

It’s not calling the cuDNN kernel.

@@ -490,7 +505,7 @@ def test_layer_fill_shape():
    layer.hybridize()
    check_rnn_layer_forward(layer, mx.nd.ones((3, 2, 7)))
    print(layer)
    assert layer.i2h_weight[0].shape[1] == 7, layer.i2h_weight[0].shape[1]
    assert layer.l0_i2h_weight.shape[1] == 7, layer.l0_i2h_weight.shape[1]
Contributor

Is this an API change?

Member Author

This has not been exposed as a documented attribute before and shouldn’t be considered part of the API.

@szha szha merged commit 5474b08 into apache:master Aug 4, 2018
gluon.rnn improvements automation moved this from In progress to Done Aug 4, 2018
@szha szha deleted the gluon_hybrid_rnn branch August 6, 2018 19:01
aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request Aug 6, 2018
* make Gluon RNN layer hybrid block

* separate gluon gpu tests

* remove excess assert_raises_cudnn_disabled usage

* add comments and refactor

* add bidirectional test

* temporarily remove hybridize in test_gluon_rnn.test_layer_fill_shape
aaronmarkham added a commit to aaronmarkham/incubator-mxnet that referenced this pull request Aug 7, 2018
[MXNET-750] fix nested call on CachedOp. (apache#11951)

* fix nested call on cachedop.

* fix.

extend reshape op to allow reverse shape inference (apache#11956)

Improve sparse embedding index out of bound error message; (apache#11940)

[MXNET-770] Remove fixed seed in flaky test (apache#11958)

* Remove fixed seed in flaky test

* Remove fixed seed in flaky test

Update ONNX docs with the latest supported ONNX version (apache#11936)

Reduced test to 3 epochs and made gpu only (apache#11863)

* Reduced test to 3 epochs and made GPU only

* Moved logger variable so that it's accessible

Fix flaky tests for test_laop_4 (apache#11972)

Updating R client docs (apache#11954)

* Updating R client docs

* Forcing build

Fix install instructions for MXNET-R (apache#11976)

* fix install instructions for MXNET-R

* fix install instructions for MXNET-R

* fix default cuda version for MXNet-R

[MXNET-751] fix ce_loss flaky (apache#11971)

* add xavier initializer

* remove comment line

[MXNET-769] set MXNET_HOME as base for downloaded models through base.data_dir() (apache#11636)

* set MXNET_DATA_DIR as base for downloaded models through base.data_dir()
push joblib to save containers so is not required when running

* MXNET_DATA_DIR -> MXNET_HOME

[MXNET-748] linker fixed on Scala issues (apache#11989)

* put force load back as a temporary solution

* use project.basedir as relative path for OSX linker

[MXNET-772] Re-enable test_module.py:test_module_set_params (apache#11979)

[MXNET-771] Fix Flaky Test test_executor.py:test_dot (apache#11978)

* use assert_almost_equal, increase rtol, reduce matrix size

* remove seed in test_bind

* add seed 0 to test_bind, it is still flaky

* add comments for tracking

remove mod from arity 2 version of load-checkpoint in clojure-package (apache#11808)

* remove mod from arity 2 version of load-checkpoint

* load-checkpoint arity 2 test

Add unit test stage for mxnet cpu in debug mode (apache#11974)

Website broken link fixes (apache#12014)

* fix broken link

* fix broken link

* switch to .md links

* fix broken link

removed seed from flaky test (apache#11975)

Disable ccache log print due to threadunsafety (apache#11997)

Added default tolerance levels for regression checks for MBCC (apache#12006)

* Added tolerance level for assert_almost_equal for MBCC

* Nudge to CI

Disable flaky mkldnn test_requantize_int32_to_int8 (apache#11748)

[MXNET-769] Usability improvements to windows builds (apache#11947)

* Windows scripted build
Adjust Jenkins builds to use ci/build_windows.py

Issues:

    apache#8714
    apache#11100
    apache#10166
    apache#10049

* Fix bug

* Fix non-portable ut

* add xunit

Fix import statement (apache#12005)

array and multiply are undefined. Importing them from
ndarray

Disable flaky test test_random.test_gamma_generator (apache#12022)

[MXNET-770] Fix flaky test: test_factorization_machine_module (apache#12023)

* Remove fixed seed in flaky test

* Remove fixed seed in flaky test

* Update random seed to reproduce the issue

* Fix Flaky unit test and add a training test

* Remove fixed seed in flaky test

* Update random seed to reproduce the issue

* Fix Flaky unit test and add a training test

* Increase accuracy check

disable opencv threading for forked process (apache#12025)

Bug fixes in control flow operators (apache#11942)

Fix data narrowing warning on graph_executor.cc (apache#11969)

Fix flaky tests for test_squared_hinge_loss (apache#12017)

Fix flaky tests for test_hinge_loss (apache#12020)

remove fixed seed for test_sparse_ndarray/test_operator_gpu.test_sparse_nd_pickle (apache#12012)

Removed fixed seed from , test_loss:test_ctc_loss_train (apache#11985)

Removed fixed seed from , test_loss:test_sample_weight_loss (apache#11986)

Fix reduce_kernel_M1 (apache#12026)

* Fix reduce_kernel_M1

* Improve test_norm

Update test_loss.py to remove fixed seed (apache#11995)

[MXNET-23] Adding support to profile kvstore server during distributed training  (apache#11215)

* server profiling

merge with master

cleanup old code

added a check and better info message

add functions for C compatibility

fix doc

lint fixes

fix compile issues

lint fix

build error

update function signatures to preserve compatibility

fix comments

lint

* add part1 of test

* add integration test

Re-enabling test_ndarray/test_cached (apache#11950)

Test passes on CPU and GPU (10000 runs)

make gluon rnn layers hybrid blocks (apache#11482)

* make Gluon RNN layer hybrid block

* separate gluon gpu tests

* remove excess assert_raises_cudnn_disabled usage

* add comments and refactor

* add bidirectional test

* temporarily remove hybridize in test_gluon_rnn.test_layer_fill_shape

[MXNET-751] fix bce_loss flaky (apache#11955)

* add fix to bce_loss

* add comments

* remove unecessary comments

Doc fix for a few optimizers (apache#12034)

* Update optimizer.py

* Update optimizer.py
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
* make Gluon RNN layer hybrid block

* separate gluon gpu tests

* remove excess assert_raises_cudnn_disabled usage

* add comments and refactor

* add bidirectional test

* temporarily remove hybridize in test_gluon_rnn.test_layer_fill_shape
Development

Successfully merging this pull request may close these issues.

gluon.rnn layers should use fused RNN operator and become HybridBlock
6 participants