Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1382] Add the index_array operator #14638

Merged
merged 19 commits into from
May 25, 2019

Conversation

nickguletskii
Copy link
Contributor

@nickguletskii nickguletskii commented Apr 7, 2019

Description

This pull request implements index_array, an operator that returns an array of indexes of the input array.

For an input array with shape (d_1, d_2, ..., d_n), index_array returns a (d_1, d_2, ..., d_n, n) array idx, where idx[i_1, i_2, ..., i_n, :] = [i_1, i_2, ..., i_n].

Additionally, when the parameter axes is specified, idx will be a (d_1, d_2, ..., d_n, m) array where m is the length of axes, and the following
equality will hold: idx[i_1, i_2, ..., i_n, j] = i_{axes[j]}.

Examples:

x = mx.nd.ones((3, 2))

mx.nd.contrib.index_array(x) = [[[0 0]
                                 [0 1]]

                                [[1 0]
                                 [1 1]]

                                [[2 0]
                                 [2 1]]]

x = mx.nd.ones((3, 2, 2))

mx.nd.contrib.index_array(x, axes=(1, 0)) = [[[[0 0]
                                               [0 0]]

                                              [[1 0]
                                               [1 0]]]


                                             [[[0 1]
                                               [0 1]]

                                              [[1 1]
                                               [1 1]]]

Motivation

This operator can be used to generate meshgrids for tensors without knowing their exact shapes during construction. For instance, this operator can be used to make a makeshift prior box generator for anchor-based computer vision models:

feature_map = F.ones((8, 128, 128, 256)) # N x H x W x C, no shape information when using the Symbol API.
prior_box_stride = 16
box_size=[8, 8]

template = F.squeeze(F.slice_axis(feature_map, begin=0, end=1, axis=-1), axis=-1) # N x H x W
box_centres = F.contrib.index_array(template, axes=(-2, -1, -2, -1)).astype("float32") # N x H x W x 4
box_centres = F.broadcast_mul(box_centres, F.array([prior_box_stride]).reshape((1, 1, 1, 1))) # N x H x W x 4
corner_offsets = F.array(box_size).reshape((1, 1, 1, 2))
corner_offsets = F.concat(-corner_offsets/2, corner_offsets/2, dim=-1)
box_corners = F.broadcast_plus(box_centres, corner_offsets)

Also, this operator can be applied to implement positional encodings for sequence processing, e.g.:

sequence_embeddings = F.ones((65, 8, 256)) # T x N x C, no shape information when using the Symbol API.
template = sequence_embeddings.reshape((0, 0, -1, 2)) # T x N x C -> T x N x (C/2) x 2
pos, i = F.split(
    F.contrib.index_array(template, axes=(0, 2)).astype("float32"), # T x N x (C/2) x 2 x 2
    axis=-1,
    num_outputs=2,
    squeeze_axis=True
) # T x N x (C/2) x 2 and T x N x (C/2) x 2
base = F.ones((1, 1, 1, 1)) * 10000
dmodel = F.slice_axis(F.shape_array(sequence_embeddings), begin=-1, end=None, axis=0)
dmodel = dmodel.reshape((1, 1, 1, 1)).astype("float32")
tmp = F.broadcast_div(pos, F.broadcast_power(base, F.broadcast_div(2 * i,  dmodel))) # T x N x (C/2) x 2
sin_input, cos_input = F.split(tmp, axis=-1, num_outputs=2, squeeze_axis=True) # T x N x (C/2) and T x N x (C/2)
positional_encoding = F.stack(F.sin(sin_input), F.cos(cos_input), axis=-1).reshape((0, 0, -3)) # T x N x C

I've also encountered situations where this operator would have been useful for some indexing tricks.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • The IndexArray operator (for the CPU)
  • The IndexArray operator (for the GPU)
  • Tests for the CPU implementation
  • Tests for the GPU implementation
  • Entries on the Python API reference pages

Comments

  • This operator always returns kInt64.
  • The new tests in test_operator_gpu.py are exactly the same as in test_operator.py. I couldn't find any evidence that test_operator.py ever gets called with a GPU default_context, so I copied the tests into test_operator_gpu.py to make sure that the GPU implementation works too.

@nickguletskii nickguletskii requested a review from szha as a code owner April 7, 2019 16:24
@piyushghai
Copy link
Contributor

Thanks for your contributions @nickguletskii. Can you also look into the CI failures ?
@mxnet-label-bot Add [Opeartor, pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Apr 7, 2019
@piyushghai
Copy link
Contributor

@mxnet-label-bot Update [Operator, pr-awaiting-review]

@nickguletskii
Copy link
Contributor Author

nickguletskii commented Apr 7, 2019

The CI failure seems to be unrelated. Here's the relevant part of the log:

[2019-04-03 23:57:30   ERROR] Cuda initialization failure with error 35. Please check cuda installation:  http:https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html.

terminate called after throwing an instance of 'std::runtime_error'

  what():  Failed to create object

If I am not mistaken, this error code corresponds to the following error message: "CUDA driver version is insufficient for CUDA runtime version". Could this be caused by the recent update to the nvidia/cuda:10.0-cudnn7-devel Docker tag? The TensorRT job uses nvidia/cuda:10.0-cudnn7-devel, while other jobs use nvidia/cuda:9.1-cudnn7-devel (e.g. the ubuntu_gpu build), which may be why the other jobs are not failing.

@fhieber
Copy link
Contributor

fhieber commented Apr 9, 2019

Thanks for implementing this! This seems similar to what I requested in #14574, correct?

Any reason this should be in the contrib namespace?

@nickguletskii
Copy link
Contributor Author

@fhieber In a way, yes. The behaviour you have described in #14574 can be emulated by applying reshape, index_array, broadcast_add, and finally, reshape_list. I was thinking of implementing arange_like instead of index_array, but I don't find the arange operator itself versatile enough and I didn't want to make arange_like too different from arange.

There's no particular reason why I've placed index_array into the contrib namespace other than the fact that it's a new operator. I'd be happy to update this pull request and move index_array out of the contrib namespace if the community deems it desirable.

@fhieber
Copy link
Contributor

fhieber commented Apr 11, 2019

Thanks @nickguletskii , I agree, your operator seems more versatile, thanks for working on it!

contrib is fine with me, I just asked out of curiosity on whether the preferred or standard way of introducing new operators is to always place in contrib first.

@nickguletskii
Copy link
Contributor Author

@piyushghai Sorry to bother you, but should I rebase this pull request on master now that the failing job (test_unix_python3_tensorrt_gpu) has been disabled (#14642), or should I just leave it as it is and wait for the review?

@piyushghai
Copy link
Contributor

@nickguletskii You should rebase with the latest master branch. That'll enable you to push again, allowing the CI to pass on your PR.
The reviews will happen on it in the meanwhile :)

@nickguletskii
Copy link
Contributor Author

Huh, this is weird, I've updated this PR to be compatible with the recently merged support for zero-dim and zero-shape arrays, and now the CI for Windows CPU tests are failing. However, when I try to run the tests locally (on a 64 bit Windows machine), the tests pass just fine. I'll make further attempts to reproduce this issue on my local machine tomorrow. I suspect that I've screwed up with memory access somewhere because the test runner crashes without providing any useful debugging information.

@nickguletskii
Copy link
Contributor Author

The previous CI failures on Windows were caused by indexing C-style arrays using ints. I'm not sure whether this was just undefined behaviour or a bug in the older versions of MSVC++, but it did not happen with VC++ 16.0. I fixed the issue by using ptrdiff_t to index C-style arrays.

@nickguletskii
Copy link
Contributor Author

nickguletskii commented May 2, 2019

@szha @reminisce This PR should be ready for a review now. The CI failure seems to be unrelated (the failing test is test_operator_gpu.test_convolution_multiple_streams, described in #14329).

@vandanavk
Copy link
Contributor

@szha @reminisce @fhieber Is this PR good to go?

Copy link
Contributor

@reminisce reminisce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution and sorry for the late review. Have some comments on code cosmetics.

src/operator/contrib/index_array.cc Outdated Show resolved Hide resolved
src/operator/contrib/index_array.cc Outdated Show resolved Hide resolved
src/operator/contrib/index_array.cc Outdated Show resolved Hide resolved
src/operator/contrib/index_array-inl.h Outdated Show resolved Hide resolved
@@ -2130,6 +2130,68 @@ def test_bilinear_sampler_versions():
if req_dict['grid'] is 'write':
assert_almost_equal(exe.grad_dict['grid'].asnumpy(), exe_list[ref_idx].grad_dict['grid'].asnumpy(), rtol=1e-3, atol=1e-5)

@with_seed()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests defined in test_operator.py will be run with ctx=mx.gpu() in CI. There is no need to duplicate test code in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've removed these redundant tests in 3cdfcf5.

tests/python/unittest/test_operator.py Show resolved Hide resolved

@with_seed()
def test_index_array_default_zero_dim():
with mx.np_compat(active=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the whole function is expected to be numpy compatible, you can use a decorator mx.use_np_compat to wrap the whole function. It would be easier for us to just delete the decorator in the next major release than putting additional efforts of adjusting indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've updated the subtests to use @mx.use_np_compat in 139567b.


@with_seed()
def test_index_array_default_zero_size():
with mx.np_compat(active=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, please use the decorator mx.use_np_compat.


@with_seed()
def test_index_array_select_axes_zero_size():
with mx.np_compat(active=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@karan6181
Copy link
Contributor

@nickguletskii Could you please re-trigger the CI again?
@szha Could you please review this PR? Thanks!

@szha szha merged commit f689042 into apache:master May 25, 2019
@nickguletskii nickguletskii deleted the feature/index-array-op branch May 27, 2019 16:20
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Implement the index_array operator

* Add index_array operator tests

* Add index_array operator GPU tests

* Add the index_array operator to the Python docs autosummary

* Add the author of the index_array operator to CONTRIBUTORS.md

* Make index_array compatible with zero-dim and zero-size arrays

Changes the implementation of index_array to be compatible with the
recently merged support for zero-dim and zero-size arrays. Resolves the
incompatibilities with apache#14661.

* Fix the index_array gradient checks in the unit tests

In the previous implementation, the output gradient had an incorrect
shape. This commit fixes the shapes and makes the tests more readable.

* Add zero-dim and zero-size array tests for index_array

* Use mxnet::Tuple<int> instead of TShape for the axes parameter

* Fix incorrect array indexing in index_array

Solves access violations when compiling with MSVC++ 14.0.

* Avoid copying the input shape array in the index_array shape function

* Add unknown shape handling to index_array

* Use SHAPE_ASSIGN_CHECK to assign the shape in index_array

* Remove the redundant index_array GPU tests from test_operator_gpu.py

* Move the index_array tests into a single function (test_index_array)

* Use @mx.use_np_compat instead of mx.np_compat in index_array op tests

* Remove the use of template specialization for IndexArrayForward

* Add the index_array operator to the AMP symbol list

* Retrigger CI
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Operator pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants