This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1408] Adding test to verify Large Tensor Support for ravel and unravel #15048

Merged
1 commit merged into apache:master on Jun 1, 2019

Conversation

@access2rohit (Contributor) commented May 22, 2019:

Description

New Ops: ravel_multi_index, unravel_index
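For context, a minimal sketch of how the two operators under test behave (shapes and values here are illustrative, not the ones used in the nightly test):

import numpy as np
import mxnet as mx

# ravel_multi_index flattens 2-D coordinates into flat offsets for a given shape;
# unravel_index is its inverse. Large tensor support hinges on int64 indices.
shape = (100000, 50)                                         # illustrative shape
coords = mx.nd.array([[99999, 5], [49, 1]], dtype=np.int64)  # [row indices, col indices]
flat = mx.nd.ravel_multi_index(coords, shape=shape)          # [99999*50 + 49, 5*50 + 1]
back = mx.nd.unravel_index(flat, shape=shape)                # recovers coords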

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-1408], where MXNET-1408 refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@access2rohit (Contributor Author):

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu added the pr-awaiting-review label on May 23, 2019
@access2rohit (Contributor Author):

@apeforest : Please review

@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
Contributor:

break it into two tests

Contributor Author:

It's more readable and understandable this way. Also, I have split it into two functions inside.


def test_unravel_index():
original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
assert (original_2d_idxs.asnumpy() == np.array(idxs_2d)).all() == True
Contributor:

remove == True

Contributor Author:

ok

@access2rohit force-pushed the master branch 2 times, most recently from 7c915e3 to e2970e9 on May 23, 2019 23:48
@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]
@marcoabreu (Contributor) commented May 24, 2019:

Please use C++ tests with mocking of the memory allocation instead of actually allocating memory.

Although this is nightly, I don't think it's necessary for these kind of tests to actually allocate real memory.


Contributor:

We can add a C++ unit test to mock the memory allocation. However, I thought the nightly tests are used for end-to-end testing, to mimic the real deployment situation, or is that not the case?

Contributor:

They are to some extent, but in some cases the cost-to-return ratio is not that good. While smoke tests are certainly necessary in some cases, resource-intensive ones can usually be abstracted away. If we assume that memory access on the system side works properly and we only mock this very last layer, the result should be pretty much as meaningful as running it end to end.

Contributor Author:

@marcoabreu: I didn't understand your last comment. What did you mean by "this very last layer"? All of these are single-operator tests. I combined these tests because they are easier to test together.

Contributor:

Hi @marcoabreu, thanks a lot for your suggestion; it's a very good learning for me too. Large tensor support is crucial to DGL (Deep Graph Library), which uses MXNet and plans a release that supports large tensors. Without this test, we cannot make the feature production-ready.

While I appreciate the direction of testing you suggested and would definitely like to explore it further, I am also afraid that holding these tests until the approach you suggested is implemented would block the release of DGL and other MXNet projects that depend on large tensors.

Can we have a separate project to address test efficiency (we may need some expertise from the CI team for that) while moving on with the large tensor project? Please let me know if you have other concerns or advice. Thank you.

Contributor Author:

@marcoabreu: We have to test cases where large tensors are allocated in actual physical memory. That is the whole purpose of adding these tests; without it we cannot support, or even guarantee, large dense tensor support in MXNet.

The purpose of these tests is not at all to test only sparse tensors. Currently we are not addressing performance issues with large dense arrays but trying to support them at all. From your reply it seems you are under the impression that everything should work fine if the tensor size exceeds 2^32 elements (please correct me if I am wrong). But that is not the case: the operator simply segfaults or gives unpredictable behavior. That is why it is necessary to test with such large arrays/tensors.
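A minimal sketch of the overflow behavior being described (illustrative only, not code from this PR):

import numpy as np

# A flat offset computed in 32-bit arithmetic silently wraps past 2**31 - 1.
row, ncols = np.int32(60000), np.int32(50000)
print(row * ncols)            # -1294967296 after int32 wraparound (numpy warns)
print(np.int64(row) * ncols)  # 3000000000, the correct offset with int64 indexing

This is the kind of silent corruption that only shows up once indices actually exceed the 32-bit range.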

Contributor Author:

@marcoabreu : Let me know if you have other unanswered queries that will help you understand the use case better and unblock this PR.

Contributor:

Hey,

I understand the importance of large tensors and I'm really looking forward to them as well. Don't get me wrong, I'm not underestimating the work that comes with supporting them.

We already have a history of issues caused by too-large allocations. Thus, for the sake of keeping CI stable (independent of whether we're running per-PR or nightly) and to adhere to a high test standard, I'd prefer that we not delay addressing the test strategy.

I think there's a misunderstanding. With my proposed approach, you would still be able to fully test a single operator. But instead of randomly generating the input tensor, you would use a generator model that gives you a predictable, procedurally generated input tensor instead of a large, randomly generated one.

The idea would then be to feed this procedurally generated input tensor to the operator; since the input is predictable, you can also verify a predictable output. Along the way, the operator runs as usual; it's just that the source of the data is not physical memory but generated values.

The only case that wouldn't be supported is the chaining of operators, since that would require storing the intermediary values.

The idea is that (in whatever fashion you'd like) we generate fake input and output tensors that return predictable and valid data but don't actually consume RAM. From MXNet's perspective, it shouldn't matter where the real values come from. For the sake of an operator, what matters is that it actually retrieves proper values, and that would be the case here.
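A rough sketch of the kind of generator-backed input described above; this is entirely hypothetical and not an existing MXNet facility:

import numpy as np

# Hypothetical sketch: values are computed on demand from their index instead of
# stored, so a logical 2**33-element input costs no RAM until a window is read.
class ProceduralTensor:
    def __init__(self, size, fn):
        self.size = size                  # logical element count, e.g. 2**33
        self.fn = fn                      # deterministic index -> value rule
    def read(self, start, count):
        # Materialize only the requested window.
        idx = np.arange(start, start + count, dtype=np.int64)
        return self.fn(idx)

t = ProceduralTensor(2**33, fn=lambda i: (i * 3) % 11)
window = t.read(2**32, 4)                 # values past index 2**32, no 8 GB allocation

Because the rule is deterministic, the expected output at any index can be computed in closed form and verified without ever holding the full tensor.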

Contributor Author:

@marcoabreu: I still don't understand how I would know whether an actual large tensor, after memory allocation, works as expected. Procedurally generating the values will indeed give me random values at random locations indexed beyond 2^32, but the input should come from RAM; it does matter. It needs to be tested on actual physical memory so there are no surprises. If nightly tests are not feasible, do we have a weekly test where we can run this?

@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]
Contributor:

nit: space between operators and operands

idx = mx.nd.array(1)
def test_ravel_multi_index():
idx = mx.nd.ravel_multi_index(mx.nd.array(idxs_2d, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
idx_numpy = np.ravel_multi_index(idxs_2d, (LARGE_X,SMALL_Y))
Contributor:

nit: space. btw, do you see any pylint warning out of these?

assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3

def test_unravel_index():
original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
Contributor:

same: add space

@@ -279,6 +279,19 @@ def test_diag():
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_and_unravel():
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]
Contributor:

nit: do you mean indices_2d?

Contributor Author:

Yes, though make pylint gives this output:

ubuntu@ip-172-31-82-110 ~/mxnet3 (master) $ make pylint
Makefile:345: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
python3 -m pylint --rcfile=/home/ubuntu/mxnet3/ci/other/pylintrc --ignore-patterns=".*\.so$,.*\.dll$,.*\.dylib$" python/mxnet tools/caffe_converter/*.py
Using config file /home/ubuntu/mxnet3/ci/other/pylintrc


Your code has been rated at 10.00/10

I will still fix it. Dunno why pylint isn't catching it.

Contributor Author:

Got it... the MXNet Makefile doesn't run pylint on tests/nightly and tests/python/unittest, so I added it manually in my local repo. Thanks for the advice!

import numpy as np
import mxnet as mx
Contributor Author:

Suggested reordering of imports by pylint

@@ -130,13 +130,6 @@ def test_clip():
assert np.sum(res[-1].asnumpy() == 1000) == a.shape[1]


def test_take():
Contributor Author:

This was a duplicate.

idx_numpy = np.ravel_multi_index(indices_2d, (LARGE_X, SMALL_Y))
assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3


Contributor:

remove extra line

Contributor Author:

done :)

@marcoabreu (Contributor) left a comment:

I just talked to Lin and we agreed on merging the end-to-end tests for now. The condition is that they have to be replaced by the 15th of July or the next minor release that includes large tensor support, whichever comes first.

@@ -254,8 +247,8 @@ def numpy_space_to_depth(x, blocksize):


def test_diag():
h = np.random.randint(2,9)
w = np.random.randint(2,9)
h = np.random.randint(2, 9)
Contributor:

I do not see the necessity of creating h and w variables here.

Contributor:

If you decide to move forward with them, please add the @with_seed annotation to the test.

Contributor Author:

@apeforest That's how the unit test for the diagonal operator is implemented.
@marcoabreu Good point, will add @with_seed()

r = mx.nd.diag(a, k=k)
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def test_ravel_multi_index():
indices_2d = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]]
Contributor:

Can we generate random indices instead?



def test_unravel_index():
original_2d_indices = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]]
Contributor:

Same here: can we generate random indices? You can pass the 2D array in as input so you only keep one copy.

@access2rohit (Contributor Author) commented May 31, 2019:

The input is not a 2D array; we pass indices as input. I can do the following for random large indices:

low_large_value = 2**32
high_large_value = 2**34
a = nd.random.randint(low_large_value, high_large_value, dtype=np.int64)

Good point.
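Applied to this test, the suggestion might look roughly like the following sketch (the LARGE_X and SMALL_Y values are assumed here; this is not the merged code):

import numpy as np
import mxnet as mx

# Draw random coordinates that stay within the (LARGE_X, SMALL_Y) bounds so one
# array can feed both ravel_multi_index and the numpy reference implementation.
LARGE_X, SMALL_Y = 100000000, 50                        # assumed test constants
x = mx.nd.random.randint(0, LARGE_X, shape=(3,), dtype=np.int64)
y = mx.nd.random.randint(0, SMALL_Y, shape=(3,), dtype=np.int64)
indices_2d = mx.nd.stack(x, y)                          # shape (2, 3), as ravel_multi_index expects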

@@ -216,8 +209,8 @@ def test_where():

def test_pick():
a = mx.nd.ones(shape=(256*35, 1024*1024))
b = mx.nd.ones(shape=(256*35,))
res = mx.nd.pick(a,b)
b = mx.nd.ones(shape=(256*35, ))
Contributor:

nit: space around * operator

r = mx.nd.diag(a, k=k)
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))


def random_2d_coordinates(x_low, x_high, y_low, y_high):

Contributor Author:

It is different; we need at least one value that is very large for the test case to be meaningful, not just any value between (1, very large value).

Contributor Author:

will move this to test_utils.py
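For reference, a hypothetical body for such a helper once moved to test_utils.py (the thread only shows the signature, so this is a guess at the intent):

import numpy as np

def random_2d_coordinates(x_low, x_high, y_low, y_high, num=3):
    # Hypothetical implementation: pair random x and y coordinates so callers get
    # [[x0, x1, ...], [y0, y1, ...]], the layout ravel_multi_index expects.
    x = np.random.randint(x_low, x_high, size=num, dtype=np.int64)
    y = np.random.randint(y_low, y_high, size=num, dtype=np.int64)
    return [x.tolist(), y.tolist()]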

@apeforest (Contributor) left a comment:

LGTM

@apeforest merged commit 75a90d0 into apache:master on Jun 1, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019