Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1398] Enable zero-copy from numpy to MXNet NDArray #14733

Merged
merged 8 commits into from
Apr 28, 2019

Conversation

junrushao
Copy link
Member

@junrushao junrushao commented Apr 18, 2019

Description

This PR allows users to convert numpy's NDArray without copying content data. This is achieved by converting numpy's NDArray to DLManagedTensor, which is introduced in dmlc/dlpack#38. We expect that this would help users reduce major performance bottleneck when this conversion is conducted frequently.

See also: #14244.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http:https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Rewrite NDArray::FromDLPack to make it resource safe.
  • Add DLManagedTensor and its deleter via ctypes that allows C++ side deletes it.

Comments

Nothing.

@junrushao junrushao requested a review from szha as a code owner April 18, 2019 08:00
@Roshrini Roshrini added the pr-work-in-progress PR is still work in progress label Apr 18, 2019
@junrushao junrushao changed the title [WIP] Enable zero-copy from numpy to MXNet NDArray Enable zero-copy from numpy to MXNet NDArray Apr 19, 2019
@junrushao
Copy link
Member Author

@reminisce @wkcn Could you help review? Thanks!

Copy link
Contributor

@reminisce reminisce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

python/mxnet/ndarray/ndarray.py Outdated Show resolved Hide resolved
python/mxnet/ndarray/ndarray.py Outdated Show resolved Hide resolved
python/mxnet/ndarray/ndarray.py Outdated Show resolved Hide resolved
python/mxnet/ndarray/ndarray.py Outdated Show resolved Hide resolved
python/mxnet/ndarray/ndarray.py Outdated Show resolved Hide resolved
tests/python/unittest/test_ndarray.py Outdated Show resolved Hide resolved
tests/python/unittest/test_ndarray.py Show resolved Hide resolved
python/mxnet/ndarray/ndarray.py Show resolved Hide resolved
Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call MXNDArrayFromDLPack rather than MXNDArrayFromDLManagedTensor.

include/mxnet/c_api.h Outdated Show resolved Hide resolved
src/ndarray/ndarray.cc Outdated Show resolved Hide resolved
tests/python/unittest/test_ndarray.py Outdated Show resolved Hide resolved
python/mxnet/ndarray/ndarray.py Show resolved Hide resolved
@junrushao
Copy link
Member Author

@reminisce Hey I have already addressed your comments. Could you review again? Thanks!

@junrushao junrushao changed the title Enable zero-copy from numpy to MXNet NDArray [MXNET-1398] Enable zero-copy from numpy to MXNet NDArray Apr 27, 2019
Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! LGTM. Thanks for your contribution!

Copy link
Contributor

@reminisce reminisce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the awesome work!

@wkcn wkcn added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-work-in-progress PR is still work in progress labels Apr 28, 2019
@wkcn wkcn merged commit c18381d into apache:master Apr 28, 2019
@wkcn
Copy link
Member

wkcn commented Apr 28, 2019

Merged. Thank you!


if not ndarray.flags['C_CONTIGUOUS']:
raise ValueError("Only c-contiguous arrays are supported for zero-copy")
c_obj = _make_dl_managed_tensor(ndarray)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when doing zero-copy, should the ownership of the data pointer be transferred to the new ndarray? do we need to update the writable flag? what happens when the original data is updated by numpy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szha They share the ownership - which means numpy's modification is transparent to mxnet and vice versa. As requested by @reminisce in his review, I added a testcase for this transparency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does it work when using the asynchronous engine? since numpy calls happen immediately, allowing shared ownership may give unexpected results. consider the following case:

a = np.array(...)
b = nd.from_numpy(a, zero_copy=True)
c = nd.expensive_op(b, in_place=True)
d = np.inplace_op(a) # this is called when c hasn't finished yet

given the asynchronous execution, it's hard to tell whether people should expect d to be based on the input before or after expensive_op

Copy link
Member Author

@junrushao junrushao Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We actually have such discussion long time before, but it's good to make things open and achievable to the community)

My understanding is that your concern is a general question in multi-threading (see the code below). In any multi-threaded system, we may expect this issue to exist -- and MXNet has multi-threaded engine, so this issue exists, right?

We have discussion that how about the content of array is changed somewhere inside the system but user don't know that, should we set up the WRITEABLE flag to be true to prevent users from writing, or should we completely disallow users to access this numpy array?

The question, in the high-level, is all about trade-off. Restricting the freedom of users may make things less error-prone, while giving users full freedom may make the system more powerful.

On one hand, in my humble opinion, restricting the freedom of users, like setting up WRITEABLE flag, does not completely resolve this issue, because users can still encounter pitfalls when they read the array using the numpy side API. The only helps in this case is to completely disable users from read or write the array, which grant them on access only the numpy side - This is more like guaranteeing safety by disallowing users to create threads/processes.

On the other hand, giving users freedom enables them to do more things, and does not necessarily mean that they will definitely make a mistake, because we will encourage to use mxnet's numpy compatibility, which is on the way.

Anyway, I am open to any discussion and open to possible changes setting up several flags on the numpy array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API should have clear semantics regardless of users’ knowledge on how MXNet execution works. If you worry about flexibility I’d recommend making it an option to not disable writable flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we add a zero-copy option to asnumpy where the ownership is completely transferred back to numpy, I think there wouldn't be any need for co-ownership.

Copy link
Member

@wkcn wkcn Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following case:

# the variable `a` is a `mx.nd.NDArray` object
a = mx.nd.array([1,2,3])
b = a.asnumpy(zero_copy=True)
#  users call `asnumpy(zero_copy=True)` twice
c = a.asnumpy(zero_copy=True)
c[:] = 3
print(a)
print(b)

a[:] = 5
a.wait_to_read()
print(b)
print(c)

It's necessary to keep co-ownership.
We can add extra option in from_numpy and asnumpy to prompt users to call the function wait_to_read().
Alternatively, can we add a hook into numpy.ndarray? When users access numpy.ndarray, wait_to_read() will be called automatically.
Reference: https://docs.scipy.org/doc/numpy/reference/arrays.classes.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If zero-copy is on, then a should not be usable after the asnumpy call. The point is to let the framework deal with all synchronization, and co-ownership doesn't allow that. And for the same case, you can simply copy c from b (the new numpy array) instead of a (the old ndarray), so it's not necessary to keep co-ownership.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hook in the context seems to refer to our own subclass of numpy array. Creating a new subclass seems like overkill.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, @szha's suggestion was addressed in #14948

access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
* Enable zero-copy from numpy to MXNet NDArray

* should work

* make lint happy

* fix stupid typos

* wip to address comments

* Address comments

* Address comments

* Remove redundant code
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Enable zero-copy from numpy to MXNet NDArray

* should work

* make lint happy

* fix stupid typos

* wip to address comments

* Address comments

* Address comments

* Remove redundant code
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants