Enable serializing/deserializing ndarrays in np_shape semantics #15090

reminisce · 2019-05-29T07:02:48Z

Description

np_shape semantics was introduced to support future NumPy operators, for which scalar tensors and zero-size tensors are common. Because enabling it raised backward-compatibility concerns, such as scalar tensors being handled differently with and without it, serialization/deserialization was simply marked as unsupported whenever this semantics is enabled.

DGL developers now want to enable this semantics in their work to support zero-size tensors. Disabling serialization/deserialization for ndarrays of every kind (dense, sparse, zero-size, and scalar) makes their unit tests fail under np_shape semantics.

After careful consideration, we decided to loosen the constraint and support serialization/deserialization under np_shape semantics for ndarrays that satisfy ALL three of the following conditions, since handling them would be the same as handling future NumPy ndarrays.

The storage type MUST be the default type, i.e. the ndarray is dense.
The ndarray CANNOT be a zero-size ndarray, e.g. one with shape (2, 0, 3).
The ndarray CANNOT be a scalar ndarray, i.e. one with shape ().
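
The three conditions above can be sketched as a single predicate. This is only an illustration under stated assumptions: `StorageType`, `ArrayInfo`, `IsZeroSize`, and `CanSerializeInNpShape` are hypothetical stand-ins invented for this example, not MXNet's real types or functions, and the shape is assumed to be fully known (no unknown dimensions).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not MXNet's real types.
enum class StorageType { kDefault, kRowSparse, kCSR };

struct ArrayInfo {
  StorageType stype;
  std::vector<int64_t> shape;  // an empty vector models a scalar, shape ()
};

// True if any dimension has extent 0, e.g. shape (2, 0, 3).
bool IsZeroSize(const std::vector<int64_t>& shape) {
  for (int64_t dim : shape) {
    if (dim == 0) return true;
  }
  return false;
}

// True only when all three conditions hold:
// dense storage, not a scalar, and not zero-size.
bool CanSerializeInNpShape(const ArrayInfo& arr) {
  const bool is_dense = arr.stype == StorageType::kDefault;
  const bool is_scalar = arr.shape.empty();
  return is_dense && !is_scalar && !IsZeroSize(arr.shape);
}
```

Under this sketch, a dense array of shape (2, 3) passes, while a dense array of shape (2, 0, 3), a dense scalar, and any sparse array are all rejected.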

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

abhinavs95 · 2019-05-30T16:44:30Z

@mxnet-label-bot add [NDArray]

reminisce · 2019-05-31T06:53:34Z

@zheng-da Please test this PR in DGL to see if any test of saving/loading zero-size tensors is broken before merge.

marcoabreu · 2019-05-31T13:34:46Z

src/ndarray/ndarray.cc

@@ -1580,13 +1581,20 @@ static const uint32_t NDARRAY_V1_MAGIC = 0xF993fac8;
 /* magic number for ndarray version 2, with storage type */
 static const uint32_t NDARRAY_V2_MAGIC = 0xF993fac9;

+// magic number for ndarray version 3, with np shape semantics.
+// The ndarray must be saved and loaded within np shape semantics.
+static const uint32_t NDARRAY_V3_MAGIC = 0xF993faca;


Do we have any tests for handling legacy storage types?

reminisce · 2019-05-31

Yes. It's here.
https://github.com/apache/incubator-mxnet/pull/15090/files#diff-69757562d07268150de8b369ff5b6b61R1725
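
For readers following the diff hunk above, here is a rough sketch of how a loader might branch on the version magic number. The three constants are copied from the diff; `Format` and `ClassifyMagic` are hypothetical names made up for this illustration, and the real loading code in src/ndarray/ndarray.cc may be structured differently. The only rule taken from the source is the comment that a version 3 ndarray must be saved and loaded within np shape semantics; how other values are handled is deliberately left out.

```cpp
#include <cstdint>
#include <stdexcept>

// Constants as they appear in the diff.
static const uint32_t NDARRAY_V1_MAGIC = 0xF993fac8;
static const uint32_t NDARRAY_V2_MAGIC = 0xF993fac9;
static const uint32_t NDARRAY_V3_MAGIC = 0xF993faca;

// Hypothetical result type for this sketch.
enum class Format { kV1, kV2, kV3, kUnknown };

// Hypothetical helper: map a magic number read from a stream to a format.
// A V3 payload is rejected when np_shape semantics is not enabled.
Format ClassifyMagic(uint32_t magic, bool np_shape_enabled) {
  if (magic == NDARRAY_V3_MAGIC) {
    if (!np_shape_enabled) {
      throw std::runtime_error(
          "ndarray was saved in np_shape semantics; enable np_shape to load it");
    }
    return Format::kV3;
  }
  if (magic == NDARRAY_V2_MAGIC) return Format::kV2;
  if (magic == NDARRAY_V1_MAGIC) return Format::kV1;
  return Format::kUnknown;  // handling of other values is out of scope here
}
```

Giving the np_shape format its own magic number lets a loader detect a semantics mismatch up front instead of misreading the shape fields.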

zheng-da · 2019-06-01T01:05:08Z

@reminisce I checked with DGL. it works fine.

…he#15090) * Loosen the contraint on serializing/deserializing ndarrays within the scope of np_shape * Support save/load dense ndarrays in np_shape semantics

reminisce requested review from zheng-da, szha and eric-haibin-lin May 29, 2019 07:02

zheng-da approved these changes May 29, 2019

szha reviewed May 29, 2019

src/ndarray/ndarray.cc (outdated, resolved)

marcoabreu added the NDArray label May 30, 2019

reminisce added 2 commits May 30, 2019 21:39

Loosen the contraint on serializing/deserializing ndarrays within the…

189ff8f

… scope of np_shape

Support save/load dense ndarrays in np_shape semantics

610606b

reminisce force-pushed the loosen_ndarray_serialization_constraint branch from ab68e7e to 610606b Compare May 31, 2019 06:36

szha approved these changes May 31, 2019

reminisce changed the title ~~Loosen the constraint on serializing/deserializing ndarrays in np_shape semantics~~ Enable serializing/deserializing ndarrays in np_shape semantics May 31, 2019

marcoabreu reviewed May 31, 2019

zheng-da merged commit e8a20fb into apache:master Jun 1, 2019
