
[Large Tensor] Add LT support for NN optimizers and 1 activation function #17444

Merged
4 commits merged into apache:master on Feb 10, 2020

Conversation

@ChaiBapchya (Contributor) commented Jan 27, 2020

Description

Add large tensor (LT) support to the following optimizer update operators and one activation function (a quick build-flag check is sketched after the list):

  • hard_sigmoid
  • adam_update
  • ftml_update
  • mp_sgd_mom_update
  • mp_sgd_update
  • rmsprop_update
  • rmspropalex_update
  • sgd_mom_update
  • sgd_update
  • signsgd_update
  • signum_update
  • nagmom
  • mp_nagmom
  • lamb
  • mp_lamb
  • ftrl
  • adagrad
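
The ops above only exercise the new indexing path on a build compiled with large tensor support (the feature list pasted later in this thread shows ✔ INT64_TENSOR_SIZE). A minimal pre-flight check, sketched with the mxnet.runtime.feature_list helper used later in this thread; the USE_INT64_TENSOR_SIZE build flag named in the message is an assumption based on the standard MXNet build options:

# Sketch only: confirm the build has 64-bit tensor indexing before running
# any of the > 2**32-element checks shown further down in this conversation.
from mxnet.runtime import feature_list

assert any(f.name == 'INT64_TENSOR_SIZE' and f.enabled for f in feature_list()), \
    "rebuild MXNet with USE_INT64_TENSOR_SIZE=1 to test tensors with more than 2**32 elements"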

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Code is well-documented:
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • modified: src/operator/optimizer_op-inl.h
  • modified: src/operator/tensor/elemwise_unary_op.h

Comments

Tested hard_sigmoid with an LT input: pass.

>>> import mxnet as mx
>>> mx.nd.hard_sigmoid(data=mx.nd.random_normal(shape=(1, 2**32 + 1)))
[[0.9424413 0.6548008 0.7086881 ... 0.53579605 0.37985992 0.20645571]]
<NDArray 1x4294967297 @cpu(0)>

The rest of the *_update functions can't be validated numerically with random_normal inputs, since they return NaNs (even for shapes < 2**32), so their outputs were not checked. They no longer give a segmentation fault, however, which was the previous problem caused by the lack of large tensor support.
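
A minimal sketch of that segfault-only check (no values are asserted, only that the call completes):

# Sketch: run one of the touched updates on a > 2**32-element input and force
# execution; with random inputs the values may be NaN, so only completion
# (i.e. no SIGSEGV) is being checked here.
import mxnet as mx

n = 2**32 + 1
weight = mx.nd.random_normal(shape=(n,))
grad = mx.nd.random_normal(shape=(n,))
out = mx.nd.sgd_update(weight, grad, lr=0.01)
out.wait_to_read()  # previously this pattern crashed with SIGSEGV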

@access2rohit (Contributor)

@ChaiBapchya can you paste the opperf test run logs showing that these ops run fine without giving SIGSEGV?

@access2rohit (Contributor) commented Jan 28, 2020

Looks like a lot of ops didn't use the correct type for the index mapping over input values. LGTM, but I would like to see logs confirming this doesn't segfault.

@szha (Member) commented Jan 30, 2020

cc @szhengac

@szhengac (Contributor)

I think this needs to be tested by training a large model.

@ChaiBapchya (Contributor, Author) commented Jan 31, 2020

@szhengac Which model? Which dataset? Can you give some specifics?
Thanks

@szhengac (Contributor)

@szhengac Which model? Which dataset? Can you give some specifics?
Thanks

I think a toy example with a very wide dense layer is good.
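
A minimal sketch of such a toy test (assumptions: a Gluon Dense layer wide enough that its weight matrix exceeds 2**32 elements, the stock 'sgd' trainer with momentum so that sgd_mom_update runs, and a machine with tens of GB of free memory; this is not the script behind the results posted below):

import mxnet as mx
from mxnet import autograd, gluon

ctx = mx.cpu()                       # or mx.gpu(0)
in_units, units = 2**16, 2**16 + 1   # weight shape (units, in_units) has > 2**32 elements

net = gluon.nn.Dense(units, in_units=in_units)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01, 'momentum': 0.9})  # exercises sgd_mom_update
loss_fn = gluon.loss.L2Loss()

for _ in range(2):
    x = mx.nd.random.normal(shape=(4, in_units), ctx=ctx)
    y = mx.nd.random.normal(shape=(4, units), ctx=ctx)
    with autograd.record():
        loss = loss_fn(net(x), y)
    loss.backward()
    trainer.step(4)                  # optimizer update over the huge weight tensor
    print(loss.mean().asscalar())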

@ChaiBapchya (Contributor, Author)

So I tested MXNet (built from source using this branch) with the following feature flags:

python -c "from mxnet.runtime import feature_list; print(feature_list())"
[✔ CUDA, ✔ CUDNN, ✖ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✔ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✔ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]

Results for training 10 epochs on 8 GPUs:

INFO:root:[Epoch 0] train=0.120292 val=0.158000 loss=6.658037 time: 109.734473
INFO:root:[Epoch 1] train=0.167548 val=0.179600 loss=2.297145 time: 92.212359
INFO:root:[Epoch 2] train=0.210777 val=0.237700 loss=2.109626 time: 92.110430
INFO:root:[Epoch 3] train=0.240705 val=0.255700 loss=2.032153 time: 92.476469
INFO:root:[Epoch 4] train=0.262039 val=0.273600 loss=1.976788 time: 94.570572
INFO:root:[Epoch 5] train=0.279728 val=0.302300 loss=1.915808 time: 91.655044
INFO:root:[Epoch 6] train=0.295393 val=0.309900 loss=1.868357 time: 94.903087
INFO:root:[Epoch 7] train=0.312901 val=0.331600 loss=1.825083 time: 94.501921
INFO:root:[Epoch 8] train=0.330889 val=0.334100 loss=1.788333 time: 95.653459
INFO:root:[Epoch 9] train=0.344211 val=0.349900 loss=1.757741 time: 94.065917

Is this fine?

@ChaiBapchya (Contributor, Author)

@mxnet-label-bot add [pr-awaiting-review]
@apeforest

@ChaiBapchya can you paste the tests run log of opperf indicating they run fine w/o giving SIGSEGV

>>> import mxnet as mx
>>> mx.nd.signum_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), mom=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[ 2.2022064   0.7840038   1.0334405  ...  0.18898012 -0.5907004
 -1.4777215 ]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.signsgd_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[ 0.15278001  1.7198559   0.14636855 ...  0.3357248  -0.22160508
  1.5340825 ]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.sgd_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[ 1.6252067   0.22516885  0.00959079 ... -0.688654    0.6969211
  0.00631838]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.sgd_mom_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), mom=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[ 0.9833377  -0.75289315  0.58504266 ... -1.0496317  -0.08228261
 -1.7657199 ]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.rmspropalex_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), n=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01, g=mx.nd.random_normal(shape=(2**32 + 1)), delta=mx.nd.random_normal(shape=(2**32 + 1)))

[2.5003266         nan        nan ...        nan        nan 0.13144751]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.mp_sgd_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), weight32=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[ 1.1050267   0.6508057   0.13951734 ... -0.73946345  0.55659974
  1.9047947 ]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.mp_sgd_mom_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), mom=mx.nd.random_normal(shape=(2**32 + 1)), weight32=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[ 0.8880665  -1.852293    1.0043188  ... -0.5858472   0.554819
  0.26844773]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.ftml_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), d=mx.nd.random_normal(shape=(2**32 + 1)), v=mx.nd.random_normal(shape=(2**32 + 1)), z=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01, t=1)

[ 0.05790505 -0.819279           nan ...         nan         nan
         nan]
<NDArray 4294967297 @cpu(0)>
>>> mx.nd.adam_update(weight=mx.nd.random_normal(shape=(2**32 + 1)), grad=mx.nd.random_normal(shape=(2**32 + 1)), mean=mx.nd.random_normal(shape=(2**32 + 1)), var=mx.nd.random_normal(shape=(2**32 + 1)), lr=.01)

[       nan -1.8923444        nan ...  1.6588118        nan        nan]
<NDArray 4294967297 @cpu(0)>

Previously these all used to give SIGSEGV; now they don't. @access2rohit

@lanking520 added the pr-awaiting-review (PR is waiting for code review) label on Jan 31, 2020
@szhengac (Contributor) commented Jan 31, 2020

Can you also test the optimizer op with a large sparse tensor? Currently, SGD, Adagrad, Adam, and FTRL support row_sparse weight and gradient.

@ChaiBapchya (Contributor, Author)

>>> import mxnet as mx
>>> from mxnet.test_utils import *
>>> w = rand_ndarray((2**32+1,1), 'row_sparse', density=1)
>>> mx.nd.adam_update(w,w,w,w,lr=0.1)
[00:00:47] ../src/executor/../operator/../common/utils.h:472: Optimizer with lazy_update = True detected. Be aware that lazy update with row_sparse gradient is different from standard update, and may lead to different empirical results. See https://mxnet.apache.org/api/python/optimization/optimization.html for more details.

<RowSparseNDArray 4294967297x1 @cpu(0)>
>>> a=mx.nd.adam_update(w,w,w,w,lr=0.1)
>>> a

<RowSparseNDArray 4294967297x1 @cpu(0)>

@ChaiBapchya (Contributor, Author)

@szhengac
Thanks for the help with passing a sparse array. As discussed offline, this works:

import mxnet as mx
from mxnet.test_utils import *
w = rand_ndarray((2**32+1,1), 'row_sparse', density=1)
g = rand_ndarray((2**32+1,1), 'row_sparse', density=1)
m = mx.nd.zeros((2**32+1,1), stype='row_sparse')
v = mx.nd.zeros((2**32+1,1), stype='row_sparse')
ans=mx.nd.adam_update(w,g,m,v,lr=0.1)
ans.data.asnumpy()

Output

array([[ 0.19461787],
       [-0.3031752 ],
       [-0.18570909],
       ...,
       [        nan],
       [        nan],
       [        nan]], dtype=float32)
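
For reference, the same kind of check could be pointed at the other row_sparse-capable updates @szhengac listed (SGD, Adagrad, FTRL). A sketch for plain SGD, reusing the helpers above (like the Adam check, it needs tens of GB of memory):

import mxnet as mx
from mxnet.test_utils import rand_ndarray

# Sketch: large row_sparse weight and gradient through sgd_update.
w = rand_ndarray((2**32 + 1, 1), 'row_sparse', density=1)
g = rand_ndarray((2**32 + 1, 1), 'row_sparse', density=1)
out = mx.nd.sgd_update(w, g, lr=0.1)
out.wait_to_read()  # expect a RowSparseNDArray of shape 4294967297x1, with no segfault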

@apeforest (Contributor) left a comment

LGTM

@apeforest merged commit b65db3c into apache:master on Feb 10, 2020
zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request on Feb 19, 2020
[Large Tensor] Add LT support for NN optimizers and 1 activation function (apache#17444)

* fix hard sigmoid

* change int i to index_t i for all Kernel Map functions

* fix lint

* size t indext fix
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request on May 29, 2020
[Large Tensor] Add LT support for NN optimizers and 1 activation function (apache#17444)

* fix hard sigmoid

* change int i to index_t i for all Kernel Map functions

* fix lint

* size t indext fix