Re-Enabling Large Tensor and Vector Nightly on GPU #16164

access2rohit · 2019-09-13T17:47:53Z

Description

Reverts PR #15141, now that the fix (#17450) for issue #14981 has been merged.
To be merged only after the nightly tests are restored. This test is being re-enabled because recently merged PRs have significantly reduced the memory footprint of ops like topk, argsort and sort, from over 400GB to around 220GB on the large tensor tests.
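For a rough sense of scale behind the memory figures above, here is a back-of-the-envelope sketch. The shape is a hypothetical example chosen only for illustration and is not taken from this PR; the helper name `tensor_bytes` is likewise made up for this sketch.

```python
import math


def tensor_bytes(shape, itemsize=4):
    """Bytes needed for one dense array of `shape` (4-byte elements by default)."""
    return math.prod(shape) * itemsize


# Hypothetical shape, assumed for illustration only.
EXAMPLE_SHAPE = (100_000_000, 50)

elements = math.prod(EXAMPLE_SHAPE)
# True when the element count no longer fits in an unsigned 32-bit index.
exceeds_32bit_index = elements > 2**32
gigabytes = tensor_bytes(EXAMPLE_SHAPE) / 1e9

print(elements, exceeds_32bit_index, gigabytes)
```

A single float32 array of this assumed shape already needs 20 GB, so an operator that allocates several temporaries of similar size can plausibly reach hundreds of gigabytes across a full test run.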

Also adds the large vector nightly test.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Tests

ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_nightly_gpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh nightly_test_large_tensor

2020-01-28 01:08:03,925 - root - INFO - Started container: 5bceca2b8f26
+ NOSE_COVERAGE_ARGUMENTS='--with-coverage --cover-inclusive --cover-xml --cover-branches --cover-package=mxnet'
+ NOSE_TIMER_ARGUMENTS='--with-timer --timer-ok 1 --timer-warning 15 --timer-filter warning,error'
+ CI_CUDA_COMPUTE_CAPABILITIES='-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_70,code=sm_70'
+ CI_CMAKE_CUDA_ARCH='5.2 7.0'
+ set +x
+ export PYTHONPATH=./python/
+ PYTHONPATH=./python/
+ export DMLC_LOG_STACK_TRACE_DEPTH=10
+ DMLC_LOG_STACK_TRACE_DEPTH=10
+ nosetests-3.4 tests/nightly/test_large_array.py:test_tensor
S
----------------------------------------------------------------------
Ran 1 test in 125.654s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_array.py:test_nn
[01:15:48] src/executor/graph_executor.cc:2062: Subgraph backend MKLDNN is activated.
[01:15:52] src/executor/graph_executor.cc:2062: Subgraph backend MKLDNN is activated.
S
----------------------------------------------------------------------
Ran 1 test in 344.892s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_array.py:test_basic
S
----------------------------------------------------------------------
Ran 1 test in 156.411s

OK (SKIP=1)

ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_nightly_gpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh nightly_test_large_vector

+ NOSE_TIMER_ARGUMENTS='--with-timer --timer-ok 1 --timer-warning 15 --timer-filter warning,error'
+ CI_CUDA_COMPUTE_CAPABILITIES='-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_70,code=sm_70'
+ CI_CMAKE_CUDA_ARCH='5.2 7.0'
+ set +x
+ export PYTHONPATH=./python/
+ PYTHONPATH=./python/
+ export DMLC_LOG_STACK_TRACE_DEPTH=10
+ DMLC_LOG_STACK_TRACE_DEPTH=10
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_tensor
S
----------------------------------------------------------------------
Ran 1 test in 107.439s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_nn
[06:22:40] src/executor/graph_executor.cc:1982: Subgraph backend MKLDNN is activated.
[06:27:34] src/executor/graph_executor.cc:1982: Subgraph backend MKLDNN is activated.
.
----------------------------------------------------------------------
Ran 1 test in 653.536s

OK
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_basic
S
----------------------------------------------------------------------
Ran 1 test in 65.930s

OK (SKIP=1)
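When reading logs like the two above, it helps to see at a glance which invocations actually ran a test and which were skipped. The following is a minimal sketch of a log summarizer, assuming the log format shown here (a traced `+ nosetests…` line, then `Ran N test(s) in Xs`, then `OK` or `OK (SKIP=N)`). The function name `summarize` and the sample text are illustrative and not part of the MXNet CI tooling.

```python
import re

# Traced command line that starts each run, e.g. "+ nosetests-3.4 path:test_name".
CMD_RE = re.compile(r"^\+ nosetests\S* (\S+)$")
# nose summary line, e.g. "Ran 1 test in 125.654s".
RAN_RE = re.compile(r"^Ran (\d+) tests? in ([\d.]+)s$")
# Result line, e.g. "OK" or "OK (SKIP=1)".
OK_RE = re.compile(r"^OK(?: \(SKIP=(\d+)\))?$")


def summarize(log_text):
    """Return one dict per nosetests invocation found in the log text."""
    runs = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CMD_RE.match(line)
        if match:
            current = {"target": match.group(1), "ran": 0,
                       "seconds": 0.0, "skipped": 0, "ok": False}
            runs.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first nosetests command
        match = RAN_RE.match(line)
        if match:
            current["ran"] = int(match.group(1))
            current["seconds"] = float(match.group(2))
            continue
        match = OK_RE.match(line)
        if match:
            current["ok"] = True
            current["skipped"] = int(match.group(1) or 0)
    return runs


# Excerpt adapted from the large vector log above.
SAMPLE = """+ nosetests-3.4 tests/nightly/test_large_vector.py:test_tensor
S
----------------------------------------------------------------------
Ran 1 test in 107.439s

OK (SKIP=1)
+ nosetests-3.4 tests/nightly/test_large_vector.py:test_nn
.
----------------------------------------------------------------------
Ran 1 test in 653.536s

OK
"""

for run in summarize(SAMPLE):
    executed = run["ran"] - run["skipped"]
    print(run["target"], "executed:", executed, "skipped:", run["skipped"])
```

Applied to the logs in this PR, a summary like this makes it easy to spot that only `test_large_vector.py:test_nn` reports a plain `OK`, while the other invocations report `SKIP=1`.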

access2rohit · 2019-09-13T17:50:10Z

@mxnet-label-bot add [pr-awaiting-review]

access2rohit · 2019-09-13T17:50:52Z

@apeforest can you review?

apeforest · 2019-11-25T18:32:58Z

Can you run all the tests in the same container as nightly and paste the results here? Thanks!

access2rohit · 2020-02-06T01:38:29Z

@mxnet-label-bot add [pr-awaiting-review]

access2rohit · 2020-02-06T01:38:52Z

@apeforest can you take a look?

access2rohit · 2020-02-06T17:51:28Z

@mxnet-label-bot update [pr-awaiting-merge]

lanking520 added the pr-awaiting-review PR is waiting for code review label Sep 16, 2019

Vikas-kum mentioned this pull request Sep 16, 2019

Tutorials nighly fix #16179

Merged

7 tasks

access2rohit force-pushed the re-enable_large_tensor branch 2 times, most recently from bd2105a to ac6cb59 Compare January 14, 2020 19:28

access2rohit changed the title ~~Re-Enabling Large Tensor Nightly on GPU~~ Re-Enabling Large Tensor and Vector Nightly on GPU Jan 14, 2020

access2rohit force-pushed the re-enable_large_tensor branch from ac6cb59 to f75aad8 Compare January 27, 2020 18:26

apeforest mentioned this pull request Jan 29, 2020

[mxnet 2.0] [item 2.4] Turning on large tensor support by default #17331

Open

access2rohit force-pushed the re-enable_large_tensor branch 3 times, most recently from 4e871db to e004b56 Compare February 5, 2020 19:34

Re-Enabling Large Tensor Nightly on GPU

916212f

access2rohit force-pushed the re-enable_large_tensor branch from e004b56 to 916212f Compare February 5, 2020 21:36

access2rohit requested a review from apeforest February 5, 2020 21:37

lanking520 added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-review PR is waiting for code review labels Feb 6, 2020

apeforest approved these changes Feb 6, 2020

apeforest merged commit f850170 into apache:master Feb 6, 2020

zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request Feb 19, 2020

Re-Enabling Large Tensor Nightly on GPU (apache#16164)

107fde4

anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020

Re-Enabling Large Tensor Nightly on GPU (apache#16164)

a9476e5
