Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mshadow: fix vector access #17021

Merged
merged 1 commit into from
Dec 11, 2019
Merged

mshadow: fix vector access #17021

merged 1 commit into from
Dec 11, 2019

Conversation

TaoLv
Copy link
Member

@TaoLv TaoLv commented Dec 9, 2019

Description

When MXNet is build with USE_BLAS=mkl and DEBUG=1, calling batch_dot will crash on vector size check. This PR should fix the problem.

Use below code snippet to reproduce:

import mxnet as mx

x = mx.nd.uniform(shape=(128, 768, 64))
y = mx.nd.uniform(shape=(128, 768, 64))
z = mx.nd.batch_dot(x, y, transpose_b=True)
print(z)

Error message as below:

$ python test_batch_dot.py
/opt/rh/devtoolset-7/root/usr/include/c++/7/bits/stl_vector.h:797: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = const float*; _Alloc = std::allocator<const float*>; std::vector<_Tp, _Alloc>::reference = const float*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
Aborted (core dumped)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@marcoabreu
Copy link
Contributor

Thanks! Would you mind adding a test with the code you used to reproduce?

@TaoLv
Copy link
Member Author

TaoLv commented Dec 9, 2019

@marcoabreu We already have test_batch_dot in test_operator.py. The problem is the package in CI is built with DEV=1 rather than DEBUG=1 [1] but _GLIBCXX_ASSERTIONS is only enabled when DEBUG=1 [2].

[1] https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L684
[2] https://github.com/apache/incubator-mxnet/blob/master/Makefile#L105

@codecov-io
Copy link

Codecov Report

Merging #17021 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17021      +/-   ##
==========================================
+ Coverage   67.27%   67.29%   +0.01%     
==========================================
  Files         271      271              
  Lines       30461    30461              
  Branches     4534     4534              
==========================================
+ Hits        20493    20499       +6     
+ Misses       8666     8658       -8     
- Partials     1302     1304       +2
Impacted Files Coverage Δ
python/mxnet/symbol/_internal.py 92% <0%> (+12%) ⬆️
python/mxnet/ndarray/_internal.py 91.3% <0%> (+13.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44cd63e...7ce1b6e. Read the comment docs.

@szha szha merged commit 9634786 into apache:master Dec 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants