Error in `python': malloc(): memory corruption: 0x00007f540c0a6190 #529

Closed
ghost opened this issue Dec 20, 2018 · 5 comments
@ghost

ghost commented Dec 20, 2018

ENV:

    OS: CentOS 6.6 (Final)
    CUDA: 9.0.176
    Python: 2.7.5
    cuDNN: 7
    mxnet: 1.3.0-cu90
    gluon-cv: 0.3.0

Trace message with MXNET_ENGINE_TYPE=NaiveEngine:

INFO:root:Start training from [Epoch 0]
*** Error in `python': malloc(): memory corruption: 0x00007f540c0a6190 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82c86)[0x7f54d42dec86]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7f54d42e184c]
/lib64/libstdc++.so.6(_Znwm+0x1d)[0x7f54b48c7ecd]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op9SortByKeyIifEEvN7mshadow6TensorINS2_3cpuELi1ET_EENS3_IS4_Li1ET0_EEbPNS3_IS4_Li1EcEEii+0x3b0)[0x7f546654cb90]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op24BipartiteMatchingForwardIN7mshadow3cpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISC_EERKSB_INS_9OpReqTypeESaISH_EESG_+0x129f)[0x7f546655229f]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZZN5mxnet10imperative12PushFComputeERKSt8functionIFvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISA_EERKS9_INS_9OpReqTypeESaISF_EESE_EEPKNS2_2OpES5_RKNS_7ContextERKS9_IPNS_6engine3VarESaISW_EES10_RKS9_INS_8ResourceESaIS11_EERKS9_IPNS_7NDArrayESaIS17_EES1B_RKS9_IjSaIjEESJ_ENKUlNS_10RunContextEE_clES1G_+0x2e8)[0x7f5468c8b368]
*** Error in `python': malloc(): memory corruption: 0x00007f540c1a9550 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82c86)[0x7f54d42dec86]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7f54d42e184c]
/lib64/libstdc++.so.6(_Znwm+0x1d)[0x7f54b48c7ecd]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3c042d9)[0x7f54691982d9]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x589)[0x7f5469194249]
*** Error in `python': corrupted size vs. prev_size: 0x00007f540c08c830 ***
======= Backtrace: =========
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op24BipartiteMatchingForwardIN7mshadow3cpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISC_EERKSB_INS_9OpReqTypeESaISH_EESG_+0x129f)[0x7f546655229f]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN4dmlc11ManualEventEEEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS6_8OprBlockEbENKUlvE_clEvEUlS3_E_E9_M_invokeERKSt9_Any_dataS3_+0xd2)[0x7f54691a4df2]
/lib64/libc.so.6(+0x7f5e4)[0x7f54d42db5e4]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZZN5mxnet10imperative12PushFComputeERKSt8functionIFvRKN4nnvm9NodeAttrsERKNS_9OpContextERKSt6vectorINS_5TBlobESaISA_EERKS9_INS_9OpReqTypeESaISF_EESE_EEPKNS2_2OpES5_RKNS_7ContextERKS9_IPNS_6engine3VarESaISW_EES10_RKS9_INS_8ResourceESaIS11_EERKS9_IPNS_7NDArrayESaIS17_EES1B_RKS9_IjSaIjEESJ_ENKUlNS_10RunContextEE_clES1G_+0x2e8)[0x7f5468c8b368]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN4dmlc11ManualEventEEEES6_EEE6_M_runEv+0x44)[0x7f5469193b94]
/lib64/libc.so.6(+0x816db)[0x7f54d42dd6db]
/lib64/libstdc++.so.6(+0xb5070)[0x7f54b491e070]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3c042d9)[0x7f54691982d9]
/data/workspace/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet11StorageImpl4FreeENS_7Storage6HandleE+0x77)[0x7f54691af017]
/lib64/libc.so.6(clone+0x6d)[0x7f54d435abad]
......

The full trace message is here:
error.log
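
For anyone triaging a trace like the one above, here is a minimal sketch (not part of the original report) that splits glibc backtrace lines of the form `module(symbol+0xoff)[0xaddr]` into structured frames, so the frames can be grouped by library. The names `FRAME_RE`, `parse_frames` and `frames_per_module` are made up for this illustration, and the regex only covers the frame shape that appears in this log.

```python
import re
from collections import Counter

# Matches one glibc backtrace frame: module(symbol+0xoff)[0xaddr].
# The symbol part may be empty, e.g. "/lib64/libc.so.6(+0x82c86)[0x...]".
FRAME_RE = re.compile(
    r"(?P<module>[^\s()\[\]]+)"
    r"\((?P<symbol>[A-Za-z0-9_.@]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(text):
    """Return a list of frame dicts found anywhere in `text`.

    finditer is used instead of line-by-line matching so that two frames
    that ended up on the same line are still reported separately.
    """
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "module": m.group("module"),
            "symbol": m.group("symbol") or None,  # None when no symbol name
            "offset": int(m.group("offset"), 16),
            "address": int(m.group("address"), 16),
        })
    return frames


def frames_per_module(frames):
    """Count frames per shared library, keyed by file name only."""
    return Counter(f["module"].rsplit("/", 1)[-1] for f in frames)
```

Feeding the contents of the attached log to `parse_frames` and then `frames_per_module` gives a quick view of which libraries the crashing threads were in. The mangled C++ symbols can be made readable by passing them through the binutils tool `c++filt`, if it is installed.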

@ghost ghost closed this as completed Dec 22, 2018
@ghost ghost reopened this Dec 22, 2018
@yudie433

Can you solve this problem?

@arcadiaphy
Member

arcadiaphy commented Dec 25, 2018

@yudie433 see apache/mxnet#13727

@Leon924

Leon924 commented Apr 26, 2019

@arcadiaphy why do I still get this error when running python train_ssd.py?
(tvm36) [liqiang@inspur ssd]$ python train_ssd.py
[11:00:12] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
[11:00:12] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
[11:00:12] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
[11:00:12] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
INFO:root:Namespace(batch_size=16, data_shape=512, dataset='voc', epochs=240, gpus='1,2', log_interval=100, lr=0.001, lr_decay=0.1, lr_decay_epoch='160,200', momentum=0.9, network='resnet50_v1', num_workers=4, resume='', save_interval=10, save_prefix='ssd_512_resnet50_v1_voc', seed=233, start_epoch=0, syncbn=False, val_interval=1, wd=0.0005)
INFO:root:Start training from [Epoch 0]
python: malloc.c:2365: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
*** Error in `python': malloc(): memory corruption: 0x00007ff4a006df30 ***
python: malloc.c:3609: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
*** Error in `python': malloc(): memory corruption: 0x00007ff4a0747320 ***

@arcadiaphy
Member

@petit-ami Please paste the full backtrace and provide more information on how to reproduce this error.
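
One possible way to collect a fuller trace, sketched here as an assumption rather than as the maintainers' recommended procedure, is to rerun the script with `MXNET_ENGINE_TYPE=NaiveEngine`, the setting used for the original report. The helper name `naive_engine_env` is hypothetical; the variable has to be in the environment before MXNet is loaded, which is why the sketch builds an environment for a child process instead of setting it after import.

```python
import os


def naive_engine_env(base_env=None):
    """Return a copy of the environment with MXNET_ENGINE_TYPE=NaiveEngine.

    The input mapping is not modified; os.environ is used when no
    mapping is given.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


# Example use (not executed here), rerunning the script from this thread:
#   import subprocess, sys
#   subprocess.call([sys.executable, "train_ssd.py"], env=naive_engine_env())
```

Redirecting stderr of that run to a file captures the whole backtrace, which can then be attached together with the exact command line and the package versions.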

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
