Fix amalgamation failure. #15303

adis300 · 2019-06-21T03:42:20Z

Referring to issue #14808

amalgamation predict api fails when loading a network. The error messages are attached.

mxnet_predict-all.cc:32818: Check failed: reg != nullptr: Cannot find pass MXPlanMemory in the registry

However, changing MXPlanMemory to PlanMemory would not resolve the issue. Because amalgamation uses tvm/nnvm. and mxnet core engine uses its own customized nnvm version.
Simply using tvm/nnvm/plan_memory would cause Check failure.

mxnet_predict-all.cc:4289: Check failed: strcmp(type_->ptype_info->name(), typeid(T).name()) == 0: The stored type name mismatch stored=NSt3__16vectorIN5mxnet6TShapeENS_9allocatorIS2_EEEE requested=NSt3__16vectorIN4nnvm6TShapeENS_9allocatorIS2_EEEE

Therefore I added mxnet/nnvm to the C predict API amalgamation list.
And refactored some names to avoid redeclaration.

adis300 · 2019-06-21T07:47:08Z

Solution to the problem:
add. #include "src/nnvm/plan_memory.cc" to mxnet_predict0.cc
Refactor PlanMemory -> MXPlanMemory in mxnet/src/nnvm
Refactor FindBestPath -> MXFindBestPath in mxnet/src/nnvm
Refactor ColorNodeGroup -> MXColorNodeGroup in mxnet/src/nnvm
Refactor GetDTypeSize -> MXGetDTypeSize in mxnet/src/nnvm
Refactor GraphAllocator -> MXGraphAllocator in mxnet/src/nnvm
Refactor AllocMemory -> MXAllocMemory in mxnet/src/nnvm

apache/mxnet#15303

adis300 · 2019-07-02T17:05:51Z

@Roshrini @pengzhao-intel Please review this change as soon as possible. This is an important fix.

pengzhao-intel · 2019-07-02T22:54:34Z

@ZhennanQin @TaoLv to help review this change :)

TaoLv · 2019-07-03T14:46:54Z

Can you elaborate more? What's the mxnet customized nnvm version?

Because amalgamation uses tvm/nnvm. and mxnet core engine uses its own customized nnvm version.

Also, do you think we need unit test to cover this issue?

adis300 · 2019-07-03T15:33:27Z

@TaoLv I think amalgamation does need a lot of unit testing. Because it is basically broken right now. We did a lot of bash script to modify the codes for building on each platform. There is a ${mxnet_root}/src/nnvm which is used by the predict API and MXNet core. When building amalgamation this customized version is not included, ${mxnet_root}/3rdparty/tvm/nnvm is used instead. However,3rdparty/tvm/nnvm does not work when running any test.

So I have to include ${mxnet_root}/src/nnvm and avoid conflicts. In previous edits PlanMemory is already changed to MXPlanMemory. I followed the same convention to change other variable names.

In the future, I think there should be only one nnvm dependency in 3rd party directory. This is the ultimate solution.

My solution is temporary that works.

adis300 · 2019-07-03T15:36:50Z

To reproduce the error. Simply load a model and perform forward prediction on Mac with amalgamation static/shared build. Basically, it doesn't work without my fixes and I don't think there is any test cases to cover amalgamation predictions on mobile and desktops. @TaoLv

adis300 · 2019-07-03T15:53:10Z

@TaoLv @pengzhao-intel Check out our work at BrainCoTech/mxnet-prebuild. It's still at early stage but this allows us to run a mxnet model in the same code on multiple platforms. We are working on automated Windows desktop builds now.

TaoLv · 2019-07-03T16:00:21Z

Thank you for the explanation and your project looks awesome! @adis300

@szha can you help to review the change? I'm not familiar with the amalgamation issue. But overall the changes for function name look good to me. Not sure if there are any backward compatibility concern.

TaoLv · 2019-07-03T16:01:37Z

Just a side question, have you ever tried MXNet MKL-DNN build in your project? @adis300

adis300 · 2019-07-03T16:10:19Z

@TaoLv We just tried a general mxnet build with make and most works are on amalgamation without MKL-DNN. Because we want to run our python generated models on all desktops/mobile platforms with the same C code. We are working on Windows amalgamation builds now and will try MKL-DNN very soon.

adis300 · 2019-07-08T23:25:19Z

@eric-haibin-lin Can you please take a look at this PR. It is an important bug fix and has been there for half a month. The current master branch basically doesn't run c_predict_api.h forward pass.

karan6181 · 2019-07-19T01:06:27Z

@eric-haibin-lin @TaoLv Could you please review this PR? Thanks!

adis300 · 2019-08-01T14:45:45Z

@TaoLv Any updates?

TaoLv · 2019-08-01T15:22:10Z

As I said here: #15303 (comment), the changes look good me. But I still think you need get approval from more experienced committer. @szha @eric-haibin-lin @wkcn Do you have any idea?

PS: Could you please rebase the code? I can see the last run of CI is more than one month ago. I can approve if it passes.

adis300 · 2019-08-01T18:08:12Z

@TaoLv @eric-haibin-lin @szha All checks passed after rebase onto latest master.

adis300 · 2019-08-26T22:03:34Z

@TaoLv It's been a month since this branch is rebased. I am pretty sure this feature branch fixes amalgamation failure and there is currently not test case that covers this part. Please merge it so that I don't have to constantly rebasing this feature to use in our product. We have to do custom builds for for multiple platforms whenever we want to update mxnet, which is very painful.

cjolivier01

LGTM
If we intend to keep this feature (matters not to me either way), we should at least consider having a test that runs a simple mnist train or inference (if it doesn't already exist).

marcoabreu · 2019-09-12T18:58:42Z

I think we're still running Amalagamation tests: https://github.com/apache/incubator-mxnet/blob/master/tests/nightly/Jenkinsfile#L76

marcoabreu · 2019-09-12T19:00:22Z

Hey @TaoLv since we have an approval, could you assist the author to bring this PR to completion and assist them with CI and other questions?

adis300 · 2019-09-12T22:00:58Z

Thanks @marcoabreu .

@TaoLv I will try to rebase this branch and test in the next few days. Since it is very behind now.

Hope the merge request can be handled responsively after rebasing because rebasing. I have rebased it a few times already and it takes some decent amount of effort.

… the original tvm/nnvm.

adis300 · 2019-11-22T06:09:50Z

@marcoabreu @TaoLv I have just rebased the feature onto the latest master branch and resolved related conflicts.

wkcn

LGTM. Thanks for your contribution!

Roshrini added the pr-awaiting-review PR is waiting for code review label Jun 23, 2019

zlargon added a commit to BrainCoTech/mxnet-prebuild that referenced this pull request Jul 1, 2019

[mxnet] use customized branch which fixes the bug in mxnet

6a55501

apache/mxnet#15303

TaoLv requested a review from piiswrong July 3, 2019 14:47

cjolivier01 approved these changes Sep 12, 2019

View reviewed changes

Refactoring names for mxnet version of nnvm to avoid conflicting with…

0cdaaa9

… the original tvm/nnvm.

wkcn approved these changes Nov 22, 2019

View reviewed changes

marcoabreu merged commit 20f8bbc into apache:master Nov 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix amalgamation failure. #15303

Fix amalgamation failure. #15303

adis300 commented Jun 21, 2019

adis300 commented Jun 21, 2019

adis300 commented Jul 2, 2019

pengzhao-intel commented Jul 2, 2019

TaoLv commented Jul 3, 2019 •

edited

Loading

adis300 commented Jul 3, 2019

adis300 commented Jul 3, 2019 •

edited

Loading

adis300 commented Jul 3, 2019 •

edited

Loading

TaoLv commented Jul 3, 2019

TaoLv commented Jul 3, 2019

adis300 commented Jul 3, 2019

adis300 commented Jul 8, 2019

karan6181 commented Jul 19, 2019

adis300 commented Aug 1, 2019

TaoLv commented Aug 1, 2019

adis300 commented Aug 1, 2019 •

edited

Loading

adis300 commented Aug 26, 2019

cjolivier01 left a comment

marcoabreu commented Sep 12, 2019

marcoabreu commented Sep 12, 2019

adis300 commented Sep 12, 2019

adis300 commented Nov 22, 2019

wkcn left a comment

Fix amalgamation failure. #15303

Fix amalgamation failure. #15303

Conversation

adis300 commented Jun 21, 2019

adis300 commented Jun 21, 2019

adis300 commented Jul 2, 2019

pengzhao-intel commented Jul 2, 2019

TaoLv commented Jul 3, 2019 • edited Loading

adis300 commented Jul 3, 2019

adis300 commented Jul 3, 2019 • edited Loading

adis300 commented Jul 3, 2019 • edited Loading

TaoLv commented Jul 3, 2019

TaoLv commented Jul 3, 2019

adis300 commented Jul 3, 2019

adis300 commented Jul 8, 2019

karan6181 commented Jul 19, 2019

adis300 commented Aug 1, 2019

TaoLv commented Aug 1, 2019

adis300 commented Aug 1, 2019 • edited Loading

adis300 commented Aug 26, 2019

cjolivier01 left a comment

Choose a reason for hiding this comment

marcoabreu commented Sep 12, 2019

marcoabreu commented Sep 12, 2019

adis300 commented Sep 12, 2019

adis300 commented Nov 22, 2019

wkcn left a comment

Choose a reason for hiding this comment

TaoLv commented Jul 3, 2019 •

edited

Loading

adis300 commented Jul 3, 2019 •

edited

Loading

adis300 commented Jul 3, 2019 •

edited

Loading

adis300 commented Aug 1, 2019 •

edited

Loading