Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix amalgamation failure. #15303

Merged
merged 1 commit into from
Nov 23, 2019
Merged

Fix amalgamation failure. #15303

merged 1 commit into from
Nov 23, 2019

Conversation

adis300
Copy link
Contributor

@adis300 adis300 commented Jun 21, 2019

Referring to issue #14808

amalgamation predict api fails when loading a network. The error messages are attached.

mxnet_predict-all.cc:32818: Check failed: reg != nullptr: Cannot find pass MXPlanMemory in the registry

However, changing MXPlanMemory to PlanMemory would not resolve the issue. Because amalgamation uses tvm/nnvm. and mxnet core engine uses its own customized nnvm version.
Simply using tvm/nnvm/plan_memory would cause Check failure.

mxnet_predict-all.cc:4289: Check failed: strcmp(type_->ptype_info->name(), typeid(T).name()) == 0: The stored type name mismatch stored=NSt3__16vectorIN5mxnet6TShapeENS_9allocatorIS2_EEEE requested=NSt3__16vectorIN4nnvm6TShapeENS_9allocatorIS2_EEEE

Therefore I added mxnet/nnvm to the C predict API amalgamation list.
And refactored some names to avoid redeclaration.

@adis300
Copy link
Contributor Author

adis300 commented Jun 21, 2019

Solution to the problem:
add. #include "src/nnvm/plan_memory.cc" to mxnet_predict0.cc
Refactor PlanMemory -> MXPlanMemory in mxnet/src/nnvm
Refactor FindBestPath -> MXFindBestPath in mxnet/src/nnvm
Refactor ColorNodeGroup -> MXColorNodeGroup in mxnet/src/nnvm
Refactor GetDTypeSize -> MXGetDTypeSize in mxnet/src/nnvm
Refactor GraphAllocator -> MXGraphAllocator in mxnet/src/nnvm
Refactor AllocMemory -> MXAllocMemory in mxnet/src/nnvm

@Roshrini Roshrini added the pr-awaiting-review PR is waiting for code review label Jun 23, 2019
zlargon added a commit to BrainCoTech/mxnet-prebuild that referenced this pull request Jul 1, 2019
@adis300
Copy link
Contributor Author

adis300 commented Jul 2, 2019

@Roshrini @pengzhao-intel Please review this change as soon as possible. This is an important fix.

@pengzhao-intel
Copy link
Contributor

@ZhennanQin @TaoLv to help review this change :)

@TaoLv
Copy link
Member

TaoLv commented Jul 3, 2019

Can you elaborate more? What's the mxnet customized nnvm version?

Because amalgamation uses tvm/nnvm. and mxnet core engine uses its own customized nnvm version.

Also, do you think we need unit test to cover this issue?

@TaoLv TaoLv requested a review from piiswrong July 3, 2019 14:47
@adis300
Copy link
Contributor Author

adis300 commented Jul 3, 2019

@TaoLv I think amalgamation does need a lot of unit testing. Because it is basically broken right now. We did a lot of bash script to modify the codes for building on each platform. There is a ${mxnet_root}/src/nnvm which is used by the predict API and MXNet core. When building amalgamation this customized version is not included, ${mxnet_root}/3rdparty/tvm/nnvm is used instead. However,3rdparty/tvm/nnvm does not work when running any test.

So I have to include ${mxnet_root}/src/nnvm and avoid conflicts. In previous edits PlanMemory is already changed to MXPlanMemory. I followed the same convention to change other variable names.

In the future, I think there should be only one nnvm dependency in 3rd party directory. This is the ultimate solution.

My solution is temporary that works.

@adis300
Copy link
Contributor Author

adis300 commented Jul 3, 2019

To reproduce the error. Simply load a model and perform forward prediction on Mac with amalgamation static/shared build. Basically, it doesn't work without my fixes and I don't think there is any test cases to cover amalgamation predictions on mobile and desktops. @TaoLv

@adis300
Copy link
Contributor Author

adis300 commented Jul 3, 2019

@TaoLv @pengzhao-intel Check out our work at BrainCoTech/mxnet-prebuild. It's still at early stage but this allows us to run a mxnet model in the same code on multiple platforms. We are working on automated Windows desktop builds now.

@TaoLv
Copy link
Member

TaoLv commented Jul 3, 2019

Thank you for the explanation and your project looks awesome! @adis300

@szha can you help to review the change? I'm not familiar with the amalgamation issue. But overall the changes for function name look good to me. Not sure if there are any backward compatibility concern.

@TaoLv
Copy link
Member

TaoLv commented Jul 3, 2019

Just a side question, have you ever tried MXNet MKL-DNN build in your project? @adis300

@adis300
Copy link
Contributor Author

adis300 commented Jul 3, 2019

@TaoLv We just tried a general mxnet build with make and most works are on amalgamation without MKL-DNN. Because we want to run our python generated models on all desktops/mobile platforms with the same C code. We are working on Windows amalgamation builds now and will try MKL-DNN very soon.

@adis300
Copy link
Contributor Author

adis300 commented Jul 8, 2019

@eric-haibin-lin Can you please take a look at this PR. It is an important bug fix and has been there for half a month. The current master branch basically doesn't run c_predict_api.h forward pass.

@karan6181
Copy link
Contributor

@eric-haibin-lin @TaoLv Could you please review this PR? Thanks!

@adis300
Copy link
Contributor Author

adis300 commented Aug 1, 2019

@TaoLv Any updates?

@TaoLv
Copy link
Member

TaoLv commented Aug 1, 2019

As I said here: #15303 (comment), the changes look good me. But I still think you need get approval from more experienced committer. @szha @eric-haibin-lin @wkcn Do you have any idea?

PS: Could you please rebase the code? I can see the last run of CI is more than one month ago. I can approve if it passes.

@adis300
Copy link
Contributor Author

adis300 commented Aug 1, 2019

@TaoLv @eric-haibin-lin @szha All checks passed after rebase onto latest master.

@adis300
Copy link
Contributor Author

adis300 commented Aug 26, 2019

@TaoLv It's been a month since this branch is rebased. I am pretty sure this feature branch fixes amalgamation failure and there is currently not test case that covers this part. Please merge it so that I don't have to constantly rebasing this feature to use in our product. We have to do custom builds for for multiple platforms whenever we want to update mxnet, which is very painful.

Copy link
Member

@cjolivier01 cjolivier01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
If we intend to keep this feature (matters not to me either way), we should at least consider having a test that runs a simple mnist train or inference (if it doesn't already exist).

@marcoabreu
Copy link
Contributor

I think we're still running Amalagamation tests: https://github.com/apache/incubator-mxnet/blob/master/tests/nightly/Jenkinsfile#L76

@marcoabreu
Copy link
Contributor

Hey @TaoLv since we have an approval, could you assist the author to bring this PR to completion and assist them with CI and other questions?

@adis300
Copy link
Contributor Author

adis300 commented Sep 12, 2019

Thanks @marcoabreu .

@TaoLv I will try to rebase this branch and test in the next few days. Since it is very behind now.

Hope the merge request can be handled responsively after rebasing because rebasing. I have rebased it a few times already and it takes some decent amount of effort.

@adis300
Copy link
Contributor Author

adis300 commented Nov 22, 2019

@marcoabreu @TaoLv I have just rebased the feature onto the latest master branch and resolved related conflicts.

Copy link
Member

@wkcn wkcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your contribution!

@marcoabreu marcoabreu merged commit 20f8bbc into apache:master Nov 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants