This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Reorder module import orders for dist-kvstore #13742

Merged: 2 commits merged on Dec 29, 2018


@eric-haibin-lin (Member) commented Dec 28, 2018

Description

For distributed training, a new process is launched when the kvstore_server module is imported. That module should therefore be imported last, so that all other MXNet modules are initialized correctly first. Otherwise, unpickling a custom LR scheduler or optimizer on the server can fail. For example, the LRScheduler in gluoncv (https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/lr_scheduler.py#L8) depends on a specific version of MXNet and checks MXNet's __version__ attribute. On the kvstore server that attribute is not set, because the kvstore_server module is imported before __version__ is assigned.
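To make the ordering problem concrete, here is a minimal, self-contained simulation. It uses a hypothetical toy package `toymx` (not MXNet itself) and a hypothetical version string. Its `kvstore_server` submodule stands in for the real server entry point: instead of starting a server, it only records which `__version__` it can see on the parent package at import time. Importing it before `__version__ = base.__version__` reproduces the missing attribute; importing it last avoids it.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Source of the hypothetical stand-in server module. It inspects the parent
# package while that package's __init__ is still executing.
SERVER_SRC = textwrap.dedent("""
    import sys
    # Per the PR description, importing the real kvstore_server is what
    # launches the server. This toy version only records the version that
    # code checking __version__ (e.g. an unpickled scheduler) would find.
    SEEN_VERSION = getattr(sys.modules["toymx"], "__version__", None)
""")


def build_package(root, server_first):
    """Write a toy package 'toymx' whose __init__ mimics the import order."""
    pkg = os.path.join(root, "toymx")
    os.makedirs(pkg)
    lines = ["from . import base\n"]
    if server_first:
        lines.append("from . import kvstore_server\n")  # old order
    lines.append("__version__ = base.__version__\n")
    if not server_first:
        lines.append("from . import kvstore_server\n")  # fixed order: last
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.writelines(lines)
    with open(os.path.join(pkg, "base.py"), "w") as f:
        f.write('__version__ = "1.4.0"\n')  # hypothetical version string
    with open(os.path.join(pkg, "kvstore_server.py"), "w") as f:
        f.write(SERVER_SRC)


def version_seen_by_server(server_first):
    """Import the toy package fresh and return the version the server saw."""
    with tempfile.TemporaryDirectory() as root:
        build_package(root, server_first)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            server = importlib.import_module("toymx.kvstore_server")
            return server.SEEN_VERSION
        finally:
            sys.path.remove(root)
            # Drop cached modules so the next call re-imports from scratch.
            for name in [n for n in sys.modules if n.split(".")[0] == "toymx"]:
                del sys.modules[name]


if __name__ == "__main__":
    print(version_seen_by_server(server_first=True))   # old order
    print(version_seen_by_server(server_first=False))  # fixed order
```

With the old order the server-side stand-in sees `None`, because the package is only partially initialized when the server module runs; with the server module imported last it sees `"1.4.0"`.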

@hetong007

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented:
    • For user-facing API changes, API doc string has been updated.
    • For new C++ functions in header files, their functionalities and arguments are documented.
    • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
    • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

End-to-end tested with a gluon-cv example.

@sandeep-krishnamurthy added the Gluon and pr-awaiting-merge ("Review and CI is complete. Ready to Merge") labels Dec 29, 2018
@@ -82,3 +81,7 @@
from . import gluon

__version__ = base.__version__

Contributor

Wow, tricky issue. Nice fix. I think adding more details from your PR description as a code comment would be useful.

Member Author

Added. Thanks

@eric-haibin-lin eric-haibin-lin merged commit 812b06a into apache:master Dec 29, 2018
rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jan 9, 2019
* Reorder module import orders for dist-kvstore

* more code comments
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Reorder module import orders for dist-kvstore

* more code comments
@eric-haibin-lin eric-haibin-lin deleted the kvstore-server branch December 14, 2019 05:49

3 participants