This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Manually check node existence in CachedOp #11545

Merged: 8 commits, Jul 5, 2018

Conversation

junrushao
Member

@junrushao junrushao commented Jul 3, 2018

Description

Constructing an unusual CachedOp directly in C++ can, in rare cases, cause full_graph to be rebuilt in CachedOp without bwd_ograd_dep_ being updated. This can lead to a crash when the OpReqType of one of the outputs of grad_graph_ is OpReqType::kNullOp. This PR makes CachedOp handle this case.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, API doc string has been updated.
      • For new C++ functions in header files, their functionalities and arguments are documented.
      • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
      • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Manually check node existence in CachedOp

Comments

  • This change is fully backward compatible.

@junrushao
Member Author

@zheng-da @piiswrong Could you help review this PR? Thanks!

Member

@szha szha left a comment

LGTM. One nit would be to give a name for the placeholder value to improve readability.

@junrushao
Member Author

junrushao commented Jul 4, 2018

@szha I will change std::numeric_limits<uint32_t>::max() to kEidNotExist

Update: done
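
For readers following the thread, here is a minimal sketch of the pattern under discussion: a named sentinel in place of a bare `std::numeric_limits<uint32_t>::max()`, combined with a manual existence check before an entry id is used. Only the name `kEidNotExist` comes from this PR; `EntryIndex`, `LookupEntryId`, and `CollectExistingEntries` are hypothetical names for illustration and are not the actual CachedOp code.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Named sentinel meaning "this entry does not exist in the graph".
// The name is taken from the PR discussion; its use below is an assumption.
constexpr uint32_t kEidNotExist = std::numeric_limits<uint32_t>::max();

// Hypothetical stand-in for a graph index: maps a node name to its entry id.
using EntryIndex = std::unordered_map<std::string, uint32_t>;

// Returns the entry id for `name`, or kEidNotExist when the node is absent.
// Using find() avoids both the insertion done by operator[] and the
// exception thrown by at() when the key is missing.
inline uint32_t LookupEntryId(const EntryIndex& index, const std::string& name) {
  const auto it = index.find(name);
  return it == index.end() ? kEidNotExist : it->second;
}

// Collects the entry ids of the requested nodes, skipping absent ones
// instead of indexing into per-entry storage with an invalid id.
inline std::vector<uint32_t> CollectExistingEntries(
    const EntryIndex& index, const std::vector<std::string>& names) {
  std::vector<uint32_t> eids;
  for (const auto& name : names) {
    const uint32_t eid = LookupEntryId(index, name);
    if (eid == kEidNotExist) continue;  // node is not in the graph
    eids.push_back(eid);
  }
  return eids;
}
```

Comparing against a named constant makes the "not found" branch obvious at the call site, which is the readability point raised in the review.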

@zheng-da
Contributor

zheng-da commented Jul 4, 2018

@junrushao1994 Please write a unit test that fails without this fix.

@junrushao
Member Author

@zheng-da Thank you for the tips! I added the testcase just now.

@szha szha merged commit 19ac41d into apache:master Jul 5, 2018
@junrushao junrushao deleted the cached-op-enhance branch July 27, 2018 01:15
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
* Manually check node existence in CachedOp

* Fix lint

* Trigger CI

* Improve readability, replace `numeric_limits::max` with `kEidNotExist`

* Add testcase

* Trigger CI

* Remove commented lines in unittests

* Trigger CI