Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix for handling negative indices in the fusion of slice #17937

Merged
merged 3 commits into from
Apr 11, 2020

Conversation

ptrendx
Copy link
Member

@ptrendx ptrendx commented Mar 30, 2020

Description

Fixes #17914

@leezu @vafl

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Handling of the negative axis in the build_tuple helper function and added tests

@mxnet-bot
Copy link

Hey @ptrendx , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, website, centos-cpu, unix-cpu, windows-cpu, unix-gpu, centos-gpu, windows-gpu, sanity, edge, clang]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Copy link
Contributor

leezu commented Mar 30, 2020

Thanks @ptrendx. Do you plan to backport this change to 1.6 and perhaps have a 1.6.1 release?

@ptrendx
Copy link
Member Author

ptrendx commented Mar 30, 2020

@leezu Having a patch release would require following a release process again and my understanding is that there is going to be 1.7 release soon. @TaoLv I believe you guys wanted to do 1.7, right? Could you comment on the timeline?

@ptrendx
Copy link
Member Author

ptrendx commented Mar 30, 2020

unix-gpu and unix-cpu failed with gcc segfaulting - @ChaiBapchya have you seen this before?

@ChaiBapchya
Copy link
Contributor

Nope.. does it have something to do with your change?

@ptrendx
Copy link
Member Author

ptrendx commented Mar 30, 2020

I don't believe so - my change does not even touch files that are compiled in CPU only build.

@ptrendx
Copy link
Member Author

ptrendx commented Mar 30, 2020

Let's test the bot :-)

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@TaoLv
Copy link
Member

TaoLv commented Mar 31, 2020

@leezu Having a patch release would require following a release process again and my understanding is that there is going to be 1.7 release soon. @TaoLv I believe you guys wanted to do 1.7, right? Could you comment on the timeline?

@ptrendx Thank you for the ping. @ciyongch will help to manage the release of 1.7 and will send out the plan and timeline to dev@ soon.

@ciyongch
Copy link
Contributor

ciyongch commented Apr 1, 2020

@ptrendx It's really a great job to complete the journey of MXNet 1.6.0 release! I'm going to manage the release of 1.7 with the support from @TaoLv, and will update the plan later in dev.

@ptrendx
Copy link
Member Author

ptrendx commented Apr 4, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@ptrendx
Copy link
Member Author

ptrendx commented Apr 6, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@ptrendx
Copy link
Member Author

ptrendx commented Apr 7, 2020

@szha @ChaiBapchya Is anybody looking at this hang after e.g. test_np_empty tests? It would be really useful for the CI admins to hop on the node as it hangs and at least get a stacktrace or something to start figuring out where exactly the failure happens. From my experience those hangs are pretty common.

@ChaiBapchya
Copy link
Contributor

ChaiBapchya commented Apr 7, 2020

Yes. Getting a stacktrace as it hangs needs to be done before the instance is shut down by the autoscaler [as it is not feasible to disable autoscaler]. Right now only 1 unix-gpu instance is running and the python3 tests have passed. So we really have to time it such that the gpu test fail and the node is alive so that we get the stacktrace.

@leezu found that this might be operator related bug that can be tested locally.
To check the hypothesis, I am trying out testing these files using nosetest in following order:

test_contrib_amp test_deferred_compute_gpu test_device test_extensions_gpu test_forward test_fusion test_gluon_gpu test_operator_gpu

If I can reproduce it locally then we can ID the issue with the operator test.

@leezu
Copy link
Contributor

leezu commented Apr 7, 2020

@ChaiBapchya the hang occurs after ~10 minutes but the instance will remain available until timeout happens (3 hours). So @ptrendx's suggestion should also work.
I do remember that someone attempted this for the same or a similar issue before? I think there are logs in the internal issue tracker?

@ptrendx
Copy link
Member Author

ptrendx commented Apr 9, 2020

@mxnet-bot run ci [centos-gpu, unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu, centos-gpu]

@ptrendx
Copy link
Member Author

ptrendx commented Apr 10, 2020

@mxnet-bot run ci [centos-gpu, unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu, centos-gpu]

@ptrendx ptrendx added the v1.x Targeting v1.x branch label Apr 10, 2020
@ptrendx
Copy link
Member Author

ptrendx commented Apr 10, 2020

Added the v1.x label even though this PR goes to master in order to track that we want this included in 1.7 @ciyongch FYI.

@ciyongch
Copy link
Contributor

@ptrendx thanks for point it out, I'm adding this to 1.7.0 roadmap #16864.
BTW, are you going to backport this PR to v1.x after it's merged into master?

@ptrendx
Copy link
Member Author

ptrendx commented Apr 11, 2020

Yes, I'm going to backport my prs.

@leezu leezu merged commit 37c9dd6 into apache:master Apr 11, 2020
ptrendx added a commit to ptrendx/mxnet that referenced this pull request Apr 13, 2020
* Fix for handling of negative axis, begin and end in fusion of slice ops

* Added test
ptrendx added a commit to ptrendx/mxnet that referenced this pull request Apr 14, 2020
* Fix for handling of negative axis, begin and end in fusion of slice ops

* Added test
ptrendx added a commit that referenced this pull request Apr 18, 2020
* Fix ElemwiseSum for more than 4 inputs (#17995)

* Fix ElemwiseSum for more than 4 inputs

* Added test

* Fix for handling negative indices in the fusion of slice (#17937)

* Fix for handling of negative axis, begin and end in fusion of slice ops

* Added test
@vafl
Copy link
Contributor

vafl commented Apr 20, 2020

Yes, I'm going to backport my prs.

Thanks for the fix. I'm wondering when are you planning to release a fix for 1.6?
This is a somewhat urgent issue for gluonts.

@ptrendx
Copy link
Member Author

ptrendx commented Apr 20, 2020

@vafl There is pretty much 0 chance that something like 1.6.1 would happen before 1.7.0 (at least that 's my understanding, @szha please correct me if I'm wrong), is there anything preventing GluonTS from jumping straight to 1.7?

@leezu
Copy link
Contributor

leezu commented Apr 20, 2020

@ptrendx not everyone is using the binary wheels from pypi. Having the fix in the 1.6 branch can still be useful despite not having a 1.6.1 release in the short term

@ptrendx
Copy link
Member Author

ptrendx commented Apr 20, 2020

I see. Ok, will push it there.

ptrendx added a commit to ptrendx/mxnet that referenced this pull request Apr 20, 2020
* Fix for handling of negative axis, begin and end in fusion of slice ops

* Added test
ptrendx added a commit that referenced this pull request Apr 21, 2020
…8118)

* Fix for handling of negative axis, begin and end in fusion of slice ops

* Added test
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.x Targeting v1.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator fusion breaks '+' operator on GPU with CachedOp
7 participants