This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add gluonCV to fix AMP Tutorial #15039

Merged 1 commit on Jun 2, 2019

Conversation

Chancebair
Contributor

Description

To fix the failure here: #15028

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Comments

@roywei
Member

roywei commented May 22, 2019

Thanks for fixing this. Could there be other errors from the tutorial once we fix the install?

@ptrendx
Member

ptrendx commented May 22, 2019

Is there a way to trigger the tutorial test on the CI before merging, rather than waiting for the nightly test?

@anirudh2290
Member

Thanks for the fix! @ptrendx AFAIK the tutorials take a long time to run, and since the customer impact of a broken tutorial is much lower, they are not part of the CI.

@ptrendx
Member

ptrendx commented May 22, 2019

@roywei I don't expect any other errors - I was running this tutorial before submitting it ;-). I did not know about the configuration of the tutorial CI, sorry for that!

@roywei
Member

roywei commented May 23, 2019

@ptrendx It's OK; reproducing a failure on CI can be troublesome. As long as the notebooks run fine locally and the dependencies are installed in the CI Docker image, it should be fine. The process is documented here:
https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results
Here are the steps, on the AWS Deep Learning Base AMI (Ubuntu):

  1. Clone the mxnet repo.
  2. Install the requirements:
     pip3 install -r ci/requirements.txt --user
  3. Run the build command copied from the Jenkins log (click Build -> GPU: CUDA9.1+cuDNN7 -> 'Shell Scripts' -> 'show complete log'):
     http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/320/pipeline/38
     ci/build.py --docker-registry mxnetci --platform ubuntu_build_cuda --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh build_ubuntu_gpu_cuda100_cudnn7
  4. Run the specific job that failed (Tutorial Python2 and 3), copying the command from the log:
     ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_nightly_gpu --docker-build-retries 3 --shm-size 1500m /work/runtime_functions.sh nightly_tutorial_test_ubuntu_python2_gpu
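
For anyone scripting the steps above, the two ci/build.py invocations can be assembled programmatically. This is only a sketch: the helper name `ci_build_command` and its parameters are hypothetical, and it uses only the flags and values quoted in this comment. It prints the commands rather than running them.

```python
import shlex


def ci_build_command(platform, runtime_function, shm_size, nvidia_docker=False):
    """Assemble a ci/build.py argument list (hypothetical helper).

    Only the flags quoted in the comment above are used; nothing here
    is taken from the ci/build.py source itself.
    """
    argv = ["ci/build.py", "--docker-registry", "mxnetci"]
    if nvidia_docker:
        argv.append("--nvidiadocker")
    argv += [
        "--platform", platform,
        "--docker-build-retries", "3",
        "--shm-size", shm_size,
        "/work/runtime_functions.sh", runtime_function,
    ]
    return argv


# Step 3: the build command copied from the Jenkins log.
build_cmd = ci_build_command(
    "ubuntu_build_cuda", "build_ubuntu_gpu_cuda100_cudnn7", "500m"
)
# Step 4: the failing nightly tutorial job.
test_cmd = ci_build_command(
    "ubuntu_nightly_gpu",
    "nightly_tutorial_test_ubuntu_python2_gpu",
    "1500m",
    nvidia_docker=True,
)

for cmd in (build_cmd, test_cmd):
    print(shlex.join(cmd))
```

To actually execute a command, pass the list to `subprocess.run(cmd, check=True)` from the root of the cloned repo.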

@roywei
Member

roywei commented May 23, 2019

@Chancebair @ptrendx I have tested it and unfortunately it failed again. It seems the tutorial test is treating warnings as failures.
We can set the tutorial to ignore warnings, similar to PR #14532.

ERROR:root:Warning:
      "/work/mxnet/python/mxnet/gluon/block.py:1146: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:\n",

Warning:
    "    /mxnet/code/python/mxnet/gluon/block.py:1138: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:\n",

Warning:
      "/work/mxnet/python/mxnet/gluon/block.py:1146: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:\n",

Warning:
    "    /mxnet/code/python/mxnet/gluon/block.py:1138: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:\n",
======================================================================
FAIL: test_tutorials.test_amp
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/tutorials/test_tutorials.py", line 203, in test_amp
    assert _test_tutorial_nb('amp/amp_tutorial')
AssertionError
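
To illustrate the "warnings treated as failures" behaviour, here is a minimal sketch of a check that scans captured notebook output for warnings and skips known-benign ones. It is an assumption about the general technique, not the actual logic in tests/tutorials/test_tutorials.py; `unexpected_warnings` and `IGNORED_WARNINGS` are hypothetical names, and the third sample line is made up for the demonstration.

```python
import re

# Hypothetical allowlist; the pattern matches the UserWarning quoted in the log above.
IGNORED_WARNINGS = [re.compile(r"Cannot decide type for the following arguments")]


def unexpected_warnings(output_lines, ignored=IGNORED_WARNINGS):
    """Return the lines that mention a warning and match no ignored pattern."""
    flagged = []
    for line in output_lines:
        if "Warning" not in line:
            continue  # not a warning line at all
        if any(pattern.search(line) for pattern in ignored):
            continue  # known-benign warning, skip it
        flagged.append(line)
    return flagged


captured = [
    "/work/mxnet/python/mxnet/gluon/block.py:1146: UserWarning: "
    "Cannot decide type for the following arguments. Consider providing them as input:",
    "epoch 0 done",
    "example.py:3: DeprecationWarning: something else",  # invented sample line
]

strict = unexpected_warnings(captured, ignored=[])  # every warning counts
lenient = unexpected_warnings(captured)             # allowlist applied
print(len(strict), len(lenient))
```

With an empty allowlist both warning lines are flagged; with the allowlist only the unrelated warning remains, so a test built on this check would stop failing on the block.py message while still catching new warnings.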

Member

@roywei roywei left a comment


We need to test this before merging; otherwise the nightly will fail again.

@Chancebair
Contributor Author

We can either ignore the warnings or implement the suggestion of providing the type for args. @ptrendx would this be trivial enough?
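
As a rough sketch of the first option, the standard-library `warnings` module can suppress one specific message while leaving others visible. `noisy_step` is a hypothetical stand-in for the tutorial cell that triggers the warning; this shows the general technique only and is not necessarily how the fix in this PR was implemented.

```python
import warnings


def noisy_step():
    # Stand-in for a tutorial cell that emits the UserWarning seen in the log.
    warnings.warn(
        "Cannot decide type for the following arguments. "
        "Consider providing them as input:",
        UserWarning,
    )
    return "ok"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default
    # Ignore only the known-benign message; `message` is a regex that is
    # matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore", message="Cannot decide type", category=UserWarning
    )
    result = noisy_step()
    warnings.warn("unrelated", UserWarning)  # still recorded

print(result, [str(w.message) for w in caught])
```

Filtering by message keeps the suppression narrow, so an unrelated warning introduced later would still surface in the test output.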

@karan6181
Contributor

@mxnet-label-bot add [Gluon, Installation, pr-awaiting-testing]

@marcoabreu marcoabreu added Gluon Installation pr-awaiting-testing PR is reviewed and waiting CI build and test labels May 23, 2019
@ptrendx
Member

ptrendx commented May 24, 2019

@Chancebair I made a PR to your branch with a fix.

@Chancebair Chancebair requested a review from szha as a code owner May 27, 2019 08:22
@roywei
Member

roywei commented May 30, 2019

@Chancebair could you rebase? CI should be passing now. Thanks

@szha szha merged commit 6118dcc into apache:master Jun 2, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
Labels
Gluon Installation pr-awaiting-testing PR is reviewed and waiting CI build and test
7 participants