This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

test_mkldnn.test_Deconvolution failed in CI #12579

Closed
aaronmarkham opened this issue Sep 17, 2018 · 5 comments · Fixed by #20292

Comments

@aaronmarkham
Contributor

Description

Flaky test: test_mkldnn.test_Deconvolution failed in this CI run:
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12504/7/pipeline

======================================================================
FAIL: test_mkldnn.test_Deconvolution
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/mkl/test_mkldnn.py", line 347, in test_Deconvolution
    check_Deconvolution_training(stype)
  File "/work/mxnet/tests/python/mkl/test_mkldnn.py", line 343, in check_Deconvolution_training
    check_numeric_gradient(test, in_location, numeric_eps=1e-2, rtol=0.16, atol=1e-4)
  File "/work/mxnet/python/mxnet/test_utils.py", line 912, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "/work/mxnet/python/mxnet/test_utils.py", line 491, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 1.512910 exceeds tolerance rtol=0.160000, atol=0.000100.  Location of maximum error:(0, 2, 9), a=0.001264, b=0.001867
 NUMERICAL_data: array([[[-0.95005035, -0.5466819 , -0.8472204 , ..., -0.81636906,
         -0.53954124, -0.557071  ],
        [-0.76055527, -0.4667163 , -1.1051655 , ..., -0.66776276,...
 BACKWARD_data: array([[[-0.94975847, -0.5468061 , -0.8476609 , ..., -0.81620246,
         -0.53933674, -0.55766964],
        [-0.75996923, -0.46629372, -1.1041552 , ..., -0.6681724 ,...
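
For readers trying to interpret the "Error 1.512910 exceeds tolerance" line, the sketch below recomputes it from the values in the message. It assumes the error is the relative violation |a - b| / (atol + rtol * |b|); that formula is inferred from the reported numbers, not confirmed against the MXNet source, and `relative_violation` is a hypothetical helper name. Because `a` and `b` are printed rounded to six decimals, the result only approximately matches the reported figure.

```python
def relative_violation(a, b, rtol, atol):
    """Assumed error metric: a value above 1.0 means the tolerance is exceeded."""
    return abs(a - b) / (atol + rtol * abs(b))


# Values copied from the failure message above (rounded as printed there).
a, b = 0.001264, 0.001867
rtol, atol = 0.16, 1e-4

violation = relative_violation(a, b, rtol, atol)
print("violation = %.4f" % violation)  # roughly 1.51, above the 1.0 threshold
```

Under this assumed formula the failing element is one whose gradient is close to zero, so an absolute difference of about 6e-4 is enough to exceed the tolerance even though the larger entries shown in the arrays agree to roughly three digits.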



@stu1130
Contributor

stu1130 commented Sep 17, 2018

Thanks for reporting the issue @aaronmarkham
@mxnet-label-bot[Flaky, Test]

@pengzhao-intel
Contributor

@luobao-intel please take a look at this case.

@luobao-intel
Contributor

This pipeline didn't provide the np/mx/python random seed. How could this failure be reproduced?

@lebeg
Contributor

lebeg commented Oct 9, 2018

Another run failed on master CI:
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1728/pipeline

======================================================================
FAIL: test_mkldnn.test_Deconvolution
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/mkl/test_mkldnn.py", line 346, in test_Deconvolution
    check_Deconvolution_training(stype)
  File "/work/mxnet/tests/python/mkl/test_mkldnn.py", line 342, in check_Deconvolution_training
    check_numeric_gradient(test, in_location, numeric_eps=1e-2, rtol=0.16, atol=1e-4)
  File "/work/mxnet/python/mxnet/test_utils.py", line 915, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "/work/mxnet/python/mxnet/test_utils.py", line 491, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 3.121914 exceeds tolerance rtol=0.160000, atol=0.000100.  Location of maximum error:(2, 1, 5), a=-0.000381, b=-0.001386
 NUMERICAL_data: array([[[-0.6184697 , -0.50860643, -0.6415248 , ..., -0.7978529 ,
         -0.8801222 , -0.7802248 ],
        [-0.26806593, -0.1953423 , -0.14332533, ..., -0.17287433,...
 BACKWARD_data: array([[[-0.6174789 , -0.5086705 , -0.6417394 , ..., -0.79945517,
         -0.88075024, -0.77997565],
        [-0.26776323, -0.19459067, -0.14422962, ..., -0.1742437 ,...

@luobao-intel no seed was reported because the @with_seed() annotation is absent from the test method.
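
To illustrate why a seed annotation matters for reproducing a flaky test, here is a minimal sketch of a seeding decorator. It is not MXNet's actual `with_seed()` implementation: `with_seed_sketch` is a hypothetical name, and it seeds only Python's `random` module, whereas a real test helper would presumably also seed NumPy and MXNet.

```python
import functools
import logging
import random


def with_seed_sketch(seed=None):
    """Hypothetical decorator: seed the RNG before a test and log the seed on failure."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Use the fixed seed if one is given, otherwise draw a fresh one.
            this_seed = seed if seed is not None else random.SystemRandom().randint(0, 2**31 - 1)
            random.seed(this_seed)
            try:
                return test_fn(*args, **kwargs)
            except Exception:
                # Report the seed so the failing run can be replayed exactly.
                logging.error("%s failed; rerun with seed=%d to reproduce",
                              test_fn.__name__, this_seed)
                raise
        return wrapper
    return decorator


@with_seed_sketch(seed=1234)
def draw_sample():
    return random.random()
```

With a fixed seed, every call to `draw_sample()` returns the same value; without one, a failing run still logs the seed it used, which is the piece of information missing from the CI output above.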

@lebeg
Contributor

lebeg commented Oct 9, 2018

I've added the seed annotations to the tests (and disabled them for now) in #12770

@luobao-intel I'm afraid you will need to reproduce the failure by running the test in a loop. You can use our dockerized environment and modify the testing function in runtime_functions.sh to your needs (for example, run only the affected test, or run it in a loop).

To build:

ci/build.py -p ubuntu_cpu

And to run:

ci/build.py -p ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python3_cpu_mkldnn
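
The "run in a loop" suggestion can also be sketched as a small driver script, as an alternative to editing runtime_functions.sh. This is an illustration only: `run_until_failure` is a hypothetical helper, and the nosetests command in the comment is an assumption based on the paths in the tracebacks above.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly; return the 1-based index of the first failing run, or None."""
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            # Show the output of the failing run, which should include any reported seed.
            sys.stdout.write(result.stdout.decode(errors="replace"))
            return attempt
    return None


# Assumed invocation inside the container (not verified against the CI setup):
# run_until_failure(
#     ["nosetests", "tests/python/mkl/test_mkldnn.py:test_Deconvolution"],
#     max_runs=500,
# )
```

Stopping at the first failure keeps the log short; once the seed annotations are in place, the output of that run is what makes the failure reproducible.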
