Fix flakey test_ndarray.py:test_reduce #17312

DickJC123 · 2020-01-15T01:38:01Z

Description

This PR improves the reliability of the test_reduce unit test in test_ndarray.py. Two observed failure outputs are shown under Comments below; they can be reproduced with:

MXNET_TEST_SEED=359297469 nosetests --verbose -s --logging-level=DEBUG tests/python/unittest/test_ndarray.py:test_reduce
MXNET_TEST_SEED=626874295 nosetests --verbose -s tests/python/unittest/test_ndarray.py:test_reduce

In the first case, the tolerance for the float32 precision tests appears to be too tight and has been relaxed. In the second case, a cast that was applied only to the input data of the numpy 'golden copy' has been removed, so that both implementations in the comparison work on identical data. Identical inputs are critical for argmax and argmin to return identical indices. This only became a problem when float64 testing was recently introduced to this test.

With the fixes, 3000 trials of this test generated no errors.

The most recent PR to touch this test is #16234.

@ptrendx @wkcn

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
[X] Changes are complete (i.e. I finished coding on this PR)
[X] All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change


Comments

MXNET_TEST_SEED=359297469 nosetests --verbose -s tests/python/unittest/test_ndarray.py:test_reduce

test_ndarray.test_reduce ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=359297469 to reproduce.
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: invalid value encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
FAIL

======================================================================
FAIL: test_ndarray.test_reduce
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/opt/mxnet/tests/python/unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/opt/mxnet/tests/python/unittest/test_ndarray.py", line 663, in test_reduce
    mx.nd.sum, True, allow_almost_equal=True)
  File "/opt/mxnet/tests/python/unittest/test_ndarray.py", line 659, in test_reduce_inner
    assert_array_almost_equal(ndarray_ret, numpy_ret, decimal=decimal)
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 1007, in assert_array_almost_equal
    precision=decimal)
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 5 decimals

Mismatch: 100%
Max absolute difference: 1.5258789e-05
Max relative difference: 5.524665e-07
 x: array([-27.61941], dtype=float32)
 y: array(-27.61939, dtype=float32)
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=359297469 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 0.101s

FAILED (failures=1)
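The arithmetic behind this first failure can be checked without MXNet. The sketch below is an illustration only, not the test's code: the two values are reconstructed from the rounded numbers printed in the log, chosen so that their difference is exactly the reported 1.5258789e-05 (2**-16), and `passes_almost_equal` is a hypothetical helper. It suggests why `decimal=5` is too tight here: numpy's check requires the absolute difference to be below 1.5e-05, while the observed difference is only a handful of float32 rounding steps at this magnitude.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def passes_almost_equal(actual, desired, decimal):
    """Hypothetical helper: True if numpy's decimal-based check accepts the pair."""
    try:
        assert_array_almost_equal(actual, desired, decimal=decimal)
        return True
    except AssertionError:
        return False


# Values reconstructed from the log (assumption: the printed digits are rounded,
# so these are stand-ins with the same absolute difference, not the exact data).
y = np.float32(-27.61939)
x = np.float32(y - np.float32(2.0 ** -16))

abs_diff = abs(float(x) - float(y))            # equals 2**-16
ulp = float(np.spacing(np.float32(27.61939)))  # float32 spacing near 27.6

print("absolute difference:", abs_diff)
print("difference in float32 ULPs:", abs_diff / ulp)
print("accepted with decimal=5:", passes_almost_equal(x, y, 5))
print("accepted with decimal=4:", passes_almost_equal(x, y, 4))
```

The relative difference in the log (about 5.5e-07) is small for a float32 reduction, so a relative tolerance or a looser `decimal` is a reasonable way to express the intended accuracy.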

MXNET_TEST_SEED=626874295 nosetests --verbose -s tests/python/unittest/test_ndarray.py:test_reduce
test_ndarray.test_reduce ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=626874295 to reproduce.
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: invalid value encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
FAIL

======================================================================
FAIL: test_ndarray.test_reduce
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/opt/mxnet/tests/python/unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/opt/mxnet/tests/python/unittest/test_ndarray.py", line 675, in test_reduce
    mx.nd.argmin, False, check_dtype=False)
  File "/opt/mxnet/tests/python/unittest/test_ndarray.py", line 661, in test_reduce_inner
    assert_array_equal(ndarray_ret, numpy_ret)
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatch: 0.0882%
Max absolute difference: 4.
Max relative difference: nan
 x: array([[[[[3., 3.],
          [3., 2.],
          [2., 0.],...
 y: array([[[[[3, 3],
          [3, 2],
          [2, 0],...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=626874295 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 1.048s

FAILED (failures=1)
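The argmin mismatch above is consistent with the two sides of the comparison seeing data at different precisions. A minimal sketch, using made-up values rather than the test's data: two float64 values that differ by less than float32 can resolve become equal after a cast to float32, and `np.argmin` then returns the first of the tied positions instead of the true minimum.

```python
import numpy as np

# Illustrative values (assumption, not taken from the test): the two entries
# differ by 1e-9, far below float32 resolution near 1.0 (about 1.2e-7).
data64 = np.array([1.0 + 1e-9, 1.0], dtype=np.float64)

# Casting only one side's input, as the numpy reference path did before the fix.
data32 = data64.astype(np.float32)

idx64 = int(np.argmin(data64))  # true minimum is at index 1
idx32 = int(np.argmin(data32))  # both entries are now 1.0; the first tie wins

print(idx64, idx32)  # prints "1 0"
```

Because index-returning reductions are compared for exact equality, any such tie introduced by a one-sided cast shows up as a hard failure, which is why feeding identical data to both sides matters more here than for sum-like reductions.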

DickJC123 · 2020-01-15T02:38:53Z

I see now that the float32 precision was already relaxed by PR #16992. All that remains of this PR, then, is the removal of the cast to float32, which ensures that both models in the comparison work on identical data.

wkcn

LGTM. Thank you : )

Fix flakey test_ndarray.py:test_reduce

5e1ed09

wkcn approved these changes Jan 15, 2020

ptrendx approved these changes Jan 15, 2020

DickJC123 merged commit a296dad into apache:master Jan 16, 2020
