Add failure tests to test_reference_counting #7400

edoakes · 2020-03-02T21:13:44Z

Why are these changes needed?

We need to test cases where workers fail. This is not comprehensive, but it's a good start.

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.

AmplabJenkins · 2020-03-02T21:19:29Z

Can one of the admins verify this patch?

stephanie-wang · 2020-03-02T21:45:02Z

python/ray/tests/test_reference_counting.py

- @ray.remote
+@pytest.mark.parametrize("failure", [False, True])
+def test_basic_serialized_reference(one_worker_100MiB, failure):
+ @ray.remote(max_retries=0)


stephanie-wang · Mar 2, 2020

Actually, it would be great to also have some tests where max_retries > 0 so we can make sure that ref counting works when there are retries.

edoakes · Mar 2, 2020

Oh yeah, that's a really good point. I just did this to make it run more quickly. Do you think it's worth parametrizing it and testing both with and without retries?

stephanie-wang · Mar 2, 2020

Hmm, it seems like testing with retries should cover all the cases without retries too, but testing both also seems fine.

stephanie-wang · Mar 2, 2020

Is there a way we can update the config to make the retries faster?

edoakes · Mar 3, 2020

Let's just test the retry case with a short timeout. Will update.
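
For readers following the thread, here is a rough, Ray-free sketch of the test matrix being discussed: a task that either succeeds immediately or fails on its first attempt, run with and without retries. Everything in it (`WorkerDied`, `run_with_retries`, `make_task`, `run_matrix`) is a hypothetical stand-in written for illustration, not code from this PR or from Ray. The retry count of 2 mirrors the "2 retries" commit, but the real tests may be structured differently.

```python
import itertools


class WorkerDied(Exception):
    """Hypothetical stand-in for a worker process failing mid-task."""


def run_with_retries(task, max_retries):
    """Call task(attempt) up to 1 + max_retries times; re-raise the final failure."""
    for attempt in range(max_retries + 1):
        try:
            return task(attempt)
        except WorkerDied:
            if attempt == max_retries:
                raise


def make_task(failures_before_success):
    """Build a task that raises WorkerDied on its first N attempts, then succeeds."""
    def task(attempt):
        if attempt < failures_before_success:
            raise WorkerDied(f"attempt {attempt} failed")
        return "ok"
    return task


def run_matrix():
    """Run every (failure, max_retries) combination and record the outcome."""
    results = {}
    for failure, max_retries in itertools.product([False, True], [0, 2]):
        task = make_task(1 if failure else 0)
        try:
            results[(failure, max_retries)] = run_with_retries(task, max_retries)
        except WorkerDied:
            results[(failure, max_retries)] = "failed"
    return results
```

In this toy model the `max_retries=2` rows exercise both the success path and the failure-then-recovery path, which is roughly the intuition behind "testing with retries should cover all the cases without retries too". Only the `(failure=True, max_retries=0)` row ends in an error the caller sees.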

AmplabJenkins · 2020-03-02T22:32:08Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22623/
Test PASSed.

AmplabJenkins · 2020-03-03T05:44:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22651/
Test PASSed.

edoakes · 2020-03-04T04:18:01Z

@stephanie-wang I left TODOs for the two bugs the tests uncovered this morning. I think we should merge this and address those separately.

edoakes · 2020-03-06T02:01:18Z

FYI - diff grew because I needed to split the tests to avoid timeouts in bazel

AmplabJenkins · 2020-03-06T03:12:49Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22793/
Test PASSed.

AmplabJenkins · 2020-03-06T19:33:41Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22818/
Test PASSed.

AmplabJenkins · 2020-03-11T01:26:15Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/22999/
Test FAILed.

AmplabJenkins · 2020-03-12T03:29:30Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23056/
Test FAILed.

AmplabJenkins · 2020-03-16T21:47:17Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23250/
Test FAILed.

* enable
* Turn on eager eviction
* Shorten tests and drain ReferenceCounter
* Don't force kill actor handles that have gone out of scope, lint
* Fix locks
* Cleanup Plasma Async Callback (#7452)
* [rllib][tune] fix some nans (#7611)
* Change /tmp to platform-specific temporary directory (#7529)
* [Serve] UI Improvements (#7569)
* bugfix about test_dynres.py (#7615)
  Co-authored-by: senlin.zsl <[email protected]>
* Java call Python actor method use actor.call (#7614)
* bug fix about useage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (#7633)
  Co-authored-by: senlin.zsl <[email protected]>
* [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (#7462)
* [Java] Fix the issue that the cached value in `RayObject` is serialized (#7613)
* Add failure tests to test_reference_counting (#7400)
* Fix typo in asyncio documentation (#7602)
* Fix segfault
* debug
* Force kill actor
* Fix test

Add failure tests to test_reference_counting

6c94c9c

edoakes requested a review from stephanie-wang March 2, 2020 21:13

stephanie-wang reviewed Mar 2, 2020

use retries

baaede3

stephanie-wang approved these changes Mar 5, 2020

edoakes added 2 commits March 5, 2020 17:28

Merge remote-tracking branch 'upstream/master' into failure-tests

8aefcf4

Split test to avoid timeouts

739cfdc

2 retries

38065a8

Merge remote-tracking branch 'upstream/master' into failure-tests

689cec2

edoakes added 3 commits March 11, 2020 19:09

fix test_global_gc

2c44a69

Merge remote-tracking branch 'upstream/master' into failure-tests

fe69c7c

fix ref count test

cd29e1a

Merge remote-tracking branch 'upstream/master' into failure-tests

ecb41a5

edoakes merged commit c1b0f9c into ray-project:master Mar 17, 2020

stephanie-wang pushed a commit to stephanie-wang/ray that referenced this pull request Mar 17, 2020

Add failure tests to test_reference_counting (ray-project#7400)

4c941db
