Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed test suite: if Threads.nthreads() > 1, skip certain tests #42764

Merged
merged 1 commit into from
Oct 25, 2021

Conversation

DilumAluthge
Copy link
Member

@DilumAluthge DilumAluthge commented Oct 22, 2021

  1. If Threads.nthreads() == 1, then we will run all tests.
  2. If Threads.nthreads() > 1:
    • If ENV["JULIA_TEST_INCLUDE_THREAD_UNSAFE"] == "true", then we will run all tests.
    • Else, we will skip a few tests that are known to non-deterministically fail when there are multiple threads.

@DilumAluthge DilumAluthge added domain:parallelism Parallel or distributed computation test This change adds or pertains to unit tests backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Oct 22, 2021
Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's nice to mark thread-incompatible tests like this.

stdlib/Distributed/test/distributed_exec.jl Outdated Show resolved Hide resolved
@DilumAluthge
Copy link
Member Author

@tkf Should we rename some of the other functions to reflect the new name of the environment variable?

@tkf
Copy link
Member

tkf commented Oct 22, 2021

Yeah, maybe that's a good idea since it'd make it easy to move this to testhelper later if we want to use it elsewhere.

@DilumAluthge
Copy link
Member Author

Any suggestions on a good naming scheme?

@tkf
Copy link
Member

tkf commented Oct 22, 2021

I think reflecting the environment variable, like you initially did, is a good approach. So run_distributed_multithreaded -> skip_if_multithreaded.

@DilumAluthge
Copy link
Member Author

What about the naming for skipping tests?

Right now we have:

if run_distributed_multithreaded()
    @test f()
end

Maybe something like:

if !skip_this_test_if_multithreaded()
    @test f()
end

I don't love the double negative though.....

@tkf
Copy link
Member

tkf commented Oct 22, 2021

I like "skip" here since it is similar to @test_skip and so I think it outweighs the downside of double-negative. That said, I think using something like include_thread_unsafe is good too.

@DilumAluthge DilumAluthge marked this pull request as draft October 25, 2021 00:48
@DilumAluthge
Copy link
Member Author

Marking this as a draft until I finish the renaming.

@DilumAluthge DilumAluthge force-pushed the dpa/distributed-skip-when-multithreaded branch from 889f823 to e93654d Compare October 25, 2021 03:49
@DilumAluthge DilumAluthge marked this pull request as ready for review October 25, 2021 03:49
@DilumAluthge DilumAluthge requested a review from tkf October 25, 2021 03:49
@DilumAluthge
Copy link
Member Author

DilumAluthge commented Oct 25, 2021

I'd like to avoid using double negatives, so I chose to go with include_thread_unsafe().

The environment variable is now named JULIA_TEST_INCLUDE_THREAD_UNSAFE.

And the function is now named include_thread_unsafe().

Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@DilumAluthge DilumAluthge force-pushed the dpa/distributed-skip-when-multithreaded branch from e93654d to f5a4da3 Compare October 25, 2021 04:06
@DilumAluthge
Copy link
Member Author

I'm going to wait for CI to finish, and then I'll check the CI logs to make sure the warning correctly shows up in the logs of the multi-threaded CI jobs.

@DilumAluthge DilumAluthge force-pushed the dpa/distributed-skip-when-multithreaded branch from f5a4da3 to 2cb21db Compare October 25, 2021 05:24
@DilumAluthge DilumAluthge added the status:merge me PR is reviewed. Merge when all tests are passing label Oct 25, 2021
@DilumAluthge
Copy link
Member Author

The Buildkite logs look correct.

This PR is good to merge once CI is green.

@DilumAluthge DilumAluthge merged commit 0682132 into master Oct 25, 2021
@DilumAluthge DilumAluthge deleted the dpa/distributed-skip-when-multithreaded branch October 25, 2021 08:49
@DilumAluthge DilumAluthge removed the status:merge me PR is reviewed. Merge when all tests are passing label Oct 25, 2021
KristofferC pushed a commit that referenced this pull request Oct 28, 2021
KristofferC pushed a commit that referenced this pull request Oct 29, 2021
@KristofferC
Copy link
Sponsor Member

KristofferC commented Nov 8, 2021

If Distributed needs to be tested both threaded and unthreaded it should just do so, like:

julia/test/threads.jl

Lines 5 to 11 in d39b2c0

let cmd = `$(Base.julia_cmd()) --depwarn=error --rr-detach --startup-file=no threads_exec.jl`
for test_nthreads in (1, 2, 4, 4) # run once to try single-threaded mode, then try a couple times to trigger bad races
new_env = copy(ENV)
new_env["JULIA_NUM_THREADS"] = string(test_nthreads)
run(pipeline(setenv(cmd, new_env), stdout = stdout, stderr = stderr))
end
end

It should not be an external "setting" so that to fully test Julia, one needs to run the full test suite multiple times with the different settings when 99%+ of the tests execute exactly the same no matter that setting.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Nov 8, 2021

This is incorrect. The entire Julia test suite needs to be run both singlethreaded and multithreaded, at least on a single platform. There are bugs that only show up in one case versus the other.

Just off the top of my head, there was a test for one of the invalid sysimage code paths that passed when singlethreaded but failed when multithreaded. More specifically, we were testing that there would not be a segfault when exercising a certain code path. However, when we ran the tests with multiple threads enabled, the test segfaulted. This indicated that there was a bug in this particular code path.

That test was not part of the Distributed test suite.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Nov 8, 2021

when 99%+ of the tests execute exactly the same no matter that setting.

This is just not true. There are all kinds of subtle bugs in Julia that are absent when threading is disabled and are present when threading is enabled. The only way to test for these bugs is to run the test suite both with and without threading.

As I said on Slack, in order to conserve CI resources, we only do this on linux64. On linux64, we run the test suite both with and without threading. On all other architectures and operating systems, we only run the test suite once.

KristofferC pushed a commit that referenced this pull request Nov 11, 2021
@KristofferC KristofferC removed backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Nov 13, 2021
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
vchuravy pushed a commit to JuliaLang/Distributed.jl that referenced this pull request Oct 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:parallelism Parallel or distributed computation test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants