
parallel RNG #94

Closed
ViralBShah opened this issue Jul 3, 2011 · 23 comments
Labels
parallelism: Parallel or distributed computation
randomness: Random number generation and the Random stdlib

Comments

@ViralBShah
Member

How should parallel RNG be addressed? Codes that do parallel RNG:

This is an MPI code that does not seem to be maintained:
https://sprng.cs.fsu.edu/

A derivative of Mersenne Twister:
https://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html

At the very least, when randomize() is called, it should combine the processor's number with the time when selecting a seed, so that every processor gets a different seed.
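
A minimal sketch of that proposal in later-API terms (this issue predates the Distributed stdlib, and srand has since become Random.seed!):

```julia
using Distributed, Random
addprocs(4)

# Hedged sketch: each process seeds itself from a hash of the wall clock
# and its own id, so seeds differ even if clocks coincide. This yields
# distinct (but non-reproducible) streams.
@everywhere using Random
@everywhere Random.seed!(hash((time_ns(), myid())))
```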

@JeffBezanson
Member

I don't fully understand what needs to be done here. By default, processors start with different seeds. To seed each process you can do @bcast srand(seeds[myid()]).
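
(In later Julia, the same reproducible per-process seeding looks roughly like the sketch below; @bcast and srand are the era's APIs, replaced here with the Distributed and Random stdlibs:)

```julia
using Distributed, Random
addprocs(4)
@everywhere using Random

# One fixed, distinct seed per process id: reproducible across runs
# that use the same number of processes.
seeds = Dict(p => 1000 + p for p in procs())
for p in procs()
    remotecall_wait(Random.seed!, p, seeds[p])
end
```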

@StefanKarpinski
Member

I think that will work for an initial cut. The real issue is making sure that entire distributed simulations/computations are deterministically reproducible from a single initial seed value. That means the computation can't depend on race conditions regarding which processor gets to some point first, or even on how many processors there are. For now we can definitely punt on this.

@JeffBezanson
Member

I guess we need a hook for when processors are added, so you can do custom initialization on each machine. And if we do checkpointing, we need to save the random state.
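
A minimal sketch of saving and restoring RNG state for checkpointing, using Serialization on an explicit generator (illustrative only; no such hook exists yet):

```julia
using Random, Serialization

rng = MersenneTwister(1234)
rand(rng, 10)                          # some work consumes random numbers

serialize("rng_checkpoint.jls", rng)   # checkpoint: save the generator state

# Later, or after a crash: restore and resume the identical stream.
restored = deserialize("rng_checkpoint.jls")
@assert rand(rng, 5) == rand(restored, 5)
```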

@ViralBShah
Member Author

Isn't it good enough that they read from /dev/urandom for now?

-viral


@StefanKarpinski
Member

The hard part isn't getting distributed simulations to be random, it's getting them to be repeatable. Using /dev/urandom helps make things really random, but it doesn't help repeatability at all. Let's worry about this when we have an actual use case. I have done simulations like this in the past, so the problem makes sense to me, but I don't necessarily know what a good solution is.

@ViralBShah
Member Author

Well, you can make them repeatable just by saving the seed; that doesn't seem difficult.

A related issue: if a bunch of virtual machines are booted up for your computation, how likely is it that their entropy pools are similar?

@ViralBShah
Member Author

We already seed the RNG from /dev/urandom or system time. Hence, this is not really a major issue for now.

@StefanKarpinski
Member

This is not at all a resolved issue. The problem with parallel RNG is not making all the machines behave differently, but making pseudorandom behavior across many machines exactly repeatable. This won't get addressed until we have an actual example showing why it's a problem, but we shouldn't close it, because it will be an issue at some point.

@ViralBShah
Member Author

You can set a different seed on each machine for repeatable behavior, which is what you have to do in the single-processor case as well. How is this specific to parallel RNG?

-viral


@StefanKarpinski
Member

Because in a complex distributed computation, the same exact work won't always happen on the same machines. Trust me, this is a more complicated issue. If you must close it, fine, but we will encounter this problem at some point. I'll pose a simple example: let's say we want a distributed parallel Buffon's needle simulation that can be run repeatably on any number of machines. How do you do that?
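
One common answer, sketched here as an assumption rather than anything settled in this thread, is to key the RNG to the work unit instead of the worker: each batch of needles gets a seed derived from the master seed and the batch index, so the estimate is identical no matter how many processes run it or which worker gets which batch.

```julia
using Distributed
addprocs(4)

# Crossing count for one batch of needles of length len dropped on lines
# spaced dist apart (requires len <= dist).
@everywhere using Random
@everywhere function buffon_hits(rng, n; len=1.0, dist=2.0)
    hits = 0
    for _ in 1:n
        x = rand(rng) * dist / 2          # center-to-nearest-line distance
        theta = rand(rng) * pi            # needle angle
        hits += x <= (len / 2) * sin(theta)
    end
    return hits
end

# P(cross) = 2len/(pi*dist) = 1/pi for len = 1, dist = 2, so pi ≈ N/hits.
function buffon_pi(masterseed, nbatches, batchsize)
    hits = @distributed (+) for b in 1:nbatches
        # batch b's stream depends only on (masterseed, b), not on which
        # worker happens to execute it
        buffon_hits(MersenneTwister(hash((masterseed, b))), batchsize)
    end
    return nbatches * batchsize / hits
end

buffon_pi(42, 1_000, 10_000)   # same value for any number of workers
```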

@ViralBShah
Member Author

I agree that when you change the number of nodes, it is a problem. But for the same number of nodes, there is no issue. My only point was that for the immediate purposes of this issue, it was resolved. I am quite sure that even SPRNG and the others do not offer this feature.

I suggest that we create a new issue when we run into this problem, or reopen this one then.

-viral


@ViralBShah
Member Author

OK, here's a proposal (sketched below). The user calls a routine called sdrand (for "set distributed random seeds"). Its argument is the maximum number of processors the user ever expects to use, say 100 or 1000. The routine then seeds the RNGs uniformly within the RNG's interval for that many processors. Now, as long as the user stays within the specified number of processors, reproducible results are possible.

-viral
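
A rough sketch of what such an sdrand could look like (sdrand is only a name proposed above, not an actual API; this assumes workers were already added with addprocs):

```julia
using Distributed, Random

# Hypothetical sdrand: derive one seed per potential processor slot from
# a single master seed, so any run using at most maxprocs workers is
# reproducible.
function sdrand(masterseed::Integer, maxprocs::Integer)
    seeds = rand(MersenneTwister(masterseed), UInt64, maxprocs)
    for (i, p) in enumerate(workers())
        i <= maxprocs || error("more workers than seeded slots")
        remotecall_wait(Random.seed!, p, seeds[i])
    end
    return seeds
end
```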


@StefanKarpinski
Member

This doesn't work, because if the data distribution is different, the same calls to the RNG will happen on different nodes and therefore change the results. Why can't we just leave this issue open until we find ourselves in a position to tackle it, hopefully with a real-world use case to drive it?


@rdiaz02

rdiaz02 commented Apr 28, 2012

I think that just using different seeds for each node (even if reproducible) might not be enough, as the streams on different workers might fall into step. Parallel RNG has been addressed in R's "parallel" package; this PDF describes it (see p. 5):

https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

@ViralBShah
Member Author

@StefanKarpinski
Member

Yeah, we should definitely include that. Very cool.

@wildart
Member

wildart commented Jul 27, 2015

I wrote the dSFMTjump package, which works with the MersenneTwister type.
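
dSFMTjump's own API isn't shown in this thread; for flavor, here is a hedged sketch using the analogous jump-ahead operation that later landed in Base as Future.randjump:

```julia
using Random, Future

# Split one master generator into widely separated streams: each stream
# starts 10^20 steps past the previous one, so they cannot overlap in
# any realistic computation.
master = MersenneTwister(1)
streams = Vector{MersenneTwister}(undef, 4)
streams[1] = Future.randjump(master, big(10)^20)
for i in 2:4
    streams[i] = Future.randjump(streams[i-1], big(10)^20)
end
```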

@ViralBShah
Member Author

I have been wanting to write that translation for a while. I would love to have this in Base, since it is a tiny amount of code and very useful in a number of situations. Could you submit a PR?

@wildart
Member

wildart commented Jul 28, 2015

Sure, I'd like to see this in Base as well.

@jakebolewski
Member

Seems to be addressed by #12498.

@andreasnoack
Member

I think it is too early to close this one. The jump function is one step, but we should have a complete setup for parallel RNG in place before we close this.

@andreasnoack andreasnoack reopened this Aug 11, 2015
@simonbyrne simonbyrne added the randomness Random number generation and the Random stdlib label Aug 13, 2015
@tkf
Member

tkf commented Jul 29, 2019

The way threading and RNG currently work may not be ideal for reproducibility, because Random.seed!(seed) is not enough to control all thread-local RNG states:

```julia
using Random

Random.seed!(1)   # seeds only the calling thread's RNG

Threads.@threads for _ in 1:Threads.nthreads()
    @show Threads.threadid(), rand()   # other threads' RNGs remain unseeded
end
```

Users need to do something like the following to initialize all RNGs.

```julia
using Random, Future

Random.seed!(1)
for i in 2:Threads.nthreads()
    # place thread i's stream 10^20 steps past thread i-1's
    Random.THREAD_RNGs[i] = Future.randjump(Random.THREAD_RNGs[i - 1], big(10)^20)
end
```

I wonder if it makes sense to invoke the above code by default whenever seed! is called on GLOBAL_RNG. It would make code using static scheduling reproducible. On the other hand, maybe it does not make sense: since Julia uses dynamic scheduling, most code relying on thread-local state would lose reproducibility anyway.

@vtjnash
Member

vtjnash commented Jan 15, 2020

We explicitly decided that we did not want seed! to do that by default, as it conflicts with numerous other design goals. We would perhaps consider PRs that add it as a helper function, though no promises. Note that the original issue was about Distributed (i.e. processes, not threads), though any helper function now should probably take both into account.

Closing as "won't fix", though, as there are, at least, no changes planned to the default behavior.

@vtjnash vtjnash closed this as completed Jan 15, 2020