-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Mersenne Twister to generate random numbers. #329
Conversation
Crystal used LCG (in libc) as its random number generator. However, you know, LCG has a defect in terms of the period. This commit replaces the LCG random number generator with Mersennne Twister(mt19937) random number generator. Mt19937 is employed in Ruby standard library because of the quality of random numbers which it generates. I added `Random` class as Ruby's standard library and added `Random::MT19937` class to implement mt19937. This is because I wanted to isolate the interfaces of Crystal's random number API and its implementation. In the future, other random number generators (e.g. xor128) may be implemented in addition to mt19937. If you know the detail of mt19937, see https://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.
A test for mt19937 random number generator is added. It compares random numbers generated by `Random::MT19937` with random numbers generated by an official C implementation. The source code of C implementation is available in below. https://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c
すごい! This is indeed very needed in the standard library. I really like it that you included that big spec verifying the generated numbers. I have a few comments/questions, but I'll accept the pull request and later we can change those things. |
Use Mersenne Twister to generate random numbers.
|
||
C.srand(Intrinsics.read_cycle_counter.to_u32) | ||
def initialize(seeds = Array(UInt32).new(4){ Intrinsics.read_cycle_counter.to_u32 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the default seeds
we could use:
StaticArray(UInt32, 4).new { Intrinsics.read_cycle_counter.to_u32 }
In that way no extra memory is allocated. What do you think?
It won't compile because later the method size
is not found for StaticArray, but we can add it (or change the code to say length
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add size
:P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, StaticArray
can be used here if the size
problem is resolved.
More comments, questions:
THREADS = 20
TIMES = 1_000
# Generate 20_000 random numbers in v2 in parallel
m = Random::MT19937.new [0x123, 0x234, 0x345, 0x456]
values = (1..THREADS).map do |i|
Thread.new do
(1..TIMES).map { m.next_number }
end
end.map &.join
v2 = [] of typeof(m.next_number)
values.each do |vs|
v2.concat vs
end
v2.sort!
# Geenrate 20_000 random numbers in sequence
m = Random::MT19937.new [0x123, 0x234, 0x345, 0x456]
v3 = (1..THREADS * TIMES).map { m.next_number }
v3.sort!
puts v2 == v3 #=> false I'm not sure, but I think the last result should be true. I don't know how to make it thread-safe, but since we don't have much support for concurrency right now we can leave this problem for later. In Rust they have a random number generator per thread (thread local) and |
k -= 1 | ||
end | ||
|
||
# Use to_i because substituting 0x80000000 causes SEGV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this comment for? Could it be related to the SEGV issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I forgot to remove out-dated comment...
Yes it was a comment for the SEGV issue but it is no longer needed.
Another question: should we include the copyright notice of the original source code? I saw it in Ruby and D's source code. |
Actually Ruby sort of has two Random classes, Random, which is a PNG, and SecureRandom, which tries to make OS level random data generation methods available. I think merging those into one interface and being able to choose the source upon initialization is a good choice. Kernel#rand should just delegate to an instance using the default engine, if you want something specific you do stuff like |
Oh, and other stdlib methods should utilize that interface: some_array.sample # delegate to the same generator Kernel#rand uses
some_array.sample(Random.new(:system)) # use passed generator Same for |
I have done quite a lot of work coding the most up-to-date and faster version of the Mersenne twister under discussion here :) It was suggested that it shouldn't go into base but rather be maintained as an external package. Not sure why the older version of the Mersenne twister is regarded as more useful and seems to be preferred as more acceptable :) |
@Scidom Yes, and it's very appreciated. But the downside of that version is that we would need to include the C source code of that generator in the compiler, or change the Makefile to download that and then link it. The current generator (the one of this pull request) is written in Crystal so it has no dependencies. We could write dsfmt in Crystal, but I'm not sure we can just now because it might use SIMD instructions. Also, this generator generates 10_000_000 numbers in 0.037 seconds, which I don't think it's too bad (In Ruby it takes 0.85 seconds). So having a simple, efficient enough, free of dependencies generator as the default one is good, I think. But later if you/us implement dsfmt we can replace the default one if it is much better :-) |
I have pretty much finished the implementation under scidom/random.cr; what is primarily missing at the moment is just to provide automatic seed creation (have been way too busy with some software I code for my department and that's why I haven't found the chance to do this yet). I agree with you, in general it is better to avoid inducing external dependencies; is the current PR based on an external C lib or is it enirely native? If the latter is true, then I agree it should be preferable. I will try to complete random.cr and compare the speed of the two approaches in the near future. Happy new year :) |
Thank you for the merge!
As advance knowledge, random number generators have some trade-offs. For example, MT19937 generates very high-quality random numbers but its internal state is big and generating a number is a bit heavy. Xor128 generates good-quality random numbers and the generation is fast but it is difficult to choose initial seed. So it is good idea that users can select a random number generator. However, as you said, it should be done by language or its library and it depends on a judgement of implementer of the language. So I think it is good to follow your idea 'I don't have to stop and think which generator I need.'. For example, Ruby has the same idea and Ruby's random module has only mt19937. Matz thinks that other RNG should be implemented as a library. And D implements many RNGs because it intends to be used for things where choosing RNG is a severe problem (e.g. games).
Random number generator has its internal state and updates it when generating a number. So, using the same generator by multiple threads causes a data race. In order to make all number generations work well in multi-threads, generating number must be guarded by mutex. I think, singleton method
I referred this C implementation heavily and its license says:
and the condition says:
It means that below copyright notice is needed, I think.
Your idea 'merging those into one interface' looks good for me. I don't know why Ruby's module is isolated like Oh, sorry that I don't know your work. I'm interested in your implementation of mt19937. I think your good implementation should be merged into this PR's implementation. Generally, I think a specific language's library should be written by its language because improvements of the language reflects its library directly. |
@rhysd, I agree with your opinion; it is better for many reasons to implement std lib functionality natively (in Crystal in our case). At the moment, the dSFMT implementation is a wrapper around the corresponding C library. Nothing stops us from actually implementing dSFMT natively; it would only take some more studying of the original paper, but this would be good anyhow because it would result in a better understanding of the underlying algorithm. |
Just learned about Mersenne Twister today. Thanks a lot @rhysd 🙏 We really miss you 😢 |
Crystal now uses LCG (in libc) as its random number generator. However, you know, LCG has a defect in terms of the period.
This PR replaces the LCG random number generator with Mersennne Twister(mt19937) random number generator. Mt19937 is employed in Ruby standard library because of the high-quality of random numbers which it generates.
I added
Random
class as Ruby's standard library and addedRandom::MT19937
class to implement mt19937. This is because I wanted to isolate the interfaces of Crystal's random number API and its implementation. In the future, other random number generators (e.g. xor128) may be implemented in addition to mt19937.If you know the detail of mt19937, see:
https://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.
I also added a test for mt19937 random number generator. It compares random numbers generated by
Random::MT19937
with random numbers generated by an official C implementation.The source code of C implementation is available in below.
https://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c