Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unify shuffle and randperm #50318

Merged
merged 3 commits into from
Jul 11, 2023
Merged

Conversation

ctarn
Copy link
Contributor

@ctarn ctarn commented Jun 28, 2023

unify the algorithms of shuffle and randperm without performance loss, so that:

  • shuffle(rng, 1:n) == randperm(rng, n)
  • shuffle(rng, eachindex(1:n)) == shuffle(rng, 1:n)

see also:

correctness:

ns = [1, 10, 100, 1000, 10000]

for n in ns
    @show shuffle!(MersenneTwister(1), Vector(1:n) .* 7) ./ 7 == randperm(MersenneTwister(1), n)
end

"""
output:
shuffle!(MersenneTwister(1), Vector(1:n) .* 7) ./ 7 == randperm(MersenneTwister(1), n) = true
shuffle!(MersenneTwister(1), Vector(1:n) .* 7) ./ 7 == randperm(MersenneTwister(1), n) = true
shuffle!(MersenneTwister(1), Vector(1:n) .* 7) ./ 7 == randperm(MersenneTwister(1), n) = true
shuffle!(MersenneTwister(1), Vector(1:n) .* 7) ./ 7 == randperm(MersenneTwister(1), n) = true
shuffle!(MersenneTwister(1), Vector(1:n) .* 7) ./ 7 == randperm(MersenneTwister(1), n) = true
"""

performance:

ns = [1, 10, 100, 1000, 10000]

t_new = [@belapsed(shuffle!(MersenneTwister(1), Vector(1:$n))) for n in ns]
t_old = [@belapsed(Random.shuffle!(MersenneTwister(1), Vector(1:$n))) for n in ns]
println("small item:\n#item\tnew\told")
display(hcat(ns, t_new, t_old))

"""
output:
small item:
#item   new     old
5×3 Matrix{Float64}:
     1.0  9.75e-6    9.748e-6
    10.0  9.938e-6   1.02e-5
   100.0  1.0523e-5  1.0548e-5
  1000.0  1.3765e-5  1.3549e-5
 10000.0  7.2876e-5  7.1795e-5
"""

vs = [[NTuple{1000, Float64}(rand(1000)) for _ in 1:n] for n in ns]
t_new = [@belapsed(shuffle!(MersenneTwister(1), $v)) for v in copy.(vs)]
t_old = [@belapsed(Random.shuffle!(MersenneTwister(1), $v)) for v in copy.(vs)]
println("large item:\n#item\tnew\told")
display(hcat(ns, t_new, t_old))

"""
output:
large item:
#item   new     old
5×3 Matrix{Float64}:
     1.0  9.811e-6     9.733e-6
    10.0  1.2482e-5    1.2365e-5
   100.0  4.4791e-5    4.3842e-5
  1000.0  0.000401874  0.000391641
 10000.0  0.00989083   0.0101406
"""

@oscardssmith oscardssmith added the domain:randomness Random number generation and the Random stdlib label Jun 28, 2023
@ctarn ctarn closed this Jun 29, 2023
@ctarn ctarn deleted the ctarn/shuffle branch June 29, 2023 11:42
@ctarn ctarn restored the ctarn/shuffle branch June 29, 2023 11:43
@ctarn ctarn reopened this Jun 29, 2023
@stevengj
Copy link
Member

stevengj commented Jun 29, 2023

It makes sense not to me not to have shuffle and randperm use gratuitously different algorithms. Would be good for someone who implemented these (e.g. @rfourquet way going way back to #14772) to take a look, however.

@stevengj stevengj requested a review from rfourquet June 29, 2023 13:20
@ctarn
Copy link
Contributor Author

ctarn commented Jun 29, 2023

The difference can be traced back to the initial implementation of 13 years ago by @ViralBShah:

julia/combinatorics.j

Lines 151 to 169 in 18b72d1

# Knuth shuffle
function shuffle(a::Vector)
for i = length(a):-1:2
j = randint(i)
a[i], a[j] = a[j], a[i]
end
return a
end
function randperm(n::Int)
a = Array(typeof(n), n)
a[1] = 1
for i = 2:n
j = randint(i)
a[i] = a[j]
a[j] = i
end
return a
end

@rfourquet made an improvement in #14772 while preserving the original style.

I don't see any reason why they are different, and since shuffle on Base.OneTo(n) is currently specialized as randperm(n), it is confusing that shuffle(collect(Base.OneTo(n))) != shuffle(Base.OneTo(n)) while collect(Base.OneTo(n)) == Base.OneTo(n).

shuffle(r::AbstractRNG, a::Base.OneTo) = randperm(r, last(a))

julia> n = 4

julia> shuffle(MersenneTwister(1), collect(Base.OneTo(n)))
4-element Vector{Int64}:
 1
 2
 4
 3

julia> shuffle(MersenneTwister(1), Base.OneTo(n))
4-element Vector{Int64}:
 1
 3
 4
 2

julia> collect(Base.OneTo(n)) == Base.OneTo(n)
true

julia> shuffle(MersenneTwister(1), collect(Base.OneTo(n))) == shuffle(MersenneTwister(1), Base.OneTo(n))
false

@Seelengrab
Copy link
Contributor

Seelengrab commented Jul 4, 2023

  • shuffle(rng, 1:n) == randperm(rng, n)
  • shuffle(rng, eachindex(1:n)) == shuffle(rng, 1:n)

Why is it advantageous to guarantee that these properties hold? It's all randomness either way - I don't think guaranteeing a particular result is desirable, as that will likely constrain any performance or other improvements to these algorithms in the future.

Not to mention that MersenneTwister hasn't been the default RNG for some time now, and showing that this property holds for MersenneTwister(1) and n == 4 says nothing about whether this holds in general.

@oscardssmith
Copy link
Member

It's not a major advantage, but if it's easy to do it seems worthwhile

@stevengj
Copy link
Member

stevengj commented Jul 4, 2023

Also, it makes the source code less readable/maintainable if shuffle and randperm use gratuitously different algorithms for essentially the same task.

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rfourquet rfourquet added the status:forget me not PRs that one wants to make sure aren't forgotten label Jul 9, 2023
@stevengj stevengj added status:merge me PR is reviewed. Merge when all tests are passing and removed status:forget me not PRs that one wants to make sure aren't forgotten status:merge me PR is reviewed. Merge when all tests are passing labels Jul 9, 2023
@Seelengrab
Copy link
Contributor

If we want to guarantee these properties, this PR also should check that that they hold in a test.

@ctarn
Copy link
Contributor Author

ctarn commented Jul 9, 2023

If we want to guarantee these properties, this PR also should check that that they hold in a test.

It can prevent some further improvements, although this is unlikely to happen. I am happy to add some tests, but don't insist on it.

These properties can also be documented, but don't have to be, I think.

@rfourquet rfourquet merged commit 2d4d096 into JuliaLang:master Jul 11, 2023
6 checks passed
@rfourquet
Copy link
Member

Thanks @ctarn !

@ctarn ctarn deleted the ctarn/shuffle branch June 20, 2024 12:40
@ctarn ctarn restored the ctarn/shuffle branch June 20, 2024 12:47
@ctarn ctarn deleted the ctarn/shuffle branch June 20, 2024 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants