Skip to content

Commit

Permalink
Expand documentation of custom random samplers. (#31787)
Browse files Browse the repository at this point in the history
  • Loading branch information
tpapp authored and fredrikekre committed May 9, 2019
1 parent 48634f9 commit e813f0d
Show file tree
Hide file tree
Showing 2 changed files with 110 additions and 58 deletions.
114 changes: 74 additions & 40 deletions stdlib/Random/docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,18 +80,45 @@ in order to support usual types of generated values.

### Generating random values of custom types

There are two categories: generating values from a type (e.g. `rand(Int)`), or from a collection (e.g. `rand(1:3)`).
The simple cases are explained first, and more advanced usage is presented later.
We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
Generating random values for some distributions may involve various trade-offs. *Pre-computed* values, such as an [alias table](https://en.wikipedia.org/wiki/Alias_method) for discrete distributions, or [“squeezing” functions](https://en.wikipedia.org/wiki/Rejection_sampling) for univariate distributions, can speed up sampling considerably. How much information should be pre-computed can depend on the number of values we plan to draw from a distribution. Also, some random number generators can have certain properties that various algorithms may want to exploit.

The `Random` module defines a customizable framework for obtaining random values that can address these issues. Each invocation of `rand` generates a *sampler* which can be customized with the above trade-offs in mind, by adding methods to `Sampler`, which in turn can dispatch on the random number generator, the object that characterizes the distribution, and a suggestion for the number of repetitions. Currently, for the latter, `Val{1}` (for a single sample) and `Val{Inf}` (for an arbitrary number) are used, with `Random.Repetition` an alias for both.

The object returned by `Sampler` is then used to generate the random values, by a method of `rand` defined for this purpose. Samplers can be arbitrary values, but for most applications the following predefined samplers may be sufficient:

1. `SamplerType{T}()` can be used for implementing samplers that draw from type `T` (e.g. `rand(Int)`).

2. `SamplerTrivial(self)` is a simple wrapper for `self`, which can be accessed with `[]`. This is the recommended sampler when no pre-computed information is needed (e.g. `rand(1:3)`).

3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values.

We provide examples for each of these. We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.

```@docs
Random.Sampler
Random.SamplerType
Random.SamplerTrivial
Random.SamplerSimple
```

Decoupling pre-computation from actually generating the values is part of the API, and is also available to the user. As an example, assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage of this decoupling is as follows:

```julia
rng = MersenneTwister()
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister, 1:20)
for x in X
n = rand(rng, sp) # similar to n = rand(rng, 1:20)
# use n
end
```

This is the mechanism that is also used in the standard library, e.g. by the default implementation of random array generation (like in `rand(1:20, 10)`).

#### Generating values from a type

Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced.
In order to define random generation of values of type `T`, the following method can be defined:
`rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return).
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the *default sampler for types*. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return values what `rand(rng, T)` is expected to return.

Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`.
We want `rand(Die)` to produce a die with a random number of up to 20 sides (and at least 4):
Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. We want `rand(Die)` to produce a `Die` with a random number of up to 20 sides (and at least 4):

```jldoctest Die
struct Die
Expand Down Expand Up @@ -126,12 +153,11 @@ julia> a = Vector{Die}(undef, 3); rand!(a)
Die(8)
```

#### Generating values from a collection
#### A simple sampler without pre-computed data

Here we define a sampler for a collection. If no pre-computed data is required, it can be implemented with a `SamplerTrivial` sampler, which is in fact the *default fallback for values*.

Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced.
In order to define random generation out of objects of type `S`, the following method can be defined:
`rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`.
Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
In order to define random generation out of objects of type `S`, the following method should be defined: `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:

```jldoctest Die; setup = :(Random.seed!(1))
julia> Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides);
Expand All @@ -146,38 +172,48 @@ julia> rand(Die(4), 3)
2
```

In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define
`Base.eltype(::Type{Die}) = Int`.

Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define `Base.eltype(::Type{Die}) = Int`.

#### Generating values for an `AbstractFloat` type
A `SamplerTrivial` does not have to wrap the original object. For example, in `Random`, `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather in `[0,1)`.

`AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather
in `[0,1)`. The following method should be implemented for `T <: AbstractFloat`:
`Random.rand(::AbstractRNG, ::Random.SamplerTrivial{Random.CloseOpen01{T}})`
Consequently, a method
```julia
Sampler(::Type{RNG}, ::Type{T}, n::Repetition) where {RNG<:AbstractRNG,T<:AbstractFloat} =
Sampler(RNG, CloseOpen01(T), n)
```
is defined to return `SamplerTrivial` with a `Random.CloseOpen01{T}}` type defined for this purpose, which has an appropriate `rand` method defined for it.

#### An optimized sampler with pre-computed data

#### Optimizing generation with cached computation between calls
Consider a discrete distribution, where numbers `1:n` are drawn with given probabilities that some to one. When many values are needed from this distribution, the fastest method if using an [alias table](https://en.wikipedia.org/wiki/Alias_method). We don't provide the algorithm for building such a table here, but suppose it is available in `make_alias_table(probabilities)` instead, and `draw_number(rng, alias_table)` can be used to draw a random number from it.

When repeatedly generating random values (with the same `rand` parameters), it happens for some types
that the result of a computation is used for each call. In this case, the computation can be decoupled
from actually generating the values. This is the case for example with the default implementation for
`AbstractArray`. Assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage
of this decoupling is as follows:
Suppose that the distribution is described by
```julia
struct DiscreteDistribution{V <: AbstractVector}
probabilities::V
end
```
and that we *always* want to build an a alias table, regardless of the number of values needed (we learn how to customize this below). The methods
```julia
Random.eltype(::Type{<:DiscreteDistribution}) = Int

function Random.Sampler(::AbstractRng, distribution::DiscreteDistribution, ::Repetition)
SamplerSimple(disribution, make_alias_table(distribution.probabilities))
end
```
should be defined to return a sampler with pre-computed data, then
```julia
rng = MersenneTwister()
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20)
for x in X
n = rand(rng, sp) # similar to n = rand(rng, 1:20)
# use n
function rand(rng::AbstractRNG, sp::SamplerSimple{<:DiscreteDistribution})
draw_number(rng, sp.data)
end
```
will be used to draw the values.

#### Custom sampler types

The `SamplerSimple` type is sufficient for most use cases with precomputed data. However, in order to demonstrate how to use custom sampler types, here we implement something similar to `SamplerSimple`.

This mechanism is of course used by the default implementation of random array generation (like in `rand(1:20, 10)`).
In order to implement this decoupling for a custom type, a helper type can be used.
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so
there is an opportunity for this optimization:
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization. We call our custom sampler `SamplerDie`.

```julia
import Random: Sampler, rand
Expand All @@ -194,10 +230,9 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, r::Random.Repetition) =
rand(rng::AbstractRNG, sp::SamplerDie) = rand(rng, sp.sp)
```

It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`.
In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.

This pattern is so frequent that a helper type named `Random.SamplerSimple` is available,
Of course, this pattern is so frequent that the helper type used above, namely `Random.SamplerSimple`, is available,
saving us the definition of `SamplerDie`: we could have implemented our decoupling with:

```julia
Expand Down Expand Up @@ -228,8 +263,7 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{1}) = SamplerDie1(...)
Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{Inf}) = SamplerDieMany(...)
```

Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)`
and `rand(::AbstractRNG, ::SamplerDieMany)`).
Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` and `rand(::AbstractRNG, ::SamplerDieMany)`). Note that, as usual, `SamplerTrivial` and `SamplerSimple` can be used if custom types are not necessary.

Note: `Sampler(rng, x)` is simply a shorthand for `Sampler(rng, x, Val(Inf))`, and
`Random.Repetition` is an alias for `Union{Val{1}, Val{Inf}}`.
Expand Down
54 changes: 36 additions & 18 deletions stdlib/Random/src/Random.jl
Original file line number Diff line number Diff line change
Expand Up @@ -40,22 +40,6 @@ Supertype for random number generators such as [`MersenneTwister`](@ref) and [`R
"""
abstract type AbstractRNG end

"""
Random.gentype(T)
Determine the type of the elements generated by calling `rand([rng], x)`,
where `x::T`, and `x` is not a type.
The definition `gentype(x) = gentype(typeof(x))` is provided for convenience,
and `gentype(T)` defaults to `eltype(T)`.
NOTE: `rand([rng], X)`, where `X` is a type, is always assumed to produce
an object of type `X`.
# Examples
```jldoctest
julia> gentype(1:10)
Int64
```
"""
gentype(::Type{X}) where {X} = eltype(X)
gentype(x) = gentype(typeof(x))

Expand Down Expand Up @@ -137,6 +121,21 @@ const Repetition = Union{Val{1},Val{Inf}}
# Sampler(::AbstractRNG, X, ::Val{Inf}) = Sampler(X)
# Sampler(::AbstractRNG, ::Type{X}, ::Val{Inf}) where {X} = Sampler(X)

"""
Sampler(rng, x, repetition = Val(Inf))
Return a sampler object that can be used to generate random values from `rng` for `x`.
When `sp = Sampler(rng, x, repetition)`, `rand(rng, sp)` will be used to draw random values,
and should be defined accordingly.
`repetition` can be `Val(1)` or `Val(Inf)`, and should be used as a suggestion for deciding
the amount of precomputation, if applicable.
[`Random.SamplerType`](@ref) and [`Random.SamplerTrivial`](@ref) are default fallbacks for
*types* and *values*, respectively. [`Random.SamplerSimple`](@ref) can be used to store
pre-computed values without defining extra types for only this purpose.
"""
Sampler(rng::AbstractRNG, x, r::Repetition=Val(Inf)) = Sampler(typeof(rng), x, r)
Sampler(rng::AbstractRNG, ::Type{X}, r::Repetition=Val(Inf)) where {X} = Sampler(typeof(rng), X, r)

Expand All @@ -149,18 +148,30 @@ Sampler(::Type{RNG}, ::Type{X}) where {RNG<:AbstractRNG,X} = Sampler(RNG, X, Val

#### pre-defined useful Sampler types

# default fall-back for types
"""
SamplerType{T}()
A sampler for types, containing no other information. The default fallback for `Sampler`
when called with types.
"""
struct SamplerType{T} <: Sampler{T} end

Sampler(::Type{<:AbstractRNG}, ::Type{T}, ::Repetition) where {T} = SamplerType{T}()

Base.getindex(::SamplerType{T}) where {T} = T

# default fall-back for values
struct SamplerTrivial{T,E} <: Sampler{E}
self::T
end

"""
SamplerTrivial(x)
Create a sampler that just wraps the given value `x`. This is the default fall-back for
values.
The recommended use case is sampling from values without precomputed data.
"""
SamplerTrivial(x::T) where {T} = SamplerTrivial{T,gentype(T)}(x)

Sampler(::Type{<:AbstractRNG}, x, ::Repetition) = SamplerTrivial(x)
Expand All @@ -173,6 +184,13 @@ struct SamplerSimple{T,S,E} <: Sampler{E}
data::S
end

"""
SamplerSimple(x, data)
Create a sampler that wraps the given value `x` and the `data`.
The recommended use case is sampling from values with precomputed data.
"""
SamplerSimple(x::T, data::S) where {T,S} = SamplerSimple{T,S,gentype(T)}(x, data)

Base.getindex(sp::SamplerSimple) = sp.self
Expand Down

0 comments on commit e813f0d

Please sign in to comment.