Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand documentation of custom random samplers. #31787

Merged

Conversation

tpapp
Copy link
Contributor

@tpapp tpapp commented Apr 21, 2019

Following discussion on Discourse with @rfourquet, this PR expands the documentation for defining custom random samplers. Motivation is provided for the design, followed by examples that gradually introduce features and explain why/when they should be used.

No text was removed, but some parts were moved to other subsections.

Docstrings added for relevant methods and structures, and included in the docs.

The use of pre-existing types is emphasized.

Following discussion on Discourse, this PR expands the documentation
for defining custom random samplers. Motivation is provided for the
design, followed by examples that gradually introduce features and
explain why/when they should be used.

No text was removed, but some parts were moved to other subsections.

Docstrings added for relevant methods and structures, and included in
the docs.

The use of pre-existing types is emphasized.
@tpapp
Copy link
Contributor Author

tpapp commented Apr 21, 2019

One thing that was unclear while I was doing this PR is whether to replace references to eltype by Random.gentype, eg in the docstring of rand! and the Die example. I can easily do this, if someone would clarify the preferred approach for me.

@fredrikekre fredrikekre added domain:docs This change adds or pertains to documentation domain:randomness Random number generation and the Random stdlib labels Apr 21, 2019
@tpapp
Copy link
Contributor Author

tpapp commented Apr 26, 2019

@rfourquet: friendly ping, could you please review at some point? I was hoping this would make it to 1.2.

@StefanKarpinski
Copy link
Sponsor Member

We can definitely add docs in an RC.

@rfourquet
Copy link
Member

@tpapp I'm sorry this takes so long, the improvement here is amazing. I combined being on hollidays and sick, and couldn't find a moment to properly review. I should be able to manage by the end of the week.

@tpapp
Copy link
Contributor Author

tpapp commented Apr 30, 2019

@rfourquet: Thanks for getting back to me. No worries, take your time and get well soon!

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent improvement, thank you!

stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved
stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved
stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved
stdlib/Random/docs/src/index.md Show resolved Hide resolved
stdlib/Random/docs/src/index.md Show resolved Hide resolved
stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved
In order to implement this decoupling for a custom type, a helper type can be used.
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so
there is an opportunity for this optimization:
In order to implement this decoupling for a custom type, a custom helper type can be used. Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should change "In order to implement this decoupling for a custom type", as you already showed how to do the decoupling for a custom type... maybe add "in cases where SamplerSimple is not suitable? (but the problem then is that we explain SamplerSimple was suitable in our Die example!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to clarify this, I hope the improved wording is clear.

stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved
stdlib/Random/src/Random.jl Outdated Show resolved Hide resolved
stdlib/Random/src/Random.jl Outdated Show resolved Hide resolved
@rfourquet
Copy link
Member

One thing that was unclear while I was doing this PR is whether to replace references to eltype by Random.gentype

You can have a look at the discussion #27756 for background. Basically, people were not super motivated to introduce gentype, as it was seen mostly as a duplicate of eltype. So I kept it as an implementation detail till now. I'm still in favor to make it into the official API, but I think this should get a bit more support. A motivation is for some distributions which are not in Random (which defines only rand for types and collections), e.g. Normal(), or Samplers: these objects are clearly not collections, such that defining eltype on them seems inadequate (hence the introduction of gentype). Your DiscreteDistribution is also such an example.

- clarify role of `gentype`

- use `sp` as an argument name for samplers

- corrections for docstrings

- clarify pedagogical motivation for custom samplers not using
  existing types

- various minor text corrections: spaces, emphases
@tpapp
Copy link
Contributor Author

tpapp commented May 5, 2019

@rfourquet: Thanks for the review, I implemented the changes you requested.

Regarding

introduce gentype, as it was seen mostly as a duplicate of eltype. So I kept it as an implementation detail till now. I'm still in favor to make it into the official API, but I think this should get a bit more support.

I think that making gentype part of the API is the right decision, but I leave this for a future PR. Should I open an issue as a reminder?

@tpapp
Copy link
Contributor Author

tpapp commented May 8, 2019

Friendly bump: please let me know if I should do anything else to get this merged.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented May 8, 2019

@rfourquet, please merge whenever it looks good to you. Doesn't have to be perfect, we can always improve after the fact.

stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved

3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values.

In addition, [`Random.gentype`](@ref), which falls back to [`eltype`](@ref) is used for computing element types for containers. A method should be defined in case `eltype` is not defined, or a different element type is desired.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here: you said

making gentype part of the API is the right decision, but I leave this for a future PR

with which I agree, but you document it now, making it public API...

If we keep a reference to gentype here in this document, we would need the approval of a core dev... and a rework of this sentence which is still not fully satisfying: the raison-d'être of gentype is to handle non-collections: collections have already eltype defined by definition, so saying gentype is used for computing element types for containers reinforces the idea that gentype is redundant. Moreover, the second sentence suggests that its fine to have eltype != gentype, which I don't think we should encourage... actually, I may even tend to believe this should not be allowed, i.e. either eltype or gentype should be defined, but not both... which is a point in favor of removing gentype altogether, while extending the scope of eltype (so that it's applicable beyond collections).

Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced.
In order to define random generation of values of type `T`, the following method can be defined:
`rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return).
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the *default sampler for types*. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return values what `rand(rng, T)` is expected to return. `Random.gentype` should *not* need to be defined explicitly for this case, as the type is `T`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep a reference to gentype here, I suggest something like:

"Note that in this case, Random.gentype must not be defined (generated values are of type T).

stdlib/Random/docs/src/index.md Outdated Show resolved Hide resolved
@rfourquet
Copy link
Member

@tpapp thanks for the updates, I added a few more comments. But you seem to have missed few of my comments from the first batch?

Concerning gentype: you offered to open an issue, and I think it's a good idea. You can also include content from a comment I just made inline in this PR, or I can add it to your issue.

@tpapp
Copy link
Contributor Author

tpapp commented May 8, 2019

@rfourquet, thanks for the thorough second review. I addressed all your questions. In particular, I decided to remove all references to Random.gentype so that this can be merged.

In agreement with @StefanKarpinski, I would prefer to get this PR merged, with the understanding that it can always be improved later. Regardless of what happens to gentype, I think this is already an improvement.

@tpapp tpapp mentioned this pull request May 8, 2019
2 tasks
@@ -198,8 +195,6 @@ end
```
and that we *always* want to build an a alias table, regardless of the number of values needed (we learn how to customize this below). The methods
```julia
Random.gentype(::Type{<:DiscreteDistribution}) = Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it may be better to replace gentype by eltype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, adding back.

@rfourquet
Copy link
Member

Looks better for now with undocumenting gentype. But please check for the (very) few comments you didn't respond to, there is at least one which is important (unqualifed probabilities in make_alias_table(probabilities))

@tpapp
Copy link
Contributor Author

tpapp commented May 8, 2019

@rfourquet: I don't see a comment about the unqualified probabilities, but I corrected it; also added that eltype for the custom distribution.

This was making Documenter fail.
@tpapp
Copy link
Contributor Author

tpapp commented May 9, 2019

I fixed the docstring errors (by removing all remaining references to Random.gentype). The remaining CI error seems unrelated.

Unless there is anything else to address, it would be great to finally merge this.

@fredrikekre fredrikekre merged commit e813f0d into JuliaLang:master May 9, 2019
@tpapp tpapp deleted the tp/random-custom-types-explanations branch May 9, 2019 14:06
@rfourquet
Copy link
Member

I don't see a comment about the unqualified probabilities

It's there, along with 4 other comments. Check for "5 hidden conversations" when you reload the page.

Unless there is anything else to address, it would be great to finally merge this.

I didn't miss the fact that you wished this to be merged without unnecessary delay, I reviewed this PR at the pace I could. There is something left to address (cf. the "4 comments" mentioned above), but unfortunately it's already merged.

@rfourquet
Copy link
Member

Anyway, we can indeed fix that later, thanks for your contribution.

@tpapp
Copy link
Contributor Author

tpapp commented May 10, 2019

@rfourquet: thanks for pointing these out, I will fix them when I make a PR for #31968.

tpapp added a commit to tpapp/julia that referenced this pull request May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation domain:randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants