-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
split random.jl into a new random/ directory and clean-up #22752
Conversation
bc9567a
to
340d263
Compare
I would like to request review on this one, as it will be difficult to rebase when other changes get merged, and I would like know where to start further work on |
Would it not be possible to clean up in sections within the same file? Now this is split into ~10 files with ~150 lines of code each, which is not very typical for base, is it? Look at e.g. the HigherOrderFns module for a quite long file, which still is easy to follow with the descriptive header up top and the rest of the file is clearly separated into sections. Just a thought... 🇸🇪 |
Other functionalities are split into directories rooted in "base/", with more or less big files, a good example with small files is "base/markdown/". Well, splitting is not strictly necessary, but I can't count the number of times I had to scroll inefficiently through "random.jl", to find relevant part of the code; admitedly I'm not good at code browsing, but this splitting would make my life easier, so why not! (of course if there aren't drawbacks I didn't foresee). |
I think this is for the better. You can always argue where the line should be drawn but since you the person who is most familiar with this code, I'd go with your proposal. |
And now for anyone who didn't write this code, they need to scroll through multiple files instead of searching one file. |
@yuyichao I don't understand your comment. Having a code base split in mutiple files has some advantages. I'm simply proposing to draw the line closer to having smaller files. By saying "make my life easier", I acknowledge my selfish motivation: as I work for free, I prefer to optimize my time. But by no means I was implying "and make the life of others more difficult". I think it makes it easier for anyone who works on a |
Those advantages are mainly for the case where each file are relatively self contained and have limited and relatively well defined interface (i.e. the functions calls and name references between files). When you have strong coupling between different part, it's not necessarily a good idea to split them. With our codebase as an example, I've also personally spent a lot of time figuring out if a codegen helper function is defined in And that is exactly what I mean is that this does make it much harder for others to browse the code. I've personally searched through this file checking the calling between a few internal functions but now they are usually not used in the same file they are defined which makes them much harder to read. The splitting is also not super clear for readers that didn't write all the code. As an example, there are functions that deal with |
OK thanks for the clarification. I updated the OP to include the logic of file organization, including a proposition of a simpler scheme (i.e. less files). As I said, suggestions are welcome. I experienced also not understanding well the organisation of some files in base and feeling it's not optimal. But here I precisely tried to clarify the dependency of functionalities and reduce "coupling" (if I understand correctly the word "coupling"), there should be no coupling between different files I created, only on-way dependencies.
For example, some functions are defined in dSFMT.jl, but only used in random.jl: the first file defines the low-level API, the second one uses this low-level to define a higher-level API. This looks like trivial code organization. By your argument they should be merged? I would say this is subjective. Here I split this higher-level into a medium-level |
340d263
to
1d9c617
Compare
Rebased, add re-organized the functionalities into fewer file, cf. precisions at the bottom of OP. I'm pretty happy with the current state, I plan to merge within a couple of days if no objection. |
1d9c617
to
3ae298f
Compare
|
||
rand(rng::RandomDevice, ::Type{CloseOpen}) = rand(rng, Close1Open2) - 1.0 | ||
|
||
@inline rand(r::RandomDevice, T::BitFloatType) = rand_generic(r, T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did this exist before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is one of the very few "functional" change; I introduced it to be DRY and to satisfy the last code re-organization: this method was previously defined for Union{RandomDevice,MersenneTwister}
, but now I put the code concerning specific RNGs in their respective section. Also, eventually the plan is to allow users to define easily rand
methods in terms of others, so this is one small first step in this direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be preferable to do the code rearrangement and functional changes separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I thought it should be harmless (non-breaking), but will amend. Other changes are: removing of the win32_SystemFunction036!
(incorporated in rand!(::MersenneTwister)
), and removing of the never-used never-exported globalRNG()
function: do you want me to also split that out in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those changes only appear in a single place each, right? was easier to see what was going on with those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Is it fine for you if I split the introduction of rand_generic
in a separate commit but in the same PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe if it were the last commit instead of partway through, since some of the last commits reorganized things that the first few commits changed so the total lines of code changed commit by commit are quite a bit more than the overall lines of code changed when viewing them combined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rand_generic
was introduced in the last commit, so I could easily split it out in a new last commit. I can also easily merge the current last 2 commits ("merge/rename some files). Would it be good enough?
I really like this. |
FWIW, that still doesn't change the fact that you need to go through each file to find where a function/method is defined.
No dSFMT.jl is totally different. It's a completely different module and the most functions has dsfmt in it's name. |
Sorry, I still don't understand you. Base is full of cases where different methods of the same function are defined in different files, this is a necessary trade-off. Here, with the current state of the PR, almost only the
I gave this example because you complained by saying:
That's exactly what happens with dSFMT.jl, which defines functions which are used in the random.jl (in master), so your "totally different" is exaggerated. |
base/random/RNGs.jl
Outdated
##### GLOBAL_RNG | ||
|
||
function dsfmt_gv_srand() | ||
# Temporary fix for #8874 and #9124: update global RNG for Rmath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess this comment probably isn't relevant any more (but can be dealt with another time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cf. #19980, which removes this fix, but whose merge preconditions are not satisfied.
base/random/generation.jl
Outdated
rand(r::AbstractRNG, T::Type, dims::Dims) = rand!(r, Array{T}(dims)) | ||
rand( T::Type, dims::Dims) = rand(GLOBAL_RNG, T, dims) | ||
rand(r::AbstractRNG, T::Type, d1::Integer, dims::Integer...) = rand(r, T, Dims((d1, dims...))) | ||
rand( T::Type, d1::Integer, dims::Integer...) = rand(T, Dims((d1, dims...))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hopefully equivalent, but any reason in particular for changing this from the previous rand(T::Type, d1::Integer, dims::Integer...) = rand(T, tuple(Int(d1), convert(Dims, dims)...))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it fit on one line, although it's still 94 chars for the longest line here (I considered renaming d1
to d
to get 92 chars...).
base/random/RNGs.jl
Outdated
#### random integers | ||
|
||
@inline rand(r::MersenneTwister, | ||
::Type{T}) where {T<:Union{Bool,Int8,UInt8,Int16,UInt16,Int32,UInt32}} = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this formatting looks pretty strange - isn't there a shorthand name for this union somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a shorthand for this one. For the formatting, it's standard to align vertically arguments names, but I agree the where
there looks strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right I thought this was BitIntegerSmall but looks like that doesn't include (U)Int32 on 32 bit, or Bool. I'd maybe do something like
@inline (rand(r::MersenneTwister, ::Type{T}) where
{T<:Union{Bool,Int8,UInt8,Int16,UInt16,Int32,UInt32}}) = rand_ui52_raw(r) % T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly prefer the way it is now than this alternative...
There are many differences between the two cases. For different files in Base, they are generally seperated by namespaces or the types they are operating on. Coupling do exist but are generally done through standard interfaces (convert, indexing etc). This makes it easy to guess where a function is defined in based on their type/name/module most of the time without having to open the file they are defined in and read the description in that file. It also means that the functions defined in other files/modules don't generally affect reading the code since they are usually not the most important part/are just standard exported API with well understood behavior. None of these are the case for
Not really, they are internal base functions but are basically dSFMT.jl API and even though they are not public API, they are well namespaced which makes a huge difference for "searchability" and the |
Well, some amount of work is necessary for understanding the organization of many functions/files in base; comparatively, the change here doesn't require a particular amount of work from a casual reader IMO (depend of course on what the reader's task is). And for someone familiar with the random code base, I think this change improves things (a lot, as far as I'm concerned). Anyway, if you see specific coupling between those new files, please report them. AFAICT, there are none (i.e. no file depends on the "internal workings" (expression taken from wikipedia on coupling) of another file). |
Yes but irrelevant. No or minimum amount of work should be required to navigate through the files.
And as someone who follows the function calls in this file from time to time, I can also say that being able to find all functions (that doesn't obviously belong to a different namespace) in the same file helps a lot. If you want a good example of what's a splitting of files that doesn't require any effort to guess where the function is defined, you can have a look at
Ignore the word "coupling" if you want. What I mean is "calling internal functions that are not clearly namespaced or separated by input types from a different file". This is (the other) half of the problem and examples includes,
|
I was giving context, so it's certainly not irrelevant. Please show a bit more courtesy. I don't disagree with your ideal, and I tried to improve things on this front.
I agree, for some tasks; but it doesn't mean having everything in the same file (in random.jl in this case) is better overall.
Regarding We fail to understand each other, and I start to feel uneasy with this discussion as I don't see it as so useful in this public forum. I will leave to others discussing with you the merits/disadvantages of this approach. |
rand(rng, RangeGenerator(r)) | ||
|
||
rand!(rng::AbstractRNG, A::AbstractArray, | ||
r::UnitRange{<:Union{Signed,Unsigned,BigInt,Bool,Char}}) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does rand!
support UnitRange{Char}
but rand
doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, UnitRange{Char}
cannot be constructed so this Char
can be removed from this Union
, which was on my todo list already.
This is a leftover from an old commit following Char
not being an Integer
anymore (cf. #13600).
base/random/normal.jl
Outdated
yy = -log(rand(rng)) | ||
yy+yy > xx*xx && return (rabs >> 8) % Bool ? | ||
-ziggurat_nor_r - xx : | ||
ziggurat_nor_r + xx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would probably be clearer written out longhand as
if yy+yy > xx*xx
if (rabs >> 8) % Bool
return -ziggurat_nor_r - xx
else
return ziggurat_nor_r + xx
end
end
Very good argument for not doing it in the first place. There isn't a super clear cut between the files and yes, when functions like these are added, it's an evidence that the file should be merged or organized differently.
So yes, for this particular case (which is still only the internal function part of the issue) I'll be perfectly fine with it if all of them are defined in a file named |
There's a net increase of 127 lines. Is that from headers and wrapping or does this add any code? |
If I count non-blank/non-comment lines, then there is a net increase of only 26 lines. In the commit wrapping lines at 92 chars shows a net increase of 64 lines (including few comments), few of which have been reverted subsequently (by compressing a couple of functions). So without counting in detail, this confirms that basically no code has been added. |
I'd squash together any commit that partially undoes something or makes significant revisions to the same file that was touched in a previous commit. Breaking the commits apart into contiguous sections of code that are moved unmodified vs style revisions would also make sense. |
FWIW, I have always wanted to move these away into a directory and some reorganization. |
I'm all for this reorganization and don't understand what all the fuss is about. Sure, we can leave all files as-is forever since the language doesn't impose a particular file structure. But occasionally you want to restructure things better and more logically – and who better to reorganize the RNG code than @rfourquet? Please carry on! Group the commits logically as you see fit and merge when ready. |
In support of thoughtful organization into multiple files, judiciously splitting Edit:
💯 |
3ae298f
to
6a6ba6b
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
6a6ba6b
to
d963fbe
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
|
||
function DSFMT_state(val::Vector{Int32} = zeros(Int32, JN32)) | ||
length(val) == JN32 || | ||
throw(DomainError(length(val), "Expected length: $JN32.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're just moving this, but seems odd that this isn't a DimensionMismatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will put this on my todo list.
base/random/normal.jl
Outdated
Fill the array `A` with random numbers following the exponential distribution | ||
(with scale 1). | ||
|
||
# Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you seem to have lost the formatting change to
# Examples
```jldoctest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oups sorry. I don't know if this come from the rebase... will re-check everything carefully. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I rebased again by redoing all the changes manually commit by commit to double-check. I think this should be good to go now.
base/random/normal.jl
Outdated
julia> x1 = rand(rng, 2) | ||
2-element Array{Float64,1}: | ||
0.590845 | ||
0.766797 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline between # Examples
and jldoctest
shouldn't be added back here
base/random/normal.jl
Outdated
julia> x1 = rand(2) | ||
2-element Array{Float64,1}: | ||
0.590845 | ||
0.766797 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline between # Examples
and jldoctest
shouldn't be added back here
base/random/normal.jl
Outdated
julia> rand(Int, 2) | ||
2-element Array{Int64,1}: | ||
1339893410598768192 | ||
1575814717733606317 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or here
d963fbe
to
8c61bea
Compare
8c61bea
to
af44fe5
Compare
The one CI failure is a timeout (travis). |
"base/random.jl" has slowly grown into a slightly messy file, with which it's sometimes difficult to work because of its size. The reorganization in this PR is mainly to make my life easier (in particular with a refactoring I was planning).
The first commit is the initial split of random.jl into multiple files: it's not fine-grained, as I tried to keep big blocks of code together to help reviewing. Subsequent commits reorganize things at a finer level. I'm open to suggestions for filenames (in particular I'm not conviced with the name of "sequences.jl" containing
randsubseq
,randperm
etc.) and structure.There should be no functional change, except for the deletion of 2 unexported functions:
win32_SystemFunction036!
, which was incorporated intorand!(::RandomDevice)
(I could not test this change, but appveyor will tell), andglobalRNG
which I had intended to export in an old PR, but this was decided against because of the prospect of multithreaded RNGs.EDIT: to clarify, here is the new file list with their dependency (probably incomplete), indicated within square bracket. Also, most files depend on 2), and on 4) for
GLOBAL_RNG
.(cf. "EDIT2" below for the current re-organization)
rand
/rand!
RandomDevice
andMersenneTwister
srand
related functions and definition of GLOBAL_RNG (which depends onsrand
)rand
for scalar types (rand(::Type{T})
)MersenneTwister
, plusbitrand
rand
from ranges (rand(1:n)
)rand
from collections (rand(::Set))
)randn
&randexp
randperm
,randsubseq
, etc[5] and [6] could be merged, similarly to [7] and [8] which contain both scalar and array versions.
Alternative ideas for organization could be (if we want less files): put all
MersenneTwister
specializations from [4,5,6] in [3], merge what remains from [5,6] together with [7,8] into "generation.jl" or some better name, keep [9] as is, and merge [10,11] together withbitrand
andrandstring
into "misc.jl".EDIT2: implemented the last plan above (merging some files):
RandomDevice
andMersenneTwister
, and specialized random generation, and seedingAbstractRNG
(parts of [5, 6, 7, 8] above)randstring
,bitrand
, uuid,randperm
etc.