Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework replace and replace! #26206

Merged
merged 4 commits into from
Apr 13, 2018
Merged

Rework replace and replace! #26206

merged 4 commits into from
Apr 13, 2018

Conversation

nalimilan
Copy link
Member

Introduce a new replace!(new::Callable, res::T, A::T, count::Union{Nothing,Int}) method which custom types can implement to support all replace and replace! methods automatically, instead of the current replace!(new::Callable, A::T, count::Int). This offers several advantages:

  • For arrays, instead of copying the input and then replace elements, we can do the copy and replace operations at the same time, which is quite faster for arrays when count=nothing.
  • For dicts and sets, copying up-front is still faster as long as most original elements are preserved, but for replace, we can apply replacements directly instead of storing a them in a temporary vector.
  • When the LHS of a pair contains a singleton type, we can subtract it from the element type
    of the result, e.g. Union{T,Missing} becomes T.

Also simplify the dispatch logic by removing the internal _replace! method in favor of replace!.

Benchmarks indicate that this PR significantly improves performance for arrays (except for Int8, for which it's stable), but not so much for sets (except for Int8, which are faster for some reason). So the specialized code for sets/dicts when origin and destination are different might not be worth adding.

A = rand(Int, 10_000);
B = rand(1:10, 10_000);
C = rand(Int8, 10_000);
@btime replace(A, 1 => 0);
@btime replace(B, 1 => 0);
@btime replace(C, 1 => Int8(0));
@btime replace(A, 1 => 0, count=500);
@btime replace(B, 1 => 0, count=500);
@btime replace(C, 1 => Int8(0), count=500);

sA = Set(A);
sB = Set(B);
sC = Set(C);
@btime replace(x -> ifelse(isodd(x), x-1, x), sA);
@btime replace(x -> ifelse(isodd(x), x-1, x), sB);
@btime replace(x -> ifelse(isodd(x), x-1, x), sC);

# Before
julia> @btime replace(A, 1 => 0);
  27.696 μs (3 allocations: 78.23 KiB)

julia> @btime replace(B, 1 => 0);
  36.078 μs (3 allocations: 78.23 KiB)

julia> @btime replace(C, 1 => Int8(0));
  17.438 μs (2 allocations: 9.97 KiB)

julia> @btime replace(A, 1 => 0, count=500);
  27.600 μs (4 allocations: 78.25 KiB)

julia> @btime replace(B, 1 => 0, count=500);
  20.220 μs (4 allocations: 78.25 KiB)

julia> @btime replace(C, 1 => Int8(0), count=500);
  17.422 μs (3 allocations: 9.98 KiB)

julia> @btime replace(x -> ifelse(isodd(x), x-1, x), sA);
  645.355 μs (4889 allocations: 477.13 KiB)

julia> @btime replace(x -> ifelse(isodd(x), x-1, x), sB);
  22.788 μs (18 allocations: 144.81 KiB)

julia> @btime replace(x -> ifelse(isodd(x), x-1, x), sC);
  22.137 μs (207 allocations: 8.22 KiB)

# After
julia> @btime replace(A, 1 => 0);
  16.180 μs (3 allocations: 78.23 KiB)

julia> @btime replace(B, 1 => 0);
  23.398 μs (3 allocations: 78.23 KiB)

julia> @btime replace(C, 1 => Int8(0));
  16.410 μs (2 allocations: 9.97 KiB)

julia> @btime replace(A, 1 => 0, count=500);
  17.062 μs (4 allocations: 78.25 KiB)

julia> @btime replace(B, 1 => 0, count=500);
  15.328 μs (4 allocations: 78.25 KiB)

julia> @btime replace(C, 1 => Int8(0), count=500);
  18.733 μs (3 allocations: 9.98 KiB)

julia> @btime replace(x -> ifelse(isodd(x), x-1, x), sA);
  633.428 μs (4798 allocations: 219.25 KiB)

julia> @btime replace(x -> ifelse(isodd(x), x-1, x), sB);
  22.580 μs (12 allocations: 144.47 KiB)

julia> @btime replace(x -> ifelse(isodd(x), x-1, x), sC);
  10.096 μs (69 allocations: 3.45 KiB)

Introduce a new replace!(new::Callable, res::T, A::T, count::Union{Nothing,Int}) method
which custom types can implement to support all replace and replace! methods automatically,
instead of the current replace!(new::Callable, A::T, count::Int). This offers several advantages:
- For arrays, instead of copying the input and then replace elements, we can do the copy and replace
operations at the same time, which is quite faster for arrays when count=nothing.
- For dicts and sets, copying up-front is still faster as long as most original elements are preserved,
but for replace(), we can apply replacements directly instead of storing a them in a temporary vector.
- When the LHS of a pair contains a singleton type, we can subtract it from the element type
of the result, e.g. Union{T,Missing} becomes T.

Also simplify the dispatch logic by removing the internal _replace! method in favor of replace!.
@nalimilan
Copy link
Member Author

Anybody willing to review this?

@rfourquet
Copy link
Member

I will try to review within few days

@nalimilan
Copy link
Member Author

@rfourquet Any chance you could find the time to review this?

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, thanks! My main questions in comments concern 1) why use nothing for count and 2) whether to name internal methods _replace! or replace!.

base/set.jl Outdated

function _replace!(A, count::Integer, old_new::Tuple{Vararg{Pair}})
function replace!(res, A, count::Union{Integer,Nothing}, old_new::Tuple{Vararg{Pair}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not part of the API, so (as in the the other comment) I prefer it to not be a method of replace! (example it also clobbers the output of mehods).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the problem is that it's not great to tell people to override _replace! to implement the replace/replace! methods. I actually wonder whether we should make this replace! method officially public, or at least document it under an # Implementation part in the docstring.

An alternative is to find a better name than _replace!.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to make the method public. But the one above (replace!(res, A, count::Union{Integer,Nothing}, old_new::Tuple{Vararg{Pair}})) is really an internal method which doesn't have to be overriden, so _replace! is a better name for it I think.

base/set.jl Outdated
@@ -600,16 +606,17 @@ julia> replace!(Set([1, 2, 3]), 1=>0)
Set([0, 2, 3])
```
"""
replace!(A, old_new::Pair...; count::Integer=typemax(Int)) = _replace!(A, count, old_new)
replace!(A, old_new::Pair...; count::Union{Integer,Nothing}=nothing) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on why you need to introduce nothing here: it seems to make the code slightly more complex, and it's not part of the API (at least you didn't document it)... why not keep typemax(Int) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of nothing is that it has a different type, which allows the compiler to optimize out instructions which are not needed (like the n < count check, which introduces a branch an prevents SIMD). Also, for replace, it allows subtracting singleton types when count=nothing while keeping the function type-stable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks. I'm a little sad to see this added complexity, but I guess losing type stabity in particular would be a deal breaker.

base/set.jl Outdated
end

promote_valuetype(x::Pair{K, V}) where {K, V} = V
promote_valuetype(x::Pair{K, V}, y::Pair...) where {K, V} =
promote_type(V, promote_valuetype(y...))

# Subtract singleton types which are going to be replaced
@pure issingletontype(::Type{T}) where {T} = isdefined(T, :instance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not Jameson so I can't comment on the use of @pure here! ;-)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @pure is OK here, although the definition itself is wrong:

@pure issingletontype(T::DataType) = isdefined(T, :instance)
issingletontype(::Type) = false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, I used @pure correctly! Thanks for the pointer about DataType.

base/set.jl Outdated
# Subtract singleton types which are going to be replaced
@pure issingletontype(::Type{T}) where {T} = isdefined(T, :instance)
function subtract_singletontype(::Type{T}, x::Pair{K}) where {T, K}
if issingletontype(K) # singleton type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment may seem slightly redundant :)

base/set.jl Outdated
if x !== y
push!(repl, x => y)
c += 1
if res === A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a small comment mentioning the optimization that can't be done in general when res !== x.

base/set.jl Outdated
# to make replace/replace! work for a new container type Cont, only
# replace!(new::Callable, A::Cont; count::Integer=typemax(Int))
# replace!(new::Callable, res::Cont, A::Cont; count::Integer=typemax(Int))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method of replace! is a nice addition to the API, but you didn't document it... I understand that it is felt that this function has already a too large API, so we could leave it undocumented for now. But I'm uncomfortable making it a method of replace! then, as people will discover it and start to use it, so I'd rather have it be named _replace! to make it clear that it's an internal method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you should update the type and default value for count.

@rfourquet
Copy link
Member

the specialized code for sets/dicts when origin and destination are different might not be worth adding.

I would say, now that it's implemented, we can keep it. Maybe for other types of Abstract{Set,Dict} it can make a substantial difference.

@rfourquet
Copy link
Member

I see that in you benchmark code, you didn't interpolate the global variables A, B, sA and sB, could you check if it makes a difference in this case?

@nalimilan
Copy link
Member Author

I see that in you benchmark code, you didn't interpolate the global variables A, B, sA and sB, could you check if it makes a difference in this case?

I've just checked, and it doesn't make any difference in this case. AFAIK that's because everything happens inside the function (which gets specialized), and the cost of one dynamic dispatch is negligible compared with the time it takes to run.

base/set.jl Outdated
count::Integer=typemax(Int))
count < 0 && throw(DomainError(count, "`count` must not be negative"))
count != 0 && _replace!(new, A, min(count, typemax(Int)) % Int)
count::Union{Integer,Nothing}=nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are using the "new API" to implement the functionality for certain types, so it's fine to use replace!, but then again I think we should keep calling _replace! the specific implementation methods, which have the signature _replace!(new, res, A, count), which really is an implementation detail and is not part of any API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with using _replace! is that custom types would have either to override _replace! (which is unexported and doesn't have a very appealing name), or to reimplement all replace and replace!methods directly. Here that implies duplicating some boilerplate (with some tricks, like checking count === nothing and min(count, typemax(Int)) % Int), which one can easily get wrong). In other places, it even implies duplicating the call to subtract_singletontype, which is an unexported function and which people would better not have to bother with at all.

So I think it's much better to recommend package authors to implement a single low-level method, be it replace! or another unexported method (ideally with a better name than _replace!).

@rfourquet
Copy link
Member

I worked on this and will push a commit to feed the discussion (I'm not yet fully happy with it).

I'm still reluctant at calling replace! a method which has an implicit precondition (like count >= 0), i.e. which is not meant to be used by end-users.

One thing I noticed also is that with the new method, there is an ambiguity in a call like replace!(x->2x, [1], [2]): the replace!(pred, A, new) is called... so we should either get rid of this one, or find a new name for the one taking both res and A. I would now favor not making it public for now, as it would not be available yet for other forms of replace!, i.e. the ones with pred and pairs; I called it replaceimpl! temporarily in the commit I will push.

Also, Nothing is needed for type-stability only when the replacement is specified with pairs, for replace. What do you think that we limit the use of a count=nothing default only for this method; then when calling the underlying replace! method, we pass count=coalesce(count, typemax(Int)). For the two other advantages of using nothing that you mentioned: maybe constant propagation can work? and SIMD could still be worked out with a more complicated code for the implementation for vectors. For me it would be best if people have simply to implement replaceimpl!(new, res, A, count::Integer=typemax(Int)); it just needs a test for count <= 0; it doesn't prevent them from re-using the already implemented code when their type is Union{AbstractArray,AbstractDict,AbstractSet}, by implementing directly the _replace! method, like Base does. Alternatively, (which I do in my commit, and apparently doesn't alter the performance of your version, except for the set sA, not sure yet why), the default implementation replaceimpl! itself transforms typemax(Int) to nothing for the cases where it matters.

@rfourquet
Copy link
Member

My pushed commit may not seem to improve so much, but I guess my main points are: 1) don't call replace! a non-API method, as it can confuse the user and it makes it more clear also for the developer when reading the code (at least in my view), and 2) if nothing is useful for performance, use it only where necessary, but don't pollute all the methods (and the API) with it.

@rfourquet rfourquet closed this Mar 23, 2018
@rfourquet rfourquet reopened this Mar 23, 2018
@nalimilan
Copy link
Member Author

Actually, count=nothing isn't strictly needed for SIMD, as we can simply check count >= length(A) and use two different loops (sometimes LLVM can do this automatically via loop unswitching, but here it doesn't seem to work; not such about constant propagation). So I've removed all uses of count=nothing except for the replace method with pairs. I've also added a check_count method which takes care of converting to Int and throwing an error if negative. All of this allowed getting rid of replaceimpl, which I find easy to confuse with _replace! (the jungle of methods is already hard to grasp). Finally, I've renamed the _replace! method taking pairs to replace_pairs, since it's really different from others.

Let's say we keep _replace! for now, meaning that there are no API stability guarantees regarding how to implement replace and replace! (other than by implementing the public methods directly).

@rfourquet
Copy link
Member

Looks good! The use of nothing in the implementation for arrays seemed to allow a shorter code but I prefer your new version. I can review more closely tomorrow, but I think this will be good to go.

@nalimilan
Copy link
Member Author

OK, cool!

@StefanKarpinski Could you just have a quick look at the singleton stuff and say whether you agree on the principle? That's the only part of the PR which affects public API (though in a subtle way).

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I wonder if the special treatment of singleton types for replace with pairs desserves a mention in the documentation?

@@ -631,7 +643,7 @@ julia> replace!(isodd, A, 0, count=2)
```
"""
replace!(pred::Callable, A, new; count::Integer=typemax(Int)) =
replace!(x -> ifelse(pred(x), new, x), A, count=count)
replace!(x -> ifelse(pred(x), new, x), A, count=check_count(count))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replace! method already checks count, so not really needed to use check_count here (or did you mean to save one specialization when count is not an Int ?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I added it to avoid the specialization and to throw the error as early as possible. But maybe that's not worth it...

base/set.jl Outdated
@@ -686,16 +694,34 @@ julia> replace([1, 2, 1, 3], 1=>0, 2=>4, count=2)
3
```
"""
function replace(A, old_new::Pair...; count::Integer=typemax(Int))
function replace(A, old_new::Pair...; count::Union{Integer,Nothing}=nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of Nothing here for type stability is quite subtle; if you re-push to this branch for another change, may be worth a small comment to explain this.

c == count && break
end
for oldnew in repl
pop!(res, askey(first(oldnew), res))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comment: as res === A, working with only one of A or res may make it slightly easier to read.

@rfourquet
Copy link
Member

Would be good indeed if someone comments on the "singleton stuff", basically that replace([1, missing], missing=>2) isa Vector{Int} (when count is not specified).

@nalimilan
Copy link
Member Author

Travis failure on Mac is unrelated. I'll merge tomorrow barring objections.

@nalimilan nalimilan merged commit 85c341c into master Apr 13, 2018
@nalimilan nalimilan deleted the nl/replace branch April 13, 2018 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants