
Faster blockdiag for uniform input value-index types #36013

Merged · 7 commits · May 27, 2020

Conversation

@irhum (Contributor) commented May 24, 2020

The current blockdiag implementation in SparseArrays uses type promotion to determine the value and index types of the output sparse matrix.

This can be quite slow when blockdiag is called on many small sparse arrays (say, 2500 or more). If all of the input sparse arrays share the same value and index types, there is no need to compute those types via promotion, and we can proceed directly to constructing the new matrix.
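For instance (an illustrative example, not taken from the PR itself):

using SparseArrays
A = sprand(10, 10, 0.1)           # SparseMatrixCSC{Float64, Int64} on 64-bit systems
B = sprand(Float32, 10, 10, 0.1)  # SparseMatrixCSC{Float32, Int64}
blockdiag(A, A)  # uniform value/index types: the output types are known up front
blockdiag(A, B)  # heterogeneous types: the output value type must be found by promotion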

This can provide significant speedups; for M generated with

M = [sprand(rand(15:50), rand(15:50), 0.02) for i in 1:2500]

and benchmarking the existing implementation with BenchmarkTools

@benchmark output = blockdiag(M...)

we get

BenchmarkTools.Trial: 
  memory estimate:  192.35 MiB
  allocs estimate:  37956
  --------------
  minimum time:     1.404 s (1.30% GC)
  median time:      1.434 s (1.31% GC)
  mean time:        1.438 s (1.24% GC)
  maximum time:     1.482 s (0.89% GC)
  --------------
  samples:          4
  evals/sample:     1

Running the same benchmark, with the same M, against the new implementation, we get a speedup of nearly 1000x:

@benchmark output = blockdiag(M...)
  memory estimate:  3.00 MiB
  allocs estimate:  5019
  --------------
  minimum time:     1.072 ms (0.00% GC)
  median time:      1.162 ms (0.00% GC)
  mean time:        1.386 ms (11.86% GC)
  maximum time:     5.059 ms (60.41% GC)
  --------------
  samples:          3591
  evals/sample:     1

The blockdiag implementation is hence split into three methods:

blockdiag(X::AbstractSparseMatrixCSC...) # for the general case
blockdiag(X::AbstractSparseMatrixCSC{Tv, Ti}...) where {Tv, Ti <: Integer} # for increased speed when all input X have the same index and value types
blockdiag(::Type{Tv}, ::Type{Ti}, X::AbstractSparseMatrixCSC...) where {Tv, Ti <: Integer} # internally called by the two methods above, and available to users dealing with heterogeneous inputs who wish to specify the output index and value types manually for the same speed benefit
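As a rough sketch of how these methods could relate (illustrative only, not the merged code; the promotion expressions are assumed from the discussion further below):

# Uniform case: the value/index types are known from the signature, so no promotion is needed.
blockdiag(X::AbstractSparseMatrixCSC{Tv, Ti}...) where {Tv, Ti <: Integer} =
    blockdiag(Tv, Ti, X...)

# General case: compute the output types by promotion, then construct.
function blockdiag(X::AbstractSparseMatrixCSC...)
    Tv = promote_type(map(x -> eltype(nonzeros(x)), X)...)
    Ti = promote_type(map(x -> eltype(rowvals(x)), X)...)
    blockdiag(Tv, Ti, X...)
end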

@irhum (Contributor, Author) commented May 24, 2020

Added another method to handle the empty input case, blockdiag(); otherwise a call with no arguments dispatches to

blockdiag(X::AbstractSparseMatrixCSC{Tv, Ti}...) where {Tv, Ti <: Integer}

instead of

blockdiag(X::AbstractSparseMatrixCSC...)

resulting in a Tv not defined error, making an explicit definition necessary.
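A minimal analogy of the dispatch issue just described, using a hypothetical function f rather than blockdiag:

f(x::Vector{T}...) where {T} = T   # stands in for the {Tv, Ti} method
f(x::AbstractVector...) = Any      # stands in for the general method

f([1], [2])  # Int: the more specific first method wins
f()          # also hits the first method, but T is never bound,
             # so this throws UndefVarError: T not defined
f() = 0      # an explicit zero-argument method avoids the error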

I would suggest this pull request is important because the case where all inputs share the same value and index types is likely more common (and benefits from the increased performance) than the heterogeneous case.

@dkarrasch added the domain:arrays:sparse (Sparse arrays) label on May 24, 2020
@dkarrasch (Member) left a comment

Very nice contribution, @irhum, and welcome! I have one comment that I'm curious what other people think about.

stdlib/SparseArrays/src/sparsematrix.jl (inline review thread, outdated)
Co-authored-by: Daniel Karrasch <[email protected]>
@dkarrasch (Member) commented

You'll need to rename the function at the call sites as well.

@irhum (Contributor, Author) commented May 24, 2020

The function call sites have been changed to make the appropriate _blockdiag call.

Since we also need to account for the case where no inputs are given, a clean solution seems to be to pass it as an optional argument to one of the methods, specifically:

blockdiag(X::AbstractSparseMatrixCSC{Tv, Ti}...=spzeros(0,0)) where {Tv, Ti <: Integer}

Since we're explicitly handling it, isempty(X) in

Ti = isempty(X) ? Int : promote_type(map(x->eltype(rowvals(x)), X)...)

is always false, and hence can be replaced with

Ti = promote_type(map(x->eltype(rowvals(x)), X)...)

Opinions?

@irhum requested a review from dkarrasch on May 25, 2020, 21:34
@dkarrasch (Member) left a comment

Looks good in principle, but there is this subtle issue with empty arguments.

@@ -3318,16 +3318,23 @@ julia> blockdiag(sparse(2I, 3, 3), sparse(4I, 2, 2))
⋅ ⋅ ⋅ ⋅ 4
```
"""
function blockdiag(X::AbstractSparseMatrixCSC{Tv, Ti}...=spzeros(0,0)) where {Tv, Ti <: Integer}
Member commented:

This default value makes the function change behavior for empty arguments. Currently, we have

julia> A = blockdiag()
0×0 SparseMatrixCSC{Union{},Int64} with 0 stored entries

julia> B = spzeros(0,0)
0×0 SparseMatrixCSC{Float64,Int64} with 0 stored entries

Changing that behavior may be for the better, but I'm not sure. This definitely requires a broader discussion, I'd say.

@irhum (Contributor, Author) commented May 26, 2020

I would rather not change the behavior inside this pull request anyway; the goal here is to provide increased speed when dealing with CSC arrays that share the same index and value types. If needed, a separate issue can be opened on whether the default should be changed.

Created a new commit that fixes this (the output is now SparseMatrixCSC{Union{},Int64}) by explicitly spelling out the blockdiag() case, to make it clear what it does today and to avoid any confusion in the future.

@irhum requested a review from dkarrasch on May 26, 2020, 17:02
@@ -3318,16 +3318,25 @@ julia> blockdiag(sparse(2I, 3, 3), sparse(4I, 2, 2))
⋅ ⋅ ⋅ ⋅ 4
```
"""
blockdiag() = spzeros(Union{}, Int, 0, 0)
Member commented:

I'd suggest having

Suggested change:
- blockdiag() = spzeros(Union{}, Int, 0, 0)
+ blockdiag() = spzeros(promote_type(), Int, 0, 0)

to avoid hardcoding some "non-sense" type and let the ecosystem decide. Pinging @tkf for this nasty "reduction over empty tuples" issue.

@irhum (Contributor, Author) commented May 27, 2020

Thanks for pointing this out; it is more sensible this way. I believe that resolves the issue for now of speeding up this function while making sure the blockdiag() output remains unchanged. I'd appreciate hearing what @tkf has to say about the reduction-over-empty-tuples issue.

Member commented:

I think Union{} makes sense (i.e., it's the identity element of promote_type as a monoid). Though promote_type() also sounds fine. It's just a LISPy way of saying that we want the identity of promote_type.
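For reference (a quick REPL illustration, not part of the original thread):

julia> promote_type()
Union{}

julia> promote_type(Union{}, Float64)  # Union{} acts as the identity element
Float64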

Member commented:

Thanks, @tkf. I just had no idea what might be subject to potential future design changes and what is "set in stone". I'll go ahead and merge this as is then.

@dkarrasch added the performance (Must go faster) label on May 27, 2020
@dkarrasch merged commit d0b2be1 into JuliaLang:master on May 27, 2020
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
Labels: domain:arrays:sparse (Sparse arrays), performance (Must go faster)
3 participants