Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast conversion of wrapped types to SparseMatrixCSC #30552

Merged
merged 38 commits into from
Mar 4, 2019
Merged

Fast conversion of wrapped types to SparseMatrixCSC #30552

merged 38 commits into from
Mar 4, 2019

Conversation

KlausC
Copy link
Contributor

@KlausC KlausC commented Jan 1, 2019

This PR addresses the issue of slow conversions of wrapped sparse matrices to sparse.
For example see also issue 29353 by @KristofferC.
The case SparseMatrixCSC(Symmetric(A)) for sparse A is generalized to wrappers Symmetric, Hermitian, [Unit](Upper|Lower)Triangular, Adjoint, Transpose, SubArray and arbitrarily nested combinations of those.
The operation time is O(nnz(A) * O(depth(A)), where depth is the wrapping depth of A. All method specializations are type-stable.
Additional functions of the type of A are defined, which are evaluated at compile time.
(SparseWrappers: depth, iswrsparse).
Another useful function SparseWrappers.unwrap is defined to convert a wrapped matrix to a SparseMatrixCSC or a Matrix depending on the sparsity status of A.

Following benchmark results were observed:

sparse conversion input type minimum time before minimum time after
Symmetric(A) 3.175 s (0.01%) 9.942 ms (0.00%)
Adjoint(A) 20.218 ms (1.59%) 19.672 ms (0.00%)
UpperTriangular(A) 3.010 s (0.54%) 2.768 ms (0.00%)
UnitUpperTriangular(A) 825.495 ms (0.04%) 1.955 ms (0.00%)
Adjoint(UpperTriangular(A)) 3.265 s (0.00%) 8.716 ms (0.00%)
Adjoint(UnitUpperTriangular(A)) 2.448 s (0.00%) 8.785 ms (0.00%)
Symmetric(Adjoint(UnitUpperTriangular(A))) 537.293 ms (0.00%) 9.461 ms (0.00%)
Symmetric(Adjoint(UnitUpperTriangular(A'))) 563.827 ms (0.00%) 30.850 ms (1.48%)

The complete benchmark listing is attached here for completeness:

julia> n = 10^4
10000

julia> Random.seed!(0); A = sprandn(n, n, 10^6/n^2); nnz(A)
1000598

### Current behavior:

julia> @benchmark sparse(Symmetric(A))
BenchmarkTools.Trial:
  memory estimate:  40.03 MiB
  allocs estimate:  36
  --------------
  minimum time:     3.175 s (0.01% GC)
  median time:      3.196 s (1.42% GC)
  mean time:        3.196 s (1.42% GC)
  maximum time:     3.218 s (2.82% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> @benchmark sparse(Adjoint(A))
BenchmarkTools.Trial:
  memory estimate:  15.34 MiB
  allocs estimate:  9
  --------------
  minimum time:     20.218 ms (1.59% GC)
  median time:      20.763 ms (1.55% GC)
  mean time:        21.917 ms (3.93% GC)
  maximum time:     97.635 ms (74.28% GC)
  --------------
  samples:          228
  evals/sample:     1

julia> @benchmark sparse(UpperTriangular(A))
BenchmarkTools.Trial:
  memory estimate:  20.59 MiB
  allocs estimate:  35
  --------------
  minimum time:     3.010 s (0.54% GC)
  median time:      3.063 s (0.27% GC)
  mean time:        3.063 s (0.27% GC)
  maximum time:     3.115 s (0.00% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> @benchmark sparse(UnitUpperTriangular(A))
BenchmarkTools.Trial:
  memory estimate:  20.82 MiB
  allocs estimate:  35
  --------------
  minimum time:     825.495 ms (0.04% GC)
  median time:      831.074 ms (0.04% GC)
  mean time:        841.265 ms (1.44% GC)
  maximum time:     882.700 ms (6.10% GC)
  --------------
  samples:          6
  evals/sample:     1

julia> @benchmark sparse(Adjoint(UpperTriangular(A)))
BenchmarkTools.Trial:
  memory estimate:  20.59 MiB
  allocs estimate:  37
  --------------
  minimum time:     3.265 s (0.00% GC)
  median time:      3.283 s (0.01% GC)
  mean time:        3.283 s (0.01% GC)
  maximum time:     3.302 s (0.01% GC)
  --------------
  samples:          2
  evals/sample:     1

julia> @benchmark sparse(Adjoint(UnitUpperTriangular(A)))
BenchmarkTools.Trial:
  memory estimate:  20.82 MiB
  allocs estimate:  37
  --------------
  minimum time:     2.448 s (0.00% GC)
  median time:      2.451 s (0.00% GC)
  mean time:        2.459 s (0.22% GC)
  maximum time:     2.477 s (0.67% GC)
  --------------
  samples:          3
  evals/sample:     1

julia> @benchmark sparse(Symmetric(Adjoint(UnitUpperTriangular(A))))
BenchmarkTools.Trial:
  memory estimate:  904.06 KiB
  allocs estimate:  33
  --------------
  minimum time:     537.293 ms (0.00% GC)
  median time:      538.931 ms (0.00% GC)
  mean time:        538.824 ms (0.00% GC)
  maximum time:     539.915 ms (0.00% GC)
  --------------
  samples:          10
  evals/sample:     1

julia> @benchmark sparse(Symmetric(Adjoint(UnitUpperTriangular(A'))))
BenchmarkTools.Trial:
  memory estimate:  904.09 KiB
  allocs estimate:  35
  --------------
  minimum time:     563.827 ms (0.00% GC)
  median time:      577.790 ms (0.00% GC)
  mean time:        577.844 ms (0.00% GC)
  maximum time:     591.112 ms (0.00% GC)
  --------------
  samples:          9
  evals/sample:     1

julia>

### Proposed implementation

julia> @benchmark sparse(Symmetric(A))
BenchmarkTools.Trial:
  memory estimate:  15.32 MiB
  allocs estimate:  9
  --------------
  minimum time:     9.942 ms (0.00% GC)
  median time:      10.527 ms (3.85% GC)
  mean time:        10.693 ms (3.89% GC)
  maximum time:     63.505 ms (83.87% GC)
  --------------
  samples:          467
  evals/sample:     1

julia> @benchmark sparse(Adjoint(A))
BenchmarkTools.Trial:
  memory estimate:  15.34 MiB
  allocs estimate:  9
  --------------
  minimum time:     19.672 ms (0.00% GC)
  median time:      20.470 ms (1.97% GC)
  mean time:        20.800 ms (2.64% GC)
  maximum time:     73.974 ms (72.81% GC)
  --------------
  samples:          241
  evals/sample:     1

julia> @benchmark sparse(UpperTriangular(A))
BenchmarkTools.Trial:
  memory estimate:  7.70 MiB
  allocs estimate:  9
  --------------
  minimum time:     2.768 ms (0.00% GC)
  median time:      2.928 ms (0.00% GC)
  mean time:        3.089 ms (5.97% GC)
  maximum time:     58.220 ms (94.69% GC)
  --------------
  samples:          1611
  evals/sample:     1

julia> @benchmark sparse(UnitUpperTriangular(A))
BenchmarkTools.Trial:
  memory estimate:  15.34 MiB
  allocs estimate:  9
  --------------
  minimum time:     1.955 ms (0.00% GC)
  median time:      2.380 ms (17.30% GC)
  mean time:        2.351 ms (13.29% GC)
  maximum time:     56.392 ms (94.43% GC)
  --------------
  samples:          2115
  evals/sample:     1

julia> @benchmark sparse(Adjoint(UpperTriangular(A)))
BenchmarkTools.Trial:
  memory estimate:  7.70 MiB
  allocs estimate:  10
  --------------
  minimum time:     8.716 ms (0.00% GC)
  median time:      9.062 ms (0.00% GC)
  mean time:        9.242 ms (2.86% GC)
  maximum time:     63.112 ms (85.87% GC)
  --------------
  samples:          540
  evals/sample:     1

julia> @benchmark sparse(Adjoint(UnitUpperTriangular(A)))
BenchmarkTools.Trial:
  memory estimate:  7.85 MiB
  allocs estimate:  10
  --------------
  minimum time:     8.785 ms (0.00% GC)
  median time:      9.135 ms (0.00% GC)
  mean time:        9.339 ms (2.85% GC)
  maximum time:     62.943 ms (85.79% GC)
  --------------
  samples:          535
  evals/sample:     1

julia> @benchmark sparse(Symmetric(Adjoint(UnitUpperTriangular(A))))
BenchmarkTools.Trial:
  memory estimate:  8.08 MiB
  allocs estimate:  20
  --------------
  minimum time:     9.461 ms (0.00% GC)
  median time:      9.890 ms (0.00% GC)
  mean time:        10.092 ms (2.78% GC)
  maximum time:     64.244 ms (85.00% GC)
  --------------
  samples:          495
  evals/sample:     1

julia> @benchmark sparse(Symmetric(Adjoint(UnitUpperTriangular(A'))))
BenchmarkTools.Trial:
  memory estimate:  38.79 MiB
  allocs estimate:  39
  --------------
  minimum time:     30.850 ms (1.48% GC)
  median time:      32.190 ms (2.68% GC)
  mean time:        33.659 ms (6.32% GC)
  maximum time:     117.152 ms (72.38% GC)
  --------------
  samples:          149
  evals/sample:     1

@andreasnoack andreasnoack self-requested a review January 2, 2019 12:05
@ViralBShah ViralBShah added the domain:arrays:sparse Sparse arrays label Jan 4, 2019
@KlausC
Copy link
Contributor Author

KlausC commented Jan 12, 2019

I think, this PR is important, because it allows to avoid the "abstractarray fallback" for wrapped sparse matrices by transforming them to unwrapped sparse matrices.

@KlausC
Copy link
Contributor Author

KlausC commented Feb 10, 2019

Bump ??!

@ViralBShah
Copy link
Member

Just want to say that I am not against this PR, but I have been hesitating merging it because it feels a bit brute force. Maybe that shouldn't be a reason to merge it now and have something better in the future.

@KlausC
Copy link
Contributor Author

KlausC commented Feb 22, 2019

I am not against this PR, but I have been hesitating merging it because it feels a bit brute force.

I am not sure I understand. What do you mean by "brute force"? Is that targeted to the coding style? In my eyes, it feels quite good. Maybe it is not appropriate to argument against feelings, but let me give some explanations, why my feeling is positive:

  • this PR implements an essential improvement of usability of the SparseArrays library. Let me mention, that the given benchmarks are using a medium sized matrix, in order to demonstrate the performance factor. For really big sparse matrices, the pre-existing code just doesn't come to an end in a reasonable time.

  • The code is not invasive in the sense that it does not remove any line of the existing code. In contrast it supports the existing wrapper types with the new functionality of avoiding the "abstract array fallback".

  • There is only a single block of code inserted into one file, which can be easily removed, once a better solution has been developed.

  • The 3 specialized lengthy implementations of _sparsem are the "working-horses" of the PR. They have each a size of ~ 50 lines and are made in the spirit of ftranspose! / hallfperm!, which goes back the classical Gustavson implementation found in the preexisting part of sparsematrix.jl, with similar size. I don't see much opportunity to improve this code with similar performance.

  • why is there no technical code review?

@ViralBShah
Copy link
Member

Sorry, my comment about brute force is not related to your PR or coding style, but on the need to enumerate all these cases as a shortcoming of Julia itself.

I will try to review this and get this merged.

@KlausC
Copy link
Contributor Author

KlausC commented Feb 22, 2019

Thanks, sorry for my misunderstanding.

@yurivish
Copy link
Contributor

yurivish commented Feb 25, 2019

I just ran into a slowdown that might be related and wanted to check if it is — I had an Adjoint wrapper around a sparse matrix, and indexing it with a[:, :] takes so long that I had to abort the computation.

Is that kind of indexing performance addressed by this PR?

@KlausC
Copy link
Contributor Author

KlausC commented Feb 26, 2019

I had an Adjoint wrapper around a sparse matrix, and indexing it with a[:, :] takes so long that I had to abort the computation

This is a related problem, which is not covered by this PR. Instead of a[:,:], you could use sparse(a), which converts the Adjoint{,SparseMatrixCSC}to a SparseMatrixCSC, and which has good performance.

@ViralBShah
Copy link
Member

Can we put these in a separate file, say, sparseconvert.jl, as a reminder that there should be a better way to do this in the future and avoid the combinatorial explosion of things to define?

@yurivish
Copy link
Contributor

yurivish commented Feb 26, 2019

Yes, that is a good idea and is more or less what I did. My actual use case is indexing with arr[inds, :] or arr[:, inds], and I created a small method that uses sparse when necessary to "unwrap" the matrix before indexing:

unwrap_sparse_wrapper(x::Adjoint{<:Any, <:AbstractSparseMatrix}) = sparse(x)
unwrap_sparse_wrapper(x::Transpose{<:Any, <:AbstractSparseMatrix}) = sparse(x)
unwrap_sparse_wrapper(x) = x

Edit: Now I think about it, I should probably use knowledge of the array being a transpose to optimize the row lookup case.

@KlausC
Copy link
Contributor Author

KlausC commented Feb 27, 2019

Can we put these in a separate file, say, sparseconvert.jl

done.

@ViralBShah ViralBShah merged commit 7a75d6b into JuliaLang:master Mar 4, 2019
@ViralBShah
Copy link
Member

ViralBShah commented Mar 4, 2019

Can we avoid exporting unwrap and iswrsparse? I noticed after merging. It would be nice if there were a new PR - or if preferable, I can revert this one.

@KlausC KlausC deleted the krc/symmtosparse branch March 6, 2019 23:07
fredrikekre added a commit that referenced this pull request Jun 8, 2019
This method was introduced in #30552, but was unrelated to the rest of
the changes, and no-one reviewed or though about the implications.
fredrikekre added a commit that referenced this pull request Jun 9, 2019
This method was introduced in #30552, but was unrelated to the rest of
the changes, and no-one reviewed or though about the implications.
KristofferC pushed a commit that referenced this pull request Jun 9, 2019
…32266)

This method was introduced in #30552, but was unrelated to the rest of
the changes, and no-one reviewed or though about the implications.
KristofferC pushed a commit that referenced this pull request Jun 9, 2019
…32266)

This method was introduced in #30552, but was unrelated to the rest of
the changes, and no-one reviewed or though about the implications.

(cherry picked from commit 5d02c59)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants