
fixes #8416 (and duplicated issue #9664) #10106

Merged
1 commit merged into JuliaLang:master on Apr 16, 2015

Conversation

amartgon

@amartgon amartgon commented Feb 6, 2015

I believe this fixes the problem with Vector .* Sparse and Sparse .* Vector (#8416 and #9664)

@mbauman mbauman added the sparse Sparse arrays label Feb 6, 2015
@tkelman
Contributor

tkelman commented Feb 6, 2015

This looks like it could substantially slow down elementwise multiplication of sparse matrices with the same sizes, since broadcast! is not at all optimized for sparse arrays.

edit:

julia> A = sprand(1000,1000,0.3);
julia> B = sprand(1000,1000,0.3);
julia> @time A.*B;
elapsed time: 0.004140143 seconds (9 MB allocated)
julia> @time A.*B;
elapsed time: 0.007408249 seconds (9 MB allocated, 27.46% gc time in 1 pauses with 0 full sweep)
julia> @time A.*B;
elapsed time: 0.004388563 seconds (9 MB allocated)
julia> @time broadcast!(*, spzeros(Float64, Int, 1000, 1000), A, B);
elapsed time: 0.197608866 seconds (4 MB allocated)
julia> @time broadcast!(*, spzeros(Float64, Int, 1000, 1000), A, B);
elapsed time: 0.193278318 seconds (4 MB allocated)
julia> @time broadcast!(*, spzeros(Float64, Int, 1000, 1000), A, B);
elapsed time: 0.193749637 seconds (4 MB allocated)

@tkelman
Contributor

tkelman commented Feb 6, 2015

A better solution to this is to stop using 1-column matrices as sparse vectors. There is a PR open for a sparse vector type; it might be good to just get that into shape.

@quinnj
Member

quinnj commented Feb 6, 2015

Ah yes, I forgot I had gotten the sparse vector work started. I'll try to pick that back up to push it along.


@simonster
Member

Shouldn't we make broadcasting work (efficiently) for SparseMatrices whether or not we have sparse vectors?

@tkelman
Contributor

tkelman commented Feb 6, 2015

Yeah, that'd be great. Any proposals for how to go about doing that? This doesn't apply just to sparse, but all of the structured matrix types - and not just to broadcast, but all of the reductions and other "generic" operations throughout base. There isn't currently a great way of conveying or utilizing structural information in an extensible, truly generic way that works for anything that isn't dense. But now I'm repeating myself from #8001. In the meantime, one method at a time, one type at a time.

@simonster
Member

For the case where one input is a SparseMatrix and one is a dense column or row vector, we can just call scale, which already exists for SparseMatrixCSC.

For the case where both the inputs are SparseMatrices, one of which is a single row/column, and the output is a SparseMatrix, I'm not sure there is a sufficiently efficient general-purpose way to handle things, and we should probably just implement the methods.

As far as a general-purpose way to iterate for reductions etc., we could have something like an iterator version of findnz that falls back to @timholy's Cartesian product iterator for the dense case. I believe there's still some overhead to the Cartesian product iterator, though, and you're right that we may need to think harder about how to handle row-major arrays.
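The scale-style fast path mentioned above can be sketched like this. This is a sketch against the modern SparseArrays API, not the 0.4-era `scale`; `colscale` is a hypothetical helper name. The point is that scaling each column of a sparse matrix by an entry of a dense vector touches only the stored nonzeros, so `A .* b'` never needs to densify:

```julia
using SparseArrays

# Sketch (assumption: modern SparseArrays API; `colscale` is a made-up name).
# Broadcasting a sparse matrix against a dense row vector only rescales the
# stored nonzeros column by column -- no dense intermediate is created.
function colscale(A::SparseMatrixCSC, b::AbstractVector)
    size(A, 2) == length(b) || throw(DimensionMismatch())
    nz = copy(nonzeros(A))
    for j in 1:size(A, 2), k in nzrange(A, j)
        nz[k] *= b[j]                 # scale every stored entry in column j
    end
    return SparseMatrixCSC(size(A, 1), size(A, 2),
                           copy(A.colptr), copy(rowvals(A)), nz)
end
```

For a conforming `b`, `colscale(A, b)` agrees with `A .* b'` entry for entry, while doing O(nnz) work.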

@ViralBShah
Member

I think generic is the harder part, but we should at least make broadcast work with the current sparse data structures.

@amartgon
Author

amartgon commented Feb 9, 2015

Things are never as easy as they look at first sight... Thanks for the performance tests and all the comments: really useful for me to better understand the problem.

I would like to work on making broadcast efficient with SparseMatrixCSC. This first version would work only with 2D, so Cartesian indexing would not be used. Most of the code for efficient operations is already there. It would just be necessary to call it from a broadcast! method and ensure that sparse matrix operations use it consistently. I also think this is not incompatible with introducing new sparse types in the future.

Does it make sense?
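The kind of kernel "already there" that a sparse `broadcast!` could dispatch to is the classic CSC two-pointer column merge. A minimal sketch for the `.*` case (`spmul_kernel` is a hypothetical name, written against the modern SparseArrays API, not the PR's actual code):

```julia
using SparseArrays

# Sketch: for each column, walk the stored entries of A and B in lockstep
# and emit a product only where both matrices store an entry. This is the
# O(nnz) pattern an efficient sparse-sparse .* kernel is built on.
function spmul_kernel(A::SparseMatrixCSC{Tv}, B::SparseMatrixCSC{Tv}) where Tv
    size(A) == size(B) || throw(DimensionMismatch())
    I = Int[]; J = Int[]; V = Tv[]
    rvA, nzA = rowvals(A), nonzeros(A)
    rvB, nzB = rowvals(B), nonzeros(B)
    for j in 1:size(A, 2)
        ka, la = first(nzrange(A, j)), last(nzrange(A, j))
        kb, lb = first(nzrange(B, j)), last(nzrange(B, j))
        while ka <= la && kb <= lb
            ra, rb = rvA[ka], rvB[kb]
            if ra == rb               # entry stored in both: emit product
                push!(I, ra); push!(J, j); push!(V, nzA[ka] * nzB[kb])
                ka += 1; kb += 1
            elseif ra < rb            # advance whichever pointer lags
                ka += 1
            else
                kb += 1
            end
        end
    end
    return sparse(I, J, V, size(A)...)
end
```

For equal-size inputs this agrees with `A .* B`; a `broadcast!` method would additionally handle the 1-row/1-column expansion cases.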

@tkelman
Contributor

tkelman commented Feb 10, 2015

Does it make sense?

Yes, if you're interested in taking a crack at it please go ahead.

For sparse broadcast, we may need a new classification to more clearly differentiate operators that produce a nonzero when either of their inputs is nonzero, vs operators that return a nonzero only when both of their inputs are nonzero.
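A minimal illustration of that distinction, shown in modern SparseArrays syntax for concreteness:

```julia
using SparseArrays

# * is "zero-preserving": x * 0 == 0, so the result's pattern lies inside
# the intersection of the input patterns. + produces a nonzero wherever
# either input is nonzero, so its result pattern is the union of the two.
A = sparse([1], [1], [2.0], 2, 2)   # single nonzero at (1,1)
B = sparse([2], [2], [3.0], 2, 2)   # single nonzero at (2,2)

nnz(A .* B)   # disjoint patterns, so no stored nonzeros survive
nnz(A .+ B)   # both stored entries appear in the result
```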

@JeffBezanson
Sponsor Member

What's the status of this?

@tkelman
Contributor

tkelman commented Mar 8, 2015

Sparse broadcast needs design work. Adding proper SparseVectors has an open PR, but with conflicts and in need of a rebase.

@amartgon
Author

amartgon commented Mar 9, 2015

Regarding broadcast of SparseMatrixCSC only: I have been working on it and making progress, though not as fast as I'd like. I think I can have a proposal implemented by the end of this week.

It would include a new broadcast method for the case of two sparse operands (even though the general broadcast methods take n operands), and another method, which could be called broadcast_zpreserving, for binary operations that yield a zero result when either of the two operands is 0 (such as .*). This broadcast_zpreserving would take one sparse operand and another that could be any type of array or a constant.

This would reflect the distinction between types of operators mentioned by tkelman:

For sparse broadcast, we may need a new classification to more clearly differentiate operators that produce a nonzero when either of their inputs is nonzero, vs operators that return a nonzero only when both of their inputs are nonzero.

@tkelman
Contributor

tkelman commented Mar 11, 2015

Sounds like an interesting approach; I look forward to seeing what you can come up with @amartgon.

@amartgon
Author

Here is the proposal. I think it can be a starting point to be generalized in the future for other sparse structures. Any comments are very welcome.

A few simple performance tests are included below. Methods with the suffix "_old" are the old implementations of the operations, which I renamed that way before deleting them. The first calls to each method (which trigger compilation) are omitted.

My conclusion: most operations have a small overhead, but some cases such as sparse .* dense are much faster. Other cases such as sparse(n x m) .* sparse(n x 1) did not work at all before.

julia> A = sprand(10000,10000,0.3); B = sprand(10000,10000,0.3); BF = full(B);

julia>  @time A.+B; #new
elapsed time: 0.976550561 seconds (915 MB allocated, 0.64% gc time in 1 pauses with 1 full sweep)

julia>  @time A.+B; #new
elapsed time: 1.031963776 seconds (915 MB allocated, 0.61% gc time in 1 pauses with 1 full sweep)

julia>  @time A.+B; #new
elapsed time: 1.058726632 seconds (915 MB allocated, 9.80% gc time in 1 pauses with 0 full sweep)

julia>  @time A+B; #old
elapsed time: 1.018026417 seconds (915 MB allocated, 12.78% gc time in 2 pauses with 1 full sweep)

julia>  @time A+B; #old
elapsed time: 0.990413792 seconds (915 MB allocated, 10.85% gc time in 2 pauses with 1 full sweep)

julia>  @time A+B; #old
elapsed time: 0.961619903 seconds (915 MB allocated, 11.15% gc time in 2 pauses with 1 full sweep)


julia> @time A.*B; #new
elapsed time: 0.689984741 seconds (915 MB allocated, 0.86% gc time in 1 pauses with 1 full sweep)

julia> @time A.*B; #new
elapsed time: 0.688215981 seconds (915 MB allocated, 0.86% gc time in 1 pauses with 1 full sweep)

julia> @time A.*B; #new
elapsed time: 0.699050298 seconds (915 MB allocated, 1.01% gc time in 1 pauses with 1 full sweep)

julia> @time Base.SparseMatrix.times_old(A,B);  #old
elapsed time: 0.575538209 seconds (915 MB allocated, 7.48% gc time in 2 pauses with 1 full sweep)

julia> @time Base.SparseMatrix.times_old(A,B);  #old
elapsed time: 0.574641348 seconds (915 MB allocated, 7.61% gc time in 2 pauses with 1 full sweep)

julia> @time Base.SparseMatrix.times_old(A,B);  #old
elapsed time: 0.584391185 seconds (915 MB allocated, 7.42% gc time in 2 pauses with 1 full sweep)


julia> @time A.*BF; #new
elapsed time: 0.472082517 seconds (457 MB allocated, 1.39% gc time in 1 pauses with 1 full sweep)

julia> @time A.*BF; #new
elapsed time: 0.469774405 seconds (457 MB allocated, 1.42% gc time in 1 pauses with 1 full sweep)

julia> @time A.*BF; #new
elapsed time: 0.480360273 seconds (457 MB allocated, 1.16% gc time in 1 pauses with 1 full sweep)

julia> @time Base.SparseMatrix.times_old(A,BF);  #old
elapsed time: 2.437033787 seconds (1602 MB allocated, 4.56% gc time in 5 pauses with 4 full sweep)

julia> @time Base.SparseMatrix.times_old(A,BF);  #old
elapsed time: 2.500815834 seconds (1602 MB allocated, 5.77% gc time in 4 pauses with 3 full sweep)

julia> @time Base.SparseMatrix.times_old(A,BF);  #old
elapsed time: 2.464224123 seconds (1602 MB allocated, 5.73% gc time in 4 pauses with 3 full sweep)

@@ -98,6 +99,207 @@ function gen_broadcast_body_iter(nd::Int, narrays::Int, f::Function)
end
end

## Broadcasting cores specialized for returning a SparseMatrixCSC
Contributor

More common terminology here would be "kernels" rather than "cores"

Author

Yes, I agree. I'll correct it. Thanks!

@tkelman
Contributor

tkelman commented Mar 15, 2015

Overall this looks pretty good, the sparse .* dense improvement is especially nice. Have you profiled to see where the ~20% difference in sparse .* sparse is coming from? Would be ideal to close that gap, or failing that, leave the existing implementation in place for now. I like the improved generality here, but performance regressions are annoying if we can avoid them.

@amartgon
Author

About the ~20% difference in sparse .* sparse:

There was also ~10% difference in sparse .+ sparse, although in the particular runs I posted it was not clear.
After further testing (and taking into account that the new algorithm is basically the same as the old), it turns out that the difference comes down to hoisting the array fields into local variables:

colptrA = A.colptr; rowvalA = A.rowval; nzvalA = A.nzval
colptrB = B.colptr; rowvalB = B.rowval; nzvalB = B.nzval

I thought it was just a matter of style and omitted it when adapting the code. After changing it, the old and the new versions seem to have indistinguishable running times. I'll submit a corrected version.
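The pattern in question, sketched on a toy column-sum function (hypothetical, not from the PR): the only difference from the slow variant is that the field loads happen once, before the loop, rather than through the struct on every iteration.

```julia
using SparseArrays

# Hoisting the field loads: read A.colptr / A.nzval into locals once instead
# of re-loading them inside the hot loop, since the compiler does not (yet)
# prove those loads loop-invariant on its own.
function colsums_hoisted(A::SparseMatrixCSC)
    colptr = A.colptr; nzval = A.nzval     # hoisted once, before the loop
    s = zeros(eltype(A), size(A, 2))
    for j in 1:size(A, 2)
        for k in colptr[j]:(colptr[j + 1] - 1)
            s[j] += nzval[k]               # only local-variable reads here
        end
    end
    return s
end
```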

@tkelman
Contributor

tkelman commented Mar 17, 2015

Yeah, unfortunately manually hoisting field access into temporary variables does make a difference right now. Eventually Julia's compiler might get smart enough (edit: or aliasing assumptions in various types get adjusted) to do that kind of thing automatically, but for now it's a little bit of manual code ugliness in exchange for better performance. I should've mentioned this in some code comments; at least you were able to identify it yourself.

Also have a look at #10536 which is somewhat related (sparse reductions rather than broadcast), be aware that the bootstrapping order changes here and there might result in some conflicts that would need to be rebased depending which gets merged first.

@simonster
Member

Linking #9755 for the hoisting issue.

As far as bootstrapping order goes, if we are eventually planning to move sparse matrices to a separate package (as advocated by @tknopp), we should probably try to put all of the sparse matrix-related code in the SparseMatrix module and load it at the end.

@ViralBShah
Member

I don't think it is realistic to move sparse support out of base. What is likely is to have new sparse data structure implementations in packages and make it easy to do so.

@tkelman
Contributor

tkelman commented Mar 18, 2015

Sparse matrices are one of those things, like bigints/bigfloats, FFTs, and hell even BLAS and LAPACK, that shouldn't really be required runtime dependencies for every piece of code written in the language: if you're not using the functionality, they aren't absolutely necessary for the rest of the language to function. This is 1.0-type stuff, but eventually there should be ways of selectively disabling big chunks of what's currently in base, for embedding and static compilation use cases. The shorter-term default situation (still probably not until 0.5 or later, I'm guessing) is that we'll probably want to take SuiteSparse out into a package for licensing reasons, but base will need to provide scaffolding for different packages to be able to plug in various sparse formats and solver libraries.

@amartgon
Author

This last commit includes the changes we've been commenting on, and also a reorganization of the new code (moving it from broadcast.jl into sparsematrix.jl) so that no modification in the bootstrap process (sysimg.jl) is needed. If there are no new comments or detected problems, this could be the final version.


## element-wise comparison operators returning SparseMatrixCSC ##
.<{Tv1,Ti1,Tv2,Ti2}(A_1::SparseMatrixCSC{Tv1,Ti1}, A_2::SparseMatrixCSC{Tv2,Ti2}) = broadcast!(<, spzeros( Bool, promote_type(Ti1, Ti2), broadcast_shape(A_1, A_2)...), A_1, A_2)
.!={Tv1,Ti1,Tv2,Ti2}(A_1::SparseMatrixCSC{Tv1,Ti1}, A_2::SparseMatrixCSC{Tv2,Ti2}) = broadcast!(!=, spzeros( Bool, promote_type(Ti1, Ti2), broadcast_shape(A_1, A_2)...), A_1, A_2)
Contributor

!= returns true on entries that are zero in both inputs, so this generally won't return sparse. Nevermind.

Author

Not sure I understand... wouldn't that be .==?

Contributor

whoops, too early in the morning, you're right
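The point the exchange settles on, spelled out in modern syntax for illustration: `.!=` is false, hence a structural zero, wherever both inputs are zero, so it preserves sparsity; `.==` is true at every zero-zero position, so it fills in.

```julia
using SparseArrays

A = spzeros(3, 3)
B = spzeros(3, 3)

# a .!= b is false where both entries are zero, so comparing two all-zero
# matrices keeps an empty pattern; a .== b is true at every one of those
# positions, so the result has to represent nine trues.
nnz(A .!= B)      # empty sparsity pattern
count(A .== B)    # true at all nine positions
```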

@tkelman
Contributor

tkelman commented Mar 20, 2015

Looks pretty good, thanks for rebasing. It will fail on appveyor due to a hasty broken-on-windows merge yesterday, sorry.

I don't think broadcast_zpreserving should be exported.

@amartgon
Author

Here is a version where broadcast_zpreserving is not exported.

@tkelman
Contributor

tkelman commented Mar 23, 2015

I like it. I think it would be worth squashing these commits.

@ViralBShah @simonster any feedback?

@amartgon
Author

Any news about this? (Just a friendly reminder).

@ViralBShah
Member

Thanks for the heads up. I really would prefer to see all the broadcasting code go into sparsematrix.jl. Is there a reason to still have some code in broadcast.jl?

Also, can you squash these commits too and rebase onto the latest master? I do want to get this merged. Thanks for persisting.

@ViralBShah
Member

@tanmaykm Could you please review this PR too?

@amartgon
Author

Hi!
Code moved into sparsematrix.jl, commits squashed, and rebased onto the latest master...

@ViralBShah ViralBShah merged commit df141bc into JuliaLang:master Apr 16, 2015
@ViralBShah
Member

@amartgon Thanks for this work and for being patient.

@amartgon
Author

Great! Thanks to all who provided helpful comments!
