Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Division of a sparse matrix by row vector causes 0's to become NaN #23857

Closed
jsams opened this issue Sep 24, 2017 · 9 comments
Closed

Division of a sparse matrix by row vector causes 0's to become NaN #23857

jsams opened this issue Sep 24, 2017 · 9 comments
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior sparse Sparse arrays

Comments

@jsams
Copy link

jsams commented Sep 24, 2017

I think the title about says it all. This example creates a sparse CSC matrix with 7 stored entries, but then the division creates 12 stored entries, with the 5 extra being NaN. None of the column means are 0 or near 0, so this should not be happening.

julia> srand(2);
julia> X = sprand(3, 4, 0.6);
julia> Xmeandiv = X  ./ mean(X, 1)
3×4 SparseMatrixCSC{Float64,Int64} with 12 stored entries:
  [1, 1]  =  1.37231
  [2, 1]  =  1.20439
  [3, 1]  =  0.423301
  [1, 2]  =  1.91425
  [2, 2]  =  NaN
  [3, 2]  =  1.08575
  [1, 3]  =  NaN
  [2, 3]  =  3.0
  [3, 3]  =  NaN
  [1, 4]  =  3.0
  [2, 4]  =  NaN
  [3, 4]  =  NaN

division by a column vector doesn't create NaNs, but it does cause explicit 0s to be stored:

julia> X ./ mean(X, 2)
3×4 SparseMatrixCSC{Float64,Int64} with 12 stored entries:
  [1, 1]  =  1.90584
  [2, 1]  =  1.91323
  [3, 1]  =  1.50071
  [1, 2]  =  1.72613
  [2, 2]  =  0.0
  [3, 2]  =  2.49929
  [1, 3]  =  0.0
  [2, 3]  =  2.08677
  [3, 3]  =  0.0
  [1, 4]  =  0.36803
  [2, 4]  =  0.0
  [3, 4]  =  0.0

scalar division works as expected

julia> X ./ 3.3
3×4 SparseMatrixCSC{Float64,Int64} with 7 stored entries:
  [1, 1]  =  0.237995
  [2, 1]  =  0.208872
  [3, 1]  =  0.0734115
  [1, 2]  =  0.215552
  [3, 2]  =  0.12226
  [2, 3]  =  0.227818
  [1, 4]  =  0.0459582

(even if this was correct behavior, does it make sense to allow the result of an operation on a sparse matrix to return a dense matrix if the storage requirements will be lower or close to the same?)

julia> versioninfo()
Julia Version 0.6.0
Commit 9036443 (2017-06-19 13:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
@andreasnoack andreasnoack added bug Indicates an unexpected problem or unintended behavior sparse Sparse arrays labels Sep 24, 2017
@KlausC
Copy link
Contributor

KlausC commented Nov 25, 2017

minimal more concise cases to demonstrate the same effect:

julia> sparse([1; 0]) ./ [1]
2-element SparseVector{Float64,Int64} with 2 stored entries:
  [1]  =  1.0
  [2]  =  NaN

julia> sparse([1; 0]) ./ [1; 1]
2-element SparseVector{Float64,Int64} with 2 stored entries:
  [1]  =  1.0
  [2]  =  0.0

In the first case, the result of 0/0 is stored to all structural zero array locations, which is wrong.
In the second case, a stored zero is produced, where it could be structural, which is not optimal.

julia> X = sparse([1;1]);
julia> X[2,1] = 0;

julia> X
2-element SparseVector{Int64,Int64} with 2 stored entries:
  [1]  =  1
  [2]  =  0

julia> X ./ [1]
2-element SparseVector{Float64,Int64} with 2 stored entries:
  [1]  =  1.0
  [2]  =  0.0

This shows, that the wrong behaviour does not happen, it the zero is not structural.

@KlausC
Copy link
Contributor

KlausC commented Nov 25, 2017

Another bug in the same realm, maybe even more severe, as this example shows:

julia> sparse([1;1]) .÷ [1]
ERROR: DivideError: integer division error
Stacktrace:
 [1] _diffshape_broadcast(::typeof(div), ::SparseVector{Int64,Int64}, ::SparseVector{Int64,Int64}) at ./sparse/higherorderfns.jl:126
 [2] broadcast(::typeof(div), ::SparseVector{Int64,Int64}, ::SparseVector{Int64,Int64}) at ./sparse/higherorderfns.jl:119
 [3] broadcast(::Function, ::SparseVector{Int64,Int64}, ::Array{Int64,1}) at ./broadcast.jl:455
 [4] top-level scope

The expression 0 ÷ 0 is evaluated, even if there is no need to divide by zero.
Unfortunately the code in sparse/higherorderfns.jl is too complex for a fast fix.

@KlausC
Copy link
Contributor

KlausC commented Nov 25, 2017

Yet another surprise / bug:

julia> broadcast(+, sparse([1;0]), sparse([1]))
2-element SparseVector{Int64,Int64} with 2 stored entries:
  [1]  =  2
  [2]  =  1

julia> broadcast(+, sparse([1;0]), [1])
ERROR: DimensionMismatch("")
Stacktrace:
 [1] _binarymap(::typeof(+), ::SparseVector{Int64,Int64}, ::Array{Int64,1}, ::Int64) at ./sparse/sparsevector.jl:1320
 [2] broadcast(::typeof(+), ::SparseVector{Int64,Int64}, ::Array{Int64,1}) at ./sparse/sparsevector.jl:1383
 [3] top-level scope

Wonder why it works for / but not for + .

@Sacha0
Copy link
Member

Sacha0 commented Nov 25, 2017

Thanks for the additional examples! Sparse broadcast over combinations including dense objects indeed needs a good bit of attention after 1.0 :). Best!

@KlausC
Copy link
Contributor

KlausC commented Nov 27, 2017

I just created a PR for the original issue: #24806

@mbauman
Copy link
Member

mbauman commented Apr 23, 2018

Fixed by #24806.

@mbauman mbauman closed this as completed Apr 23, 2018
@jsams
Copy link
Author

jsams commented Apr 25, 2018

There were two parts to this bug: The NaNs and the creation of, i believe the term is, 'non-structural zeros' (i.e. zeros stored in the array rather than being implied by the structure of the object). Based on a quick test of the lastest nightly, only the NaNs are fixed. But the NaNs, at least in some instances, have been replaced by these non-structural zeros.

As a user, I would not expect broadcast operations to add structural zeros when unnecessary. It might be reasonable to expect them to remove generated zeros, but I understand that can get iffy with floating point comparisons to zero. That said, it looks like non-structural zeros are being pruned in at least some circumstances. So that's nice, but the system isn't perfect. Here's an example set of tests:

using SparseArrays
using Test

@test size(sparse([1.0; 0.0]).nzval, 1) == 1
@test_broken size((sparse([1.0; 0.0]) ./ [1.0]).nzval, 1) == 1 # is 2, should be 1
@test size((sparse([1.0; 0.0]) .* [1.0]).nzval, 1) == 1 # expected
@test size((sparse([1.0; 0.0]) .+ [0.0]).nzval, 1) == 1 # expected
@test size((sparse([2.0; 1.0]) .- [1.0]).nzval, 1) == 1 # nice, pruning!

@mbauman
Copy link
Member

mbauman commented Apr 25, 2018

I believe that is fairly well covered by #26561.

@jsams
Copy link
Author

jsams commented Apr 25, 2018

indeed it is! Sorry, I missed that issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

5 participants