
1/x not the same as inv(x) for matrix in julia 0.2.0 #5807

Closed · freefrancisco opened this issue Feb 13, 2014 · 83 comments
Labels: domain:arrays · domain:linear algebra · kind:breaking (this change will break code) · needs decision (a decision on this change is needed)

Comments

@freefrancisco

I was trying to figure out how to invert a matrix in a 0.2.0 installation of julia, and I found that 1/x gives me a different result than inv(x).
Here is my output:

julia> x = rand(3, 3)
3x3 Array{Float64,2}:
 0.698453  0.500252  0.822943
 0.483441  0.078515  0.889188
 0.275469  0.341758  0.195685

julia> 1/x
3x3 Array{Float64,2}:
 1.43174   1.99899  1.21515
 2.06851  12.7364   1.12462
 3.63017   2.92605  5.11026

julia> inv(x)
Warning: Possible conflict in library symbol dtrtri_
Warning: Possible conflict in library symbol dgetri_
Warning: Possible conflict in library symbol dgetrf_
3x3 Array{Float64,2}:
  35.4312  -22.5164  -46.69
 -18.4623   11.0545   27.4109
 -17.6333   12.3904   22.9644

julia> versioninfo()
Julia Version 0.2.0
Commit 05c6461 (2013-11-16 23:44 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin12.5.0)
  WORD_SIZE: 64
  BLAS: libgfortblas
  LAPACK: liblapack
  LIBM: libopenlibm

@andreasnoack (Member)

Because 1 is a number, 1/x gives the reciprocal of each element of the matrix, whereas inv(x) is the matrix inverse.
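
A quick way to see the distinction (a hypothetical session, assuming the 0.2-era semantics described in this thread):

julia> x = rand(3, 3);

julia> 1/x == 1 ./ x                     # both are the elementwise reciprocal
true

julia> norm(inv(x)*x - eye(3)) < 1e-10   # inv(x) is the actual matrix inverse
true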

It is better to ask a question like this on the julia-users list.

@JeffBezanson (Member)

I wonder if that behavior is still a good idea. After all, a^-1 is also the matrix inverse.

@JeffBezanson reopened this Feb 13, 2014
@jiahao (Member) commented Feb 13, 2014

1/A computing the elementwise reciprocal of A is inconsistent with A/B computing A*inv(B). Why not make 1./A compute the elementwise reciprocal instead?

@freefrancisco (Author)

Also, there is at least one Julia tutorial that expects 1/A == A^-1:
https://www.statalgo.com/2012/04/15/statistics-with-julia-linear-algebra-with-lapack/

@jiahao (Member) commented Feb 13, 2014

It feels like 1.0 ./ A should be the elementwise reciprocal, eye()/A should be the matrix inverse, and 1.0/A should just be an error.

@jiahao (Member) commented Feb 13, 2014

That tutorial is quite outdated.

@andreasnoack (Member)

Agreed. I was too quick here. There is also this consequence:

julia> [1 1]\1
1x2 Array{Float64,2}:
 1.0  1.0

julia> [1 1]\[1]
2-element Array{Float64,1}:
 0.5
 0.5

Most linear algebra functions accept a Number, interpreting it as a Vector of length one or a 1x1 Matrix, so I guess the right thing for \(Matrix,Number) is to check whether the dimensions are correct.
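
A minimal sketch of such a check, in the 0.2-era method-definition style used later in this thread (the error message is illustrative only):

# Accept a Number right-hand side only when it is dimensionally
# valid, i.e. when the matrix has a single row.
Base.(:\)(A::Matrix, b::Number) =
    size(A, 1) == 1 ? A \ [b] : error("dimension mismatch")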

@StefanKarpinski (Member)

If we were to go down the road of making 1/A compute a matrix inverse, then we'd also want to change the behavior of a lot of other operations. For example, A + 1 should be equivalent to A + eye(size(A)) while A .+ 1 would be how you write the broadcasting sense. I've strongly considered this kind of change before, but it would be pretty disruptive. I think the only path to get there would be to deprecate the non-broadcasting forms for a fairly long time and then reintroduce the new "algebraically correct" behavior.

@JeffBezanson (Member)

I would seriously consider doing that. The case of + is a bit odd since I don't see that very often, but / and \ are used with matrices routinely and the 1/A change seems like the right thing.

@jiahao (Member) commented Feb 13, 2014

A + 1 (or in more conventional notation, A + λ*I) is actually a very important use case: A + λ*I is used in level shifts for all sorts of matrix operations, particularly ill-conditioned matrix eigenvalue problems; (A + λI)⁻¹ is a resolvent kernel for inverting linear operators in the spectral domain; and sqrt(A² + λ²I)⁻¹ and the like show up in regularization, to name just three use cases. Currently there is no more compact way to write A + λ*I than A + λ*eye(A), and that notation is wasteful: it initializes a second matrix only to scale it and add it to something else. So there are nice algebraic reasons why A + 1 should promote to A + eye(A).
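
For illustration, a sketch of the level-shift pattern (A + λI)⁻¹b written without allocating eye(A) (the helper name shift_solve is hypothetical; assumes A is square):

# Form A + λ*I by bumping the diagonal of a copy, then solve
# instead of inverting.
function shift_solve(A::Matrix, λ::Number, b::Vector)
    B = copy(A)
    for i = 1:size(B, 1)
        B[i, i] += λ
    end
    return B \ b    # (A + λI)⁻¹ b
end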

I'm not convinced, however, that this has to be conflated with 1/A computing the matrix inverse vs. the elementwise reciprocal.

@jiahao (Member) commented Feb 13, 2014

A few thoughts on the specific issue of matrix inverse notation though:

  1. The usual admonition is that people should not compute the matrix inverse A⁻¹ explicitly. There's really no need to make it easier to compute matrix inverses.
  2. However, we do use A/B as a synonym for A*inv(B) (or at least, we try to). So if we change 1/A to be the matrix inverse, then A/1 ought to be consistent with that as well.
  3. The real question is whether an expression containing both a matrix and 1 should have the 1 promote to eye(N) (the proposed behavior) or to ones(N,N) with elementwise operations (essentially the current behavior).

@JeffBezanson (Member)

I don't think scalar*matrix could do anything but scale the matrix, no matter how we interpret all the above arguments.

@jiahao (Member) commented Feb 13, 2014

Hm, maybe I wasn't being clear. My point is that there are plenty of contexts in which A + λ, for a matrix A and scalar λ, is a powerful synonym for A + λ*eye(A).

@jiahao (Member) commented Feb 13, 2014

...as opposed to the current behavior, in which A + λ behaves as a synonym for A + λ*ones(A) (and similarly in other such expressions involving matrices and scalars).

@StefanKarpinski (Member)

I would be very much in favor of that change. I can't imagine a single mathematician who would not prefer for A + λ to be an efficient and expressive way to modify the diagonal. The reason that and the 1/A thing should be coupled is that they are both about how we treat mathematical operations on a scalar and a matrix – should it broadcast or should it do the algebraically correct thing? What we're doing now is mathematically sketchy but at least it's consistent. If we changed one and not the other then we'd be neither consistent nor correct. I think that changing both is the best thing – then we'll be both consistent and correct.

@jiahao (Member) commented Feb 14, 2014

This proposal, by the way, is very much at its heart about whether a Matrix is a linear algebraic object or a two-dimensional container (cf. #987). The broadcasting behavior is consistent across all n-dimensional arrays, whereas this proposal would special-case behavior for n=2.

@mbauman (Member) commented Feb 14, 2014

I like the idea of making scalars behave like they're multiplied by I of the "correct size" when interacting with matrices through non-broadcasting operators. The fact that there's already operators designated to do element-wise broadcasting makes it a bit easier, too. Addition is well-defended above, and multiplication is free. Division is tricky, though: what's the correct size for I when dividing by non-square matrices?

If A is square, then eye(size(A)...)/A * A = I, no? My linear algebra is rusty, but isn't eye(size(A,2),size(A,1))/A the choice to make in general? I think that should behave like pinv(A).

I know I have tons of A + λ * eye(size(A)) littered through my Matlab code for state estimation and identification. I think this could be a great change.

@kmsquire (Member)

As an aside, the size(A) isn't necessary for ones or eye:

julia> ones(A) == ones(size(A))
true

julia> eye(A) == eye(size(A))
ERROR: no method eye((Int64,Int64))

julia> eye(A) == eye(size(A)...)
true

@jiahao (Member) commented Feb 14, 2014

@mbauman Tempting as it is to define 1/A as pinv(A) for rectangular matrices, pseudoinverses are left inverses but not right inverses (whereas a true inverse is both a left inverse and a right inverse). So I don't think we can reason consistently about 1/A for nonsquare matrices in a way that neglects order of operations.
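
To make the asymmetry concrete (hypothetical session; a random tall matrix has full column rank almost surely):

julia> A = rand(4, 2);

julia> norm(pinv(A)*A - eye(2)) < 1e-8   # pinv(A) is a left inverse
true

julia> norm(A*pinv(A) - eye(4)) < 1e-8   # A*pinv(A) is only a projection
false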

@jiahao (Member) commented Feb 14, 2014

I know I have tons of A + λ * eye(size(A)) littered through my Matlab code for state estimation

You could always define a custom +(A::Matrix, λ::Number) method in your code...

@mbauman (Member) commented Feb 14, 2014

Right, of course: since there is no true inverse for rectangular matrices, there is no universal solution here. Perhaps the answer is to punt and throw an error for rectangular matrices? I still like the consistency of (effectively) transforming the scalar λ into λI across all basic arithmetic operations with matrices.

(Thanks for the tips, Kevin and Jiahao - I'll be happy to ditch those Matlabisms in my Julia code)

@jiahao (Member) commented Feb 14, 2014

@kmsquire thanks, I've edited my comments.

@mbauman we wouldn't have to do anything new at the level of redefining 1/A, since if it calls inv(A), that would do the dimension check already.

@carlobaldassi (Member)

Also, speaking of customization, it probably wouldn't be too difficult to write a special matrix type which just keeps a scalar and represents a matrix proportional to the identity, of whatever size is needed, together with a constant I associated with the value 1. For example:

immutable IdentityMatrix{T<:Number}
    λ::T
end

const I = IdentityMatrix(1)

Base.(:*)(x::Number, M::IdentityMatrix) = IdentityMatrix(x*M.λ)
Base.(:+){T<:Number}(A::Matrix{T}, M::IdentityMatrix) = A + M.λ * eye(A)
Base.(:/){T<:Number}(M::IdentityMatrix, A::Matrix{T}) = M.λ * inv(A)

etc. (Of course + should be written more efficiently; see the sketch after the example below.)
With these definitions alone one already has:

julia> A = rand(2,2)
2x2 Array{Float64,2}:
 0.184807  0.557457
 0.342099  0.889467

julia> A + 2I
2x2 Array{Float64,2}:
 2.18481   0.557457
 0.342099  2.88947 

julia> I/A
2x2 Array{Float64,2}:
 -33.787   21.1754
  12.9948  -7.02  
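
For completeness, a sketch of the more efficient + mentioned above (same 0.2-era style; it copies A and bumps the diagonal instead of allocating eye(A)):

function Base.(:+){T<:Number}(A::Matrix{T}, M::IdentityMatrix)
    B = copy(A)
    for i = 1:min(size(B, 1), size(B, 2))
        B[i, i] += M.λ
    end
    return B
end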

@lindahua (Contributor)

A local variable I would hide the constant identity matrix I, so it probably won't break existing code (just as e is also used as a local variable or field in many places).

If we add this constant I, it will only be used in new or revised code, and in such cases I believe most people would avoid using I as a local variable whenever they want to use it as an identity matrix within the same local context.

@milktrader (Contributor)

@lindahua thanks for the excellent explanation

@stevengj (Member)

I like the IdentityMatrix idea too.

One could then define I[n,m] as a Kronecker delta function as well.

However, I don't think IdentityMatrix is the right name for the type, since it includes a scale factor. Maybe ScalingOperator (since it could be an operator on an arbitrary vector space, and in any case it is not a <: AbstractMatrix, since it doesn't have a size).
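
A sketch of the suggested Kronecker-delta indexing, building on the IdentityMatrix type defined above (illustrative only):

# I[n,m] is δ(n,m), scaled by λ.
Base.getindex(M::IdentityMatrix, i::Integer, j::Integer) = i == j ? M.λ : zero(M.λ)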

@carlobaldassi (Member)

I agree with @lindahua on everything, including the preference for deprecating α/I and forcing users to be explicit about that.
But I'm actually a bit concerned (not too much really, but still) about using I as a constant; it's true it would probably not break any code, but a quick grep through Julia base reveals it's very commonly used (480 hits), especially to indicate a group of indices. The best choice would probably be one of \U1D7D9 ('MATHEMATICAL DOUBLE-STRUCK DIGIT ONE') or \U1D540 ('MATHEMATICAL DOUBLE-STRUCK CAPITAL I'); maybe the (admittedly ugly) name II could be used to mimic that. Or, following im, it could be called ID. So in Base code we would use Unicode, but users could still use ASCII without too much hassle.

@nalimilan (Member)

@carlobaldassi Unfortunately, the standard notation for the identity matrix really seems to be I. Even though 1 is sometimes used, it is also used to designate a vector/matrix of ones, so that's not ideal. And the double-struck I does not seem to be used anywhere.

@StefanKarpinski (Member)

Using I seems fine to me since it won't break any existing code. The same name means different things in different contexts. That's just normal and not really a big deal at all.

@andreasnoack (Member)

Sorry for the long break. I can see that you have had a great party here.

I am not proposing the following behaviour for Julia, but just for completeness, here is Sage:

A=matrix([[1,1],[1,1]])

A+2
[3 1]
[1 3]

I think it would be better to add the IdentityMatrix.

Generally, I think it would be cleaner to let the . versions do the less-mathy elementwise operations and reserve +,-,*,/,^ for the math operations, i.e. make randn(2,2)+1 an error and point to .+. I don't see the benefit of sometimes having the non-dot version default to the dot version.

Finally, even though we might keep + and - defaulting to elementwise operations, I don't see why the fact that A^-1 == inv(A) should imply that 1/A == inv(A). I think the Number/Array methods should go away, except maybe for the case mentioned earlier, but that exception is not so important to me.

@JeffBezanson (Member)

I'd be fine with removing Number/Array.

@lsorber commented Feb 15, 2014

+1 for @andreasnoackjensen's arguments

@toivoh (Contributor) commented Feb 18, 2014

I want to put in my vote for letting matrix+scalar give an error, or be allowed only for square matrices where it would give matrix+scalar*I.

It was not always obvious that Julia should have both + and .+ operators. After all, + would typically just be a more restricted form of the broadcasting .+, so why not let + broadcast? The argument that finally swayed Stefan was that then * (matrix multiplication) distributes over + (non-broadcasting addition), while .* distributes over .+. This should definitely lead to fewer surprises.

Put another way, when you are writing linear algebra code using + and *, you really prefer an error when you try to add scalar+vector, scalar+matrix, or vector+matrix, rather than silently getting a broadcasted result. A lot of the strength of the + operator compared to .+ comes from the cases that it does not allow.

When it comes to being explicit about what operations do, I think that matrix+scalar working elementwise is really sneaky. If you want a broadcasting operation, be explicit about it and use .+. I think that a good way forward would be to deprecate matrix+scalar and make everyone use .+ instead.

If the distributive property should hold for matrix+scalar as well, it must evaluate to matrix+scalar*I, so that e.g.

matrix*vector + scalar*vector = (matrix+scalar)*vector = (matrix+scalar*I)*vector

But I think it would be ok if you have to put in the I yourself. Cases might come up where the above would be convenient, but if we deprecate matrix+scalar, that can always be discussed later on.

For 1/A where A is a matrix, I want to appeal to associativity. If we want that

B*1/A = (B*1)/A = B*(1/A)

then we need 1/A = inv(A), which I think is the right choice. This also goes in line with letting + - * / be linear algebra operators.

A lot of you have talked about consistency with other languages. I beg of you to consider the internal consistency of Julia as well. For this, we need to consider mathematical arguments. When one gains experience with numerical software, it is easy to be led to believe that it is by necessity much less consistent than mathematics, with corner cases that you have to watch out for everywhere you turn. Experiencing Julia has led me to believe that it does not have to be so; it's more that too many programming languages were designed without such considerations, often by people without sufficient mathematical background. I still hope that Julia can be different. I would argue that satisfying basic properties like distributivity and associativity is an important aspect of consistency, and generic programming thrives on consistency.

@andreasnoack (Member)

@toivoh Thanks for the comment. After reading it twice, I am convinced by your arguments, and also annoyed that I didn't see the associativity argument for inv(A)=1/A.

@Jutho (Contributor) commented Feb 25, 2014

It looks like a lot of the discussion is about the difference between interpreting an Array as just a multidimensional collection of data versus interpreting it as a mathematical object (vector, matrix, tensor = multilinear map).

My vote would also be for having number/matrix = number * inv(matrix), and for deprecating array + number and asking users to change their code to array .+ number, despite @lindahua's argument that this construction is often used in the technical computing community. I agree with @lindahua's argument for not having matrix + number add to the diagonal (cited below), but the same argument also rules out having matrix + number add the number everywhere. In fact, I cannot think of any algebraic construction that would ever want to use such a strange operation, but that's probably just my ignorance.

Citing @lindahua:
"From a pure algebraic standpoint, the Matrix Ring is formed of all n x n square matrices together with the addition and matrix multiplication defined thereon. Perhaps, I am missing something. I have yet to see an algebraic system (in terms of mathematical concept) that involves addition between matrix and scalar that exhibit the A + λI behaviour."

@lindahua (Contributor)

I don't have a problem deprecating Array + Number in favor of Array .+ Number. What I am against is quietly changing the behavior/semantics in a radical way.

If I were asked to design a system, I would prefer one like this (a usage sketch follows the list):

  • Use dot-prefixed operators .+, .-, .*, ./, .\, and .^ for element-wise operations.
  • Use +, -, *, \, and / for (linear) algebraic operations between arrays (considered as tensors).
  • A ^ n should be interpreted as the algebraic power, i.e. A * A * A ....
  • Allow λ * A and A / λ for scalar products (these are also well defined in linear algebra).
  • Adding scalars to diagonals should be written as A + λI; A + λ and A - λ are not allowed (due to potential confusion).
  • λ * inv(A) may be written as λI / A; λ / A is not allowed.
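
Under this proposal, and with an I object along the lines sketched earlier in the thread, usage would look something like this (hypothetical):

λ * A        # allowed: scalar product
A / λ        # allowed: scalar division
A + λ*I      # allowed: explicit diagonal shift
A .+ λ       # allowed: explicit elementwise addition
# A + λ      # error under the proposal
# λ / A      # error; write λI / A or λ * inv(A)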

@johnmyleswhite (Member)

+1 for that

@jiahao (Member) commented Feb 25, 2014

+1 in favor of @lindahua's proposal.

1/A as matrix inverse is cute, but more likely symptomatic of someone who doesn't quite know what they're doing. And people really should avoid computing inverses explicitly anyway.

@StefanKarpinski (Member)

Yes, that seems like a very sane approach. +1

@simleb commented Feb 25, 2014

+1

@andreasnoack (Member)

I support @lindahua's proposal, except that I think @toivoh's argument for allowing 1/A is convincing. Now that @jiahao raises the often-heard warning against the inverse, I would like to ask you about Druinsky and Toledo. I have also made a plot.

@jiahao (Member) commented Feb 25, 2014

You've caught me reciting scripture (* ̄m ̄)

It isn't so much that inv(A)*b is less accurate than A\b (that obviously depends on the actual algorithms used for inv and \), but rather that it is significantly slower in many practical use cases. Three in particular come to mind:

  1. For sparse matrices, computing inv(A) is often ill-advised, since the inverse usually has a very different sparsity pattern from A itself and is prone to fill-in. So even if A is sparse, inv(A) usually isn't.
  2. For singular matrices, you can gracefully overload \ to compute the least-squares solution A\b, which inv(A)*b won't give you; you'd need pinv(A)*b for that, which is significantly more expensive.
  3. For use cases with multiple right-hand sides b, it's faster to factorize once and solve with the factorization multiple times, as opposed to inverting the matrix once and computing multiple matvecs (see the sketch below). The factorizations usually involve special matrix forms for which the matvecs would be faster.
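
A sketch of point 3, assuming the 0.2-era lufact API:

F = lufact(A)    # one O(n³) factorization
x1 = F \ b1      # each subsequent solve reuses F at O(n²) cost
x2 = F \ b2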

You'll notice in your notebook that the spread of errors using A\b is narrower than for inv(A)*b. I think this is indicative of doing more flops in the latter than the former, so that you can have cancellation pushing the error lower, or compounding error pushing it higher. I don't think that is necessarily surprising.

I really can't think of a compelling use case for matrix inverses where the right thing to do is to express it as 1/A.

@kmsquire (Member)

You'll notice in your notebook that the spread of errors using A\b is narrower than for inv(A)*b

Actually, I think it's the other way around. LU errors are circles, INV errors are x's.

@jiahao (Member) commented Feb 26, 2014

Oops, fail.

I'm not sure why that would be the case. I notice that inv and \ both use the same LU factorization under the hood; is that right?

@toivoh (Contributor) commented Feb 26, 2014

+1 to @lindahua's suggestion; explicit is better than implicit. My argument for 1/A = inv(A) should perhaps even more be taken as an argument against elementwise division.

I think that we should consider what I really is, though. I would see it as an (identity) scaling operator, something more general than an identity matrix of unknown size.

@Jutho (Contributor) commented Feb 26, 2014

+1 for @lindahua's suggestion. I had no strong preference for 1/A being either inv(A) or just an error (but not elementwise division), and would definitely not use the 1/A notation myself.

@andreasnoack (Member)

I have updated pull request #5810 as a consequence of the discussion here.

@jiahao I understand that there are many situations where the \ version is preferred, but I think it is interesting that the broadly communicated warning about the numerical accuracy of inv(A)*b doesn't really hold. I have updated the plot with relative residual/backward errors. It explains the warning: inv(A)*b is not backward stable in b, but it should be the error of the solution that matters. One of the cited papers points out that the only place in Higham's 14 pages on matrix inversion where the forward error of inv(A)*b is mentioned is in an exercise, where you are asked to derive and interpret the forward error. That paper also gives examples where the use of inv(A)*b is faster because inv(A) is reused for changing b's. I can't tell whether their examples are reasonable or their timings fair.

@stevengj (Member)

inv(A)*b, given inv(A), requires asymptotically the same number of flops (2m² for an m×m matrix) as the two backsubstitutions in U \ L \ b, given the LU factors of A. The practical constant factor in the real execution time of the former is slightly better because matrix-vector multiplication has greater regularity than backsubstitution, I think, and hence is easier to optimize. However, the difference only matters if you have a huge number of right-hand sides (>> m), as otherwise the time to compute the inverse dominates. But I agree that it is mostly a question of performance, not accuracy.

(In practice, I suspect that inverting matrices for performance reasons is mostly seen for tiny matrices, e.g. 3×3. On the other hand, for pedagogical and exploration purposes, I compute inverse matrices of moderate-sized matrices all the time.)

@JeffBezanson (Member)

closed by #5810
