
Faster inv 4x4 #597

Merged 2 commits into JuliaArrays:master on Mar 19, 2019
Conversation

@ryanelandt commented Mar 18, 2019

This makes the inverse function faster for 4x4 matrices: roughly 100% faster for floats and 60% faster for duals than the current implementation.

using StaticArrays
using LinearAlgebra
using ForwardDiff
using BenchmarkTools

# No-op used to estimate per-call benchmarking overhead.
estOverhead(A::SMatrix{4,4,T,16}) where {T} = nothing

### Before
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv(sm44)          #  46.160 ns (1 allocation: 144 bytes)
@btime estOverhead(sm44)  #   7.740 ns (0 allocations: 0 bytes)

sm44 = randn(SMatrix{4,4,ForwardDiff.Dual{Nothing,Float64,6},16})
@btime inv(sm44)          # 357.433 ns (1 allocation: 1008 bytes)
@btime estOverhead(sm44)  #   8.376 ns (0 allocations: 0 bytes)

### After 
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv(sm44)          #  25.904 ns (1 allocation: 144 bytes)
@btime estOverhead(sm44)  #   7.740 ns (0 allocations: 0 bytes)

sm44 = randn(SMatrix{4,4,ForwardDiff.Dual{Nothing,Float64,6},16})
@btime inv(sm44)          # 220.067 ns (1 allocation: 1008 bytes)
@btime estOverhead(sm44)  #   8.376 ns (0 allocations: 0 bytes)

@KristofferC (Contributor) commented Mar 18, 2019

Try running the benchmarks with interpolation:

@btime inv($sm44)

etc.

julia> @btime estOverhead($sm44)
  0.001 ns (0 allocations: 0 bytes)

for example (gets completely optimized away)

@ryanelandt (Author)

As does inv...

sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv($sm44)          #   0.014 ns (0 allocations: 0 bytes)

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 65.722% when pulling 4bbcf7b on ryanelandt:faster_inv_4x4 into 2c8d82f on JuliaArrays:master.

@KristofferC (Contributor)

Oh, sometimes I use

@btime inv($(Ref(sm44))[])

which usually works around it.
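To spell out why this workaround helps (my own sketch, not part of the thread): interpolating the value directly makes it a compile-time constant that can be folded through `inv`, while interpolating a `Ref` and dereferencing it inside the benchmark hides the value from the compiler:

```julia
using StaticArrays, LinearAlgebra, BenchmarkTools

sm44 = randn(SMatrix{4,4,Float64,16})

# $sm44 splices the matrix in as a constant, so `inv` can be
# constant-folded and the timing measures almost nothing.
@btime inv($sm44)

# $(Ref(sm44))[] dereferences a Ref inside the benchmark body,
# so the compiler cannot see the value at compile time and the
# inverse must actually be computed on every sample.
@btime inv($(Ref(sm44))[])
```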

@ryanelandt (Author)

Thanks, I didn't realize this. Here is a corrected test:

using StaticArrays
using LinearAlgebra
using ForwardDiff
using BenchmarkTools

### Before
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv($(Ref(sm44))[])     #  60.706 ns (0 allocations: 0 bytes)

sm44 = randn(SMatrix{4,4,ForwardDiff.Dual{Nothing,Float64,6},16})
@btime inv($(Ref(sm44))[])     #  518.232 ns (0 allocations: 0 bytes)

### After
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv($(Ref(sm44))[])  #  28.310 ns (0 allocations: 0 bytes)

sm44 = randn(SMatrix{4,4,ForwardDiff.Dual{Nothing,Float64,6},16})
@btime inv($(Ref(sm44))[])  #  346.906 ns (0 allocations: 0 bytes)

@andyferris (Member)

Cool - thanks @ryanelandt.

Out of curiosity, does anyone know anything about the numerical stability of this method? The referenced .pdf doesn't go into it. (Not that our static array methods have the highest level of stability, but we do make some attempt to trade off accuracy and performance.)

@ryanelandt (Author)

I ran a test comparing the accuracy of this method and the old one. I don't know the right way to test two versions of the same function against each other, so I copied both versions into functions called invert and invert_old and tested that way. The new function has roughly 10% less error for random matrices and nearly 30% less for matrices with all-positive entries. The two solutions are essentially identical (the old one is an expansion of the new one); my guess is that because the new one involves fewer floating point operations, it accumulates less error.

using StaticArrays
using LinearAlgebra

# invert and invert_old are local copies of the new and old 4x4
# inverse implementations, respectively.
function eval_err(is_make_pos::Bool)
	cum_err_old = 0.0
	cum_err_new = 0.0
	for k = 1:1_000_000
		sm44 = randn(SMatrix{4,4,Float64,16})
		is_make_pos && (sm44 = abs.(sm44))
		cum_err_old += norm(sm44 * invert_old(sm44) - I)
		cum_err_new += norm(sm44 * invert(sm44)     - I)
	end
	println("")
	println("cum_err_old: ", cum_err_old)
	println("cum_err_new: ", cum_err_new)
	println("new method is $(100 * (1 - cum_err_new / cum_err_old)) % more accurate than old method")
end


eval_err(false)
# cum_err_old: 5.442016380124342e-9
# cum_err_new: 4.801225758445484e-9
# new method is 11.774874916201883 % more accurate than old method

eval_err(true)
# cum_err_old: 1.4806311491135073e-8
# cum_err_new: 1.0418722581347007e-8
# new method is 29.633233857163077 % more accurate than old method

@andyferris (Member)

Thanks - seems like a good start.

@c42f (Member) commented Mar 19, 2019

CI/coverage failures seem unrelated.

@c42f (Member) commented Mar 19, 2019

Using the additional stress test in test/inv.jl (https://github.com/JuliaArrays/StaticArrays.jl/blob/master/test/inv.jl#L102) shows this is an improvement for "almost rank 1" 4x4 matrices. For other approximately low rank cases it fares similarly to the code on master.
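An "almost rank 1" matrix of the kind the stress test exercises can be built along the following lines (my own construction for illustration; the exact one in test/inv.jl may differ):

```julia
using StaticArrays, LinearAlgebra

# A rank-1 outer product plus a tiny full-rank perturbation gives an
# ill-conditioned, "almost rank 1" 4x4 matrix.
u = randn(SVector{4,Float64})
v = randn(SVector{4,Float64})
A = u * v' + 1e-9 * randn(SMatrix{4,4,Float64,16})

# Residual of the computed inverse; this grows with the condition
# number, which is roughly 1/1e-9 here.
err = norm(A * inv(A) - I)
```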

master: [screenshot of stress-test error plot on master]

this PR: [screenshot of stress-test error plot for this PR]

@c42f merged commit 1376c77 into JuliaArrays:master on Mar 19, 2019
@andyferris (Member)

Great work Chris. :)

@andyferris (Member)

And thank you very much, @ryanelandt

@c42f (Member) commented Mar 19, 2019

Yes, thanks @ryanelandt.

It's interesting to note that the low-rank numerical problems go away for the 5x5 case, because there we use the static LU decomposition rather than special-case code.

Similarly, if we comment out the 4x4 special case entirely, we get vastly improved numerical stability from LU, but on my machine there's a 7x speed hit to pay for that extra accuracy. Yikes!

julia> sm44 = randn(SMatrix{4,4,Float64,16})

# master, without 4x4 special case
julia> @btime inv($(Ref(sm44))[])
  136.877 ns (0 allocations: 0 bytes)

# ...
# master after merging this PR
julia> @btime inv($(Ref(sm44))[])
  18.941 ns (0 allocations: 0 bytes)
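For comparison, the LU-based behaviour can be imitated without modifying StaticArrays by round-tripping through a plain Matrix (a sketch for accuracy experiments only; the real N >= 5 code path uses StaticArrays' own static LU, and the function name here is made up):

```julia
using StaticArrays, LinearAlgebra

# Inverse via LU factorization of a heap-allocated Matrix, converted
# back to an SMatrix. Slower than the 4x4 special case, but more
# numerically robust for ill-conditioned inputs.
inv_via_lu(A::SMatrix{4,4,T,16}) where {T} =
    SMatrix{4,4,T,16}(inv(lu(Matrix(A))))

A = randn(SMatrix{4,4,Float64,16})
@show norm(A * inv_via_lu(A) - I)
```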
