Faster inv 4x4 #597
Conversation
Try running the benchmarks with interpolation, i.e. `@btime inv($sm44)` etc. For example:

```julia
julia> @btime estOverhead($sm44)
  0.001 ns (0 allocations: 0 bytes)
```

(it gets completely optimized away)
As does `inv`...

```julia
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv($sm44)  # 0.014 ns (0 allocations: 0 bytes)
```
Oh, sometimes I use `@btime inv($(Ref(sm44))[])`, which usually works around it.
Thanks, I didn't realize this. Here is a corrected test:

```julia
using StaticArrays
using LinearAlgebra
using ForwardDiff
using BenchmarkTools

### Before
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv($(Ref(sm44))[])  # 60.706 ns (0 allocations: 0 bytes)

sm44 = randn(SMatrix{4,4,ForwardDiff.Dual{Nothing,Float64,6},16})
@btime inv($(Ref(sm44))[])  # 518.232 ns (0 allocations: 0 bytes)

### After
sm44 = randn(SMatrix{4,4,Float64,16})
@btime inv($(Ref(sm44))[])  # 28.310 ns (0 allocations: 0 bytes)

sm44 = randn(SMatrix{4,4,ForwardDiff.Dual{Nothing,Float64,6},16})
@btime inv($(Ref(sm44))[])  # 346.906 ns (0 allocations: 0 bytes)
```
Cool - thanks @ryanelandt. Out of curiosity, does anyone know anything about the numerical stability of this method? The referenced .pdf doesn't go into this. (Not that our static array methods have the highest level of stability, but we do make some attempt to trade off accuracy and performance.)
I did a test that compared the accuracy of this method and the old method. I do not know the right way to test two versions of a function, so I copied both versions into functions called `invert_old` and `invert`:

```julia
using StaticArrays
using LinearAlgebra

function eval_err(is_make_pos::Bool)
    cum_err_old = 0.0
    cum_err_new = 0.0
    for k = 1:1_000_000
        sm44 = randn(SMatrix{4,4,Float64,16})
        is_make_pos && (sm44 = abs.(sm44))
        cum_err_old += norm(sm44 * invert_old(sm44) - I)
        cum_err_new += norm(sm44 * invert(sm44) - I)
    end
    println("")
    println("cum_err_old: ", cum_err_old)
    println("cum_err_new: ", cum_err_new)
    println("new method is $(100 * (1 - cum_err_new / cum_err_old)) % more accurate than old method")
end

eval_err(false)
# cum_err_old: 5.442016380124342e-9
# cum_err_new: 4.801225758445484e-9
# new method is 11.774874916201883 % more accurate than old method

eval_err(true)
# cum_err_old: 1.4806311491135073e-8
# cum_err_new: 1.0418722581347007e-8
# new method is 29.633233857163077 % more accurate than old method
```
Thanks - seems like a good start.
CI/coverage failures seem unrelated. |
Using the additional stress test in |
Great work Chris. :) |
And thank you very much, @ryanelandt |
Yes, thanks @ryanelandt. It's interesting to note that the low-rank numerical problems go away for the 5x5 case because we use the static LU decomposition there rather than special-case code. Similarly, if we comment out the 4x4 special case entirely, we get vastly improved numerical stability from using LU, but there's a 7x speed hit to pay on my machine for that extra accuracy. Yikes!

```julia
julia> sm44 = randn(SMatrix{4,4,Float64,16})

# master, without 4x4 special case
julia> @btime inv($(Ref(sm44))[])
  136.877 ns (0 allocations: 0 bytes)

# ...

# master after merging this PR
julia> @btime inv($(Ref(sm44))[])
  18.941 ns (0 allocations: 0 bytes)
```
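The stability advantage of the LU path can be sketched with plain `LinearAlgebra` on a dense `Matrix` (whose `inv` also goes through pivoted LU); this illustrates the general backward-stability behaviour, not StaticArrays' static LU code:

```julia
using LinearAlgebra
using Random

Random.seed!(1)

# Build a 4x4 matrix that is nearly rank-deficient (condition number ~1e12)
# by forcing one tiny singular value.
U, _, V = svd(randn(4, 4))
A = U * Diagonal([1.0, 1.0, 1.0, 1e-12]) * V'

# Pivoted LU is backward stable: even for this ill-conditioned input, the
# residual stays tiny relative to the sizes of A and its computed inverse.
Ainv = inv(A)
rel_residual = opnorm(A * Ainv - I) / (opnorm(A) * opnorm(Ainv))
println(rel_residual)  # on the order of machine epsilon
```

The forward error in `Ainv` itself can still be large for such a matrix (it scales with the condition number); backward stability only guarantees a small residual.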
Made the inverse function faster for 4x4 matrices. This should be roughly 100% faster for floats and 60% faster for duals than the current function.
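For readers unfamiliar with the approach: the discussion above contrasts the 4x4 special case with LU, suggesting the special case is an explicit-formula (adjugate/cofactor) inverse. Assuming that, the 2x2 analogue below sketches the idea; `adjugate_inv_2x2` is a made-up name for illustration, not the PR's actual 4x4 code:

```julia
using LinearAlgebra

# 2x2 analogue of an explicit-formula inverse: the adjugate divided by the
# determinant. The 4x4 case uses the same kind of cofactor expansion, just
# with many more terms.
function adjugate_inv_2x2(A::AbstractMatrix)
    a, b = A[1, 1], A[1, 2]
    c, d = A[2, 1], A[2, 2]
    det = a * d - b * c
    return [d -b; -c a] / det
end

A = [3.0 1.0; 2.0 4.0]
println(maximum(abs, adjugate_inv_2x2(A) * A - I))  # close to zero
```

This avoids pivoting and row elimination entirely, which is why it is fast for small fixed sizes but can lose accuracy on ill-conditioned input.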