Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support applying plans by column? (Or is it really 300x slower??) #216

Closed
dlfivefifty opened this issue Sep 28, 2021 · 4 comments
Closed

Comments

@dlfivefifty
Copy link
Member

I was surprised to learn two thing:

  1. a vector plan doesn't work on matrices:
julia> n = 10; F = FFTW.plan_r2r!(Vector{Float64}(undef,n), FFTW.REDFT00, 1);

julia> X = randn(n,1000);

julia> F * X
ERROR: ArgumentError: FFTW plan applied to wrong-size array
Stacktrace:
 [1] assert_applicable(p::FFTW.r2rFFTWPlan{Float64, (3,), true, 1, Int64}, X::Matrix{Float64})
   @ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:426
 [2] *(p::FFTW.r2rFFTWPlan{Float64, (3,), true, 1, Int64}, x::Matrix{Float64})
   @ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:889
 [3] top-level scope
   @ REPL[72]:1
  1. applying the plan column-by-column is 300x slower than the matrix plan (the copy is to avoid NaN possibly being special cased):
julia> M = FFTW.plan_r2r!(Array{Float64}(undef,n,1000), FFTW.REDFT00, 1);

julia> @btime M*copy(X);
  70.895 μs (2 allocations: 78.20 KiB)

julia> function plmul!(F, X)
       @inbounds for j in 1:size(X,2)
       F * view(X,:,j)
       end
       X
       end
plmul! (generic function with 1 method)

julia> @btime plmul!(F, copy(X));
  22.529 ms (2 allocations: 78.20 KiB)

So originally I was going to propose adding support to applying a vector plan to a matrix, but now I'm worried it's too slow. Though the scale of the slow-ness seems unbelievable! So perhaps I've just done something wrong.

@stevengj
Copy link
Member

Internally, FFTW doesn’t actually have an in-place DCT algorithm IIRC, so it is allocating a temporary buffer each time you execute the plan in order to perform the transform out-of-place and copy back the result. With the matrix plan it does this only once, but with your plmul! it is calling malloc once per column, which is pretty slow compared to the cost of the transform for n=10.

Of course, you could use an out-of-place plan and allocate the buffers yourself.

@dlfivefifty
Copy link
Member Author

TFW you've been using in-place transforms to be "non-allocating" when I could should have been using out-of-place transforms 🤦‍♂️

Can confirm it's a more reasonable 3x slowdown (due to better multithreading??)

julia> n = 10; F = FFTW.plan_r2r(Vector{Float64}(undef,n), FFTW.REDFT00, 1);

julia> function plmul!(Y, F, X)
              @inbounds for j in 1:size(X,2)
              mul!(view(Y,:,j), F, view(X,:,j))
              end
              Y
           end
plmul! (generic function with 1 method)

julia> Y = randn(n,1000); X = randn(n,1000); @btime plmul!(Y, F, copy(X));
  277.529 μs (2 allocations: 78.20 KiB)

Any thoughts on whether F * X should be supported? Or perhaps 3x slowdown is reason enough to not support it to point users to the "right" way?

@stevengj
Copy link
Member

It's not threading by default. Probably the slowdown is because FFTW's matrix in-place plan is only allocating a buffer big enough to hold a single column, and is re-using that for all of the columns?

@dlfivefifty
Copy link
Member Author

What I was thinking is not actually possible in practice since the memory alignment may be off:

julia> mul!(view(randn(5,2),:,2), FFTW.plan_r2r(Vector{Float64}(undef,5), FFTW.REDFT10), view(randn(5,2),:,2))
ERROR: ArgumentError: FFTW plan applied to array with wrong memory alignment
Stacktrace:
 [1] assert_applicable(p::FFTW.r2rFFTWPlan{Float64, (5,), false, 1, UnitRange{Int64}}, X::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:430
 [2] assert_applicable(p::FFTW.r2rFFTWPlan{Float64, (5,), false, 1, UnitRange{Int64}}, X::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, Y::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:435
 [3] mul!(y::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, p::FFTW.r2rFFTWPlan{Float64, (5,), false, 1, UnitRange{Int64}}, x::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:876
 [4] top-level scope
   @ REPL[13]:1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants