
Added support for more transform directions #1903

Merged
merged 10 commits into JuliaGPU:master on Aug 29, 2023

Conversation

RainerHeintzmann
Contributor

This is achieved by allowing FFT plans to have fewer dimensions than the data they are applied to. The trailing dimensions are treated as non-transform directions and the transforms are executed sequentially. (Maybe @inbounds should be added?)
This should allow most use cases to work now. In particular, note that single dimensions are now supported, which adds flexibility.
Only rare cases, such as dims (2,4) of a 4-dimensional array, are currently still not supported, but the user would be able to execute these transforms sequentially.
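The sequential-application idea can be sketched on the CPU, with FFTW standing in for CuFFT; `fft_leading` is a hypothetical helper for illustration, not the PR's code:

```julia
using FFTW  # CPU stand-in here; the PR does the equivalent with CuFFT plans

# Illustrative sketch: transform the first `td` dimensions of `x` with a single
# slice-sized plan, looping sequentially over the collapsed trailing
# (non-transform) dimensions.
function fft_leading(x::AbstractArray, td::Integer)
    xs = reshape(ComplexF64.(x), size(x)[1:td]..., :)  # collapse trailing dims into one
    cols = ntuple(_ -> Colon(), td)
    p = plan_fft(xs[cols..., 1])                       # one plan, reused for every slice
    y = similar(xs)
    for k in axes(xs, td + 1)
        y[cols..., k] = p * xs[cols..., k]
    end
    return reshape(y, size(x))
end
```

For example, `fft_leading(x, 2)` on a 3-D array should match `fft(x, (1, 2))`.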

@RainerHeintzmann
Contributor Author

... removed allocations and added support for irfft(Real).

lib/cufft/fft.jl (outdated review thread, resolved)
@maleadt
Member

maleadt commented May 17, 2023

Thanks! I'm not really familiar with FFTs though, so maybe @stevengj (or @ali-ramadhan or @btmit from #119) could give this a quick review.

@stevengj

stevengj commented May 17, 2023

The trailing dimensions are treated as non-transform directions and transforms are executed sequentially.

So you only support FFT-ing the leading dimensions? That is certainly the low-hanging fruit, since you then just have a single loop of contiguous FFTs (which could conceivably be done in parallel), though it's not super general.

This is achieved by allowing fft-plans to have fewer dimensions than the data they are applied to.

This deviates from the AbstractFFTs API. Why not just follow the standard API and accept a dims argument, but throw an error if dims are not the leading dimensions (i.e. require dims == 1:n where n ≤ ndims(X))?
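The suggested restriction could look like the following minimal sketch; `check_leading_dims` is an illustrative name, not part of any real API:

```julia
# Sketch of the check suggested above: accept the standard `dims` argument,
# but only allow the leading dimensions 1:n with n ≤ ndims(X).
function check_leading_dims(dims, nd::Integer)
    ds = collect(Int, dims)
    (ds == collect(1:length(ds)) && length(ds) <= nd) ||
        throw(ArgumentError("only leading dimensions are supported, got dims = $dims"))
    return ds
end
```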

@RainerHeintzmann
Contributor Author

So you only support FFT-ing the leading dimensions? That is certainly the low-hanging fruit, since you then just have a single loop of contiguous FFTs (which could conceivably be done in parallel), though it's not super general.

Not quite. Supported are FFTs with up to a total of three transform directions. These can have at most one gap of any number of non-transform directions. E.g. in 4D (XYZT) the supported dimension combinations are X, Y, Z, T, XY, XZ, XT, YZ, ZT, XYZ, XYT, YZT. Only YT is not supported.
The reason for this current limitation is the low-level interface of CuFFT, which allows only a single stride.
In the future, the remaining (very rarely used) cases can potentially be supported by calling successive in-place FFTs over the remaining dimensions. With the new interface any single dimension (e.g. Y in 3D) is now possible, which also makes it very easy for users to do the remaining dimension by hand.

This is achieved by allowing fft-plans to have fewer dimensions than the data they are applied to.

Indeed, this is the core idea here. Yet the strides already support a single gap of N dimensions, which is also exploited. CUDA is able to run asynchronously and can possibly utilize all kernels even if the transform size is smaller than the number of CUDA kernels.

This deviates from the AbstractFFTs API. Why not just follow the standard API and accept a dims argument, but throw an error if dims are not the leading dimensions (i.e. require dims == 1:n where n ≤ ndims(X))?

See above; many more cases are supported. The few unsupported cases do throw an error.

The general case is currently not implemented, since this would need a complete revamp of the plan structure (more of a high-hanging fruit: storing multiple CuFFT plans and transform orders in an encapsulating plan). Also, for those cases it may actually be better and faster for users to call permutedims! and apply the transforms to consecutive dimensions.

@RainerHeintzmann
Contributor Author

This deviates from the AbstractFFTs API. Why not just follow the standard API and accept a dims argument, but throw an error if dims are not the leading dimensions (i.e. require dims == 1:n where n ≤ ndims(X))?

@stevengj, you are describing what was already the case before work started on these improvements, which support more combinations of dimensions, most notably any individual dimension of ND arrays. Of course the dims argument is accepted; that is what this whole pull request is mostly about, apart from fixing two other existing problems (see linked issues).

Yes, there is a small difference in the interface compared to the standard API, as the plan's dimensions do not need to match the data's trailing dimensions. This is in accordance with broadcasting rules, where you can also multiply a .* b with b having more trailing dimensions. I agree that it would be nicer to throw an error and be in 100% agreement with the standard interface regarding application of plans, though one could argue that it is convenient to let plans apply to extra trailing dimensions automatically, without needing to create (and store) another plan. The trouble for the CuArray implementation is that it somewhat exploits this trailing-dimension broadcasting internally. If wanted, this can probably be fixed by throwing an error while keeping the internal workings, but it may need one more dimension property in the struct of the plan.
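For reference, the broadcasting rule being alluded to:

```julia
# Broadcasting lets `b` have extra trailing dimensions relative to `a`;
# `a` is implicitly extended along them.
a = rand(3, 4)
b = rand(3, 4, 5)
c = a .* b       # a is applied to every trailing slice of b
size(c)          # (3, 4, 5)
```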

@RainerHeintzmann
Contributor Author

The code is changed now such that plans always have to agree in dimensions and sizes with the data they are applied to. This should be in agreement with the AbstractFFTs interface. The support for transform directions has not been changed, and almost all direction combinations should be working now. @stevengj, is this OK now?

@stevengj

Shouldn't it throw an error if dims is an unsupported tuple? I'm not seeing any error-checking code, or am I missing it?

@RainerHeintzmann
Contributor Author

I think it does:

julia> q = fft(CuArray(rand(10,10,10,10)), (2,4));
ERROR: ArgumentError: batch regions must be sequential
Stacktrace:
  [1] cufftMakePlan(xtype::CUDA.CUFFT.cufftType_t, xdims::NTuple{4, Int64}, region::Tuple{Int64, Int64})
    @ CUDA.CUFFT C:\Users\pi96doc\Documents\Programming\Julia\Forks\CUDA.jl\lib\cufft\wrappers.jl:127
julia> q = fft(rand(10,10,10,10), (2,4));

julia> p = plan_fft(CuArray(rand(10,10,10,10)), (1,3))
CUFFT d.p. complex forward plan for 10×10×10×10 CuArray of ComplexF64

julia> q = p * CuArray(rand(10,10,10,10));

julia> q = p * CuArray(rand(10,10,10,10,10));
ERROR: ArgumentError: CuFFT plan applied to wrong-size input
Stacktrace:
 [1] assert_applicable(p::CUDA.CUFFT.cCuFFTPlan{ComplexF64, -1, false, 4}, X::CuArray{ComplexF64, 5, CUDA.Mem.DeviceBuffer})
   @ CUDA.CUFFT C:\Users\pi96doc\Documents\Programming\Julia\Forks\CUDA.jl\lib\cufft\fft.jl:305

@RainerHeintzmann
Contributor Author

The text of the error message could perhaps still be improved upon? I kept the case and text from before for now.

@RainerHeintzmann
Contributor Author

@stevengj, @maleadt, do you think it can be merged now? See comments and discussion above.

@maleadt
Member

maleadt commented Jul 10, 2023

do you think it can be merged now, see comments and discussion above?

I guess so, unless @stevengj has more comments.

CI failures are related though, as I noted above.

@RainerHeintzmann
Contributor Author

@stevengj does not seem to have more comments, so can we merge? That would be great, as then we could finally use these new transform directions in other toolboxes.

@maleadt
Member

maleadt commented Aug 28, 2023

CI failures are related though, as I noted above.

This has not changed, so no this cannot be merged yet. The CI failures that this PR causes need to be fixed first.

@RainerHeintzmann
Contributor Author

What is the best way to do this? Last time I merged the new version into my pull request, and I remember you wrote that this was a bad idea and caused you a lot of work. Should I just copy the current version of the cufft.jl file and try to merge it by hand?

@maleadt
Member

maleadt commented Aug 29, 2023

There are two things that need to happen:

@RainerHeintzmann
Contributor Author

RainerHeintzmann commented Aug 29, 2023

Thanks. Do I get this right:

  1. fix the single-line commands and somehow merge the test/cufft.jl (see system complaint above) by hand.
  2. move the existing changes in my fork into a separate branch of my fork.
  3. somehow (how?) overwrite the main branch with the latest version from your repo.
  4. use git rebase master on my branch.
  5. hope that this pull request updates itself, or do a new pull request?

@maleadt
Member

maleadt commented Aug 29, 2023

  • move the existing changes in my fork into a separate branch of my fork.

  • somehow (how?) overwrite the main branch with the latest version from your repo.

  • use git rebase master on my branch.

  • hope that this pull request updates itself or do a new pull request

I think you're overcomplicating things. On your master branch, which is the source of your PR, you should just fetch the changes from the main repo (depending on how you've set up your repository, i.e. which remote the JuliaGPU sources are at, that's git fetch origin), rebase on top of our master (git rebase origin/master) and push (git push). If you don't have the JuliaGPU/CUDA.jl repository configured as a remote, do git remote add upstream https://github.com/JuliaGPU/CUDA.jl.git and replace origin with upstream in the commands I mentioned.
FYI, to be able to have multiple PRs, it's recommended not to base your PR from your master branch, but use a local feature branch too.

Anyway, I've rebased the PR for now, as that seemed faster :-)

@RainerHeintzmann
Contributor Author

Heck, I think we were both working on it at the same time. I did the rebase and pushed.
What now?

@maleadt
Member

maleadt commented Aug 29, 2023

Yeah, you overwrote my changes. Yours seem fine too though, so let's just wait for CI to finish now.

@RainerHeintzmann
Contributor Author

Let's hope. My local checks seemed to have some problems, though:

 Pkg.test("CUDA";test_args=["cufft","--quickfail"])
     Testing CUDA
┌ Warning: Could not use exact versions of packages in manifest, re-resolving
└ @ Pkg.Operations C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Operations.jl:1809
ERROR: Unsatisfiable requirements detected for package GPUCompiler [61eb1bfa]:
 GPUCompiler [61eb1bfa] log:
 ├─possible versions are: 0.1.0-0.22.0 or uninstalled
 └─restricted to versions 0.23 by CUDA [052768ef] — no versions left
   └─CUDA [052768ef] log:
     ├─possible versions are: 4.4.0 or uninstalled
     └─CUDA [052768ef] is fixed to version 4.4.0

@maleadt
Member

maleadt commented Aug 29, 2023

Make sure you use the Manifest committed in the CUDA.jl repository, as we're currently (temporarily) depending on an unreleased version of GPUCompiler.

@RainerHeintzmann
Contributor Author

It failed. Strange; this type-piracy problem seems not directly connected to the FFT stuff.

@maleadt
Member

maleadt commented Aug 29, 2023

Aqua 0.7 was just released, it probably contains additional checks.

@RainerHeintzmann
Contributor Author

Can I do something about it, or leave it up to you?

@RainerHeintzmann
Contributor Author

RainerHeintzmann commented Aug 29, 2023

I tried it with the Manifest.toml from the main repo, but this yields quite severe problems (Windows 10).

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x6be45c40 -- nvtxGlobals_v3 at C:\Users\pi96doc\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
in expression starting at C:\Users\pi96doc\Nextcloud-Uni\Julia\Forks\CUDA.jl\test\setup.jl:1
nvtxGlobals_v3 at C:\Users\pi96doc\.julia\artifacts\b4eeaf094ffb6aacf1b20ee5d2ac9aa1818fc732\bin\libnvToolsExt.dll (unknown line)
Allocations: 5080540 (Pool: 5079127; Big: 1413); GC: 8
ERROR: Package CUDA errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Types.jl:69
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Operations.jl:2021
 [3] test
   @ C:\Users\pi96doc\AppData\Local\Programs\Julia-1.9.2\share\julia\stdlib\v1.9\Pkg\src\Operations.jl:1902 [inlined]

Yet there also seem to be some disagreements with the Project.toml if I type Pkg.resolve().

@maleadt
Member

maleadt commented Aug 29, 2023

Ah, that's bad. Does just importing CUDA.jl (but using that manifest) fail with the same error?

@maleadt maleadt merged commit e86b165 into JuliaGPU:master Aug 29, 2023
1 check passed
@RainerHeintzmann
Contributor Author

Bad news: I freshly cloned the merged repo and just typing
using CUDA
leads to a complete crash of the Julia REPL. (Julia 1.9.2)

(CUDATest) pkg> st
Status `C:\Users\pi96doc\Nextcloud-Uni\Julia\Development\TestBeds\CUDATest\Project.toml`
  [052768ef] CUDA v4.4.0 `https://github.com/JuliaGPU/CUDA.jl.git#master`
  [7a1cc6ca] FFTW v1.7.1

Any ideas?

@maleadt
Member

maleadt commented Aug 29, 2023

I can reproduce. I'll look into it.

@RainerHeintzmann
Contributor Author

Great! Now it works also on my system and also the new transform directions are supported. Thank you!
