GPU compatibility #142

Open
maximilian-gelbrecht opened this issue Apr 29, 2021 · 3 comments

@maximilian-gelbrecht

I wonder if it would be realistic and/or a goal to make the library GPU compatible. By that I mean only the crucial part: applying a pre-computed plan to a CuArray (CUDA array).

This is probably a bit tricky in the C library. While the FFTW parts could probably be bound to the appropriate CUDA implementations (there is cuFFT), the other plans would need adjustments. Personally I have no experience with CUDA in C, but I have a bit in Julia; I looked at the old pure-Julia version of the SH plans, and it seemed at least plausible that this would be doable there, but maybe I overlooked something.
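
For the FFT stage, something along these lines already works in Julia through the AbstractFFTs plan interface. A minimal sketch, assuming CUDA.jl and a CUDA-capable GPU (this is not the library's code, just the "apply a pre-computed plan to a CuArray" part):

```julia
# Minimal sketch (not the library's code): precompute a cuFFT plan through the
# AbstractFFTs interface and apply it to data that already lives on the GPU.
using CUDA, CUDA.CUFFT, AbstractFFTs

x = CuArray(rand(ComplexF32, 1024))   # copy the coefficients to the device once
p = plan_fft(x)                       # cuFFT plan, computed once for this size/type
y = p * x                             # applying the plan stays in device memory
```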

@MikaelSlevinsky
Member

I think this would be interesting. Since CUDA dropped support for macOS, that stymies my involvement. (I guess I would by default be in favour of a different API, such as OpenCL or Metal.)

The C library uses real-to-real FFTs, which are not supported in cuFFT (nor in MKL, for that matter: https://forums.developer.nvidia.com/t/newbie-to-cufft-how-to-do-real-to-real-transforms/69952), but those are mainly a convenience for the programmer, and workarounds could be found; a sketch of one is below.
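
For example, a DCT-II (FFTW's REDFT10) can be recovered from a single complex FFT using Makhoul's even/odd reordering plus a twiddle-factor post-step. A hedged sketch assuming CUDA.jl; `dct2_via_fft` is a hypothetical helper, not part of any library:

```julia
# Hedged sketch, assuming CUDA.jl (not the library's implementation): an
# unnormalized DCT-II (FFTW's REDFT10) computed from one complex FFT using
# Makhoul's even/odd reordering; dct2_via_fft is a hypothetical helper name.
using CUDA, CUDA.CUFFT, AbstractFFTs

function dct2_via_fft(x::CuVector{<:Real})
    N = length(x)
    v = vcat(x[1:2:end], reverse(x[2:2:end]))     # even-indexed samples, then odd-indexed reversed
    V = fft(complex.(v))                          # one complex cuFFT call
    k = CuArray(0:N-1)
    return 2 .* real.(cis.(-π .* k ./ (2N)) .* V) # twiddle factors recover the DCT-II
end
```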

I also think that GPU computations perform best when all threads execute in lockstep with the same amount of work, which is not always the case here.

FYI, SHTns is supposed to work on the GPU: https://bitbucket.org/nschaeff/shtns/src/master/

@maximilian-gelbrecht
Author

Thanks for the quick response. I think the argument for GPU computations here is not only the speed-up of the transform itself (which seems to be there, judging from the SHTns benchmarks; thank you for the link), but also the case where the transform is used inside high-dimensional PDE solvers or ML models that profit massively from GPUs. There, the overhead of transferring data back and forth between host and device memory for every transform would probably be quite costly.

CUDA seems to be the best-supported GPU API for Julia, which is why I was assuming it. (I also normally develop on macOS, but luckily I have access to an HPC system with some NVIDIA cards.)

I'll definitely keep an eye on this.

@AshtonSBradley

So if there is a pure Julia implementation, that could easily be put on a GPU to get big gains, e.g. for the FFT.
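
To illustrate that point, a toy sketch assuming CUDA.jl and FFTW.jl (not the library's actual code): code written generically against AbstractFFTs and broadcasting runs unchanged on CPU arrays and on CuArrays.

```julia
# Toy illustration, assuming CUDA.jl and FFTW.jl (not the library's code):
# array-generic Julia written against AbstractFFTs and broadcasting runs
# unchanged on CPU Arrays and on CuArrays.
using CUDA, CUDA.CUFFT, AbstractFFTs, FFTW

# Scale the k-th Fourier mode by exp(-(k/N)^2), then transform back.
function damp_modes(x::AbstractVector)
    N = length(x)
    X = fft(x)                     # FFTW on a CPU Array, cuFFT on a CuArray
    k = 0:N-1                      # ranges participate in broadcasts on both backends
    return ifft(@. X * exp(-(k / N)^2))
end

damp_modes(rand(ComplexF32, 256))            # CPU path
damp_modes(CuArray(rand(ComplexF32, 256)))   # GPU path, same code
```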
