VkFFT v1.3.3 release #147

Merged
merged 26 commits into from
Jan 8, 2024

@DTolm DTolm commented Jan 8, 2024

Multi-upload R2C and R2R algorithms
- This update removes the limit of ~2^12 for R2C and R2R systems - they can all now be done in up to three uploads with coverage ~2^32 for all dimensions, same as C2C.
- Added versions of all R2C and R2R algorithms, implemented as load/store callbacks. This functionality will be enhanced in the future to support arbitrary user callbacks (I just need to find out how this can be done for a multiple-API user interaction).
- Bugfixes
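
For readers unfamiliar with what "multiple uploads" means here, the general idea can be illustrated with the textbook four-step FFT decomposition: a length `n1 * n2` transform is computed as one pass of strided length-`n1` transforms, a twiddle multiplication, and one pass of length-`n2` transforms. The sketch below is a generic illustration under that assumption, not VkFFT's kernel code; the names `dft` and `four_step_fft` are hypothetical, and the naive `dft` merely stands in for a single-upload kernel.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward DFT; a stand-in for one single-upload kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Forward DFT of length n1*n2 built from length-n1 and length-n2 DFTs.

    Input index is split as j = j1*n2 + j2, output index as k = k1 + n1*k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Pass 1: n2 strided DFTs of length n1 (column j2 is x[j2], x[j2+n2], ...).
    cols = [dft(x[j2::n2]) for j2 in range(n2)]  # cols[j2][k1]

    # Twiddle step: multiply element (j2, k1) by exp(-2*pi*i * j2*k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Pass 2: n1 DFTs of length n2 over j2, written out in transposed order.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

Applying the same split once more to one of the two passes gives a three-pass scheme, which is how a per-pass limit of roughly 2^12 can extend to much longer sequences.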

DTolm and others added 26 commits November 7, 2023 14:40
- This update removes the limit of ~2^12 for R2C and R2R systems - they can all now be done in up to three uploads with coverage ~2^32 for all dimensions, same as C2C.
- Added versions of all R2C and R2R algorithms, implemented as load/store callbacks. This functionality will be enhanced in the future to support arbitrary user callbacks (I just need to find out how this can be done for a multiple-API user interaction).
- Restructured internal kernel typing enumeration.
… of how the type of these transforms is stored in v1.3.2 (vincefn/pyvkfft#32)
- Fixed incorrect LUT referencing for inverse DCT/DST-II/III multi-upload algorithm
- Fixed incorrect R2C layout management for non-strided axes
- Fixed incorrect twiddle factors calculation for inverse DST-I
- Fixed incorrect twiddle factors calculation for inverse DST-I Bluestein transform
- Fixed inconsistent usage of shared memory in DCT/DST-I Bluestein when the original sequence length is a power of 2 and the Bluestein sequence length is not.
- Added boundary guard for DCT/DST-IV callback version.
- Added a weird boundary guard for DST-II/III that should do nothing but fixes out-of-bounds accesses in some strided cases.
- Fixed inconsistent usage of swapTo3Stage4Step for Bluestein sequences.
- Fixed inconsistent read check for second value in callback version
- Small Bluestein sequence handler improvement
- Disabled incorrect reuse of non-strided LUT for strided axes
- Also changed how dispatch of push constants works on CUDA and HIP backends (setting constants as an argument should be better than an address copy).
- Also fixed an issue where the switch to an increased logical workgroup count was sometimes not enabled during kernel compilation.
@DTolm DTolm merged commit d6f7ded into master Jan 8, 2024