
Change type restrictions in cuTENSOR operations #2356

Merged
merged 2 commits into JuliaGPU:master from lkdvos:cutensor-descriptor on May 3, 2024

Conversation

@lkdvos (Contributor) commented Apr 29, 2024

This PR migrates the type restrictions for the cuTENSOR operations from the definitions of *operation*_execute! and plan_*operation* to the constructor of CuTensorDescriptor.
This makes it possible for custom types that wrap CuArray objects to specialize just the CuTensorDescriptor constructor while still reusing most of the codebase, as sketched below.
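
As a rough sketch of the intended extension point (the wrapper type and method below are purely hypothetical, and the exact constructor signature may differ):

```julia
using CUDA, cuTENSOR

# hypothetical wrapper type, purely for illustration
struct MyWrappedArray{T,N}
    parent::CuArray{T,N}
end

# specialize only the descriptor constructor; the plan_*operation* and
# *operation*_execute! code paths are then reused unchanged
cuTENSOR.CuTensorDescriptor(a::MyWrappedArray; kwargs...) =
    cuTENSOR.CuTensorDescriptor(a.parent; kwargs...)
```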

Some questions/remarks I still have:

Note that the current implementation changes the CuTensorDescriptor function quite drastically in order to expose the internal constructor. An alternative way to achieve this goal would be to rename the inner constructor to _CuTensorDescriptor and have it handle only the finalizer etc.

This change also allows cuTENSOR to work almost trivially with StridedCuArray. I played around with this a bit, and the only issue that shows up when changing DenseCuArray to StridedCuArray is that the data alignment can no longer be guaranteed (see the illustration below). I admittedly do not have much experience with this, but simply changing alignment=sizeof(eltype(A)) seems to work. Does anyone know of issues with this approach? Should I be mindful of severe performance pitfalls?
For that reason I have not included this change here, but if anyone can reassure me that it is fine, I can open a separate PR implementing StridedCuArray support as well.
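
To make the alignment concern concrete, a small illustration (assuming pointer is defined for the view, as for CUDA.jl's other strided-array support):

```julia
using CUDA

A = CUDA.zeros(Float32, 16, 16)  # fresh allocations are strongly aligned
B = @view A[2:end, :]            # strided view, offset by one element

UInt(pointer(A)) % 128  # 0: allocation alignment holds
UInt(pointer(B)) % 128  # 4: only sizeof(Float32) can still be guaranteed
```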

Any suggestions are definitely more than welcome.

@maleadt (Member) commented May 3, 2024

> Note that the current implementation changes the CuTensorDescriptor function quite drastically, in order to expose the internal constructor.

It's a simple enough change that we should carry it to enable better integration with external packages.

> simply changing alignment=sizeof(eltype(A))

That's an underestimation, which could hurt performance. At the very least, it would be better to use Base.datatype_alignment for the minimal guaranteed alignment. For performance, though, it's better to track the alignment of the buffer from where it was allocated, or to determine it at run time. CUDA.jl doesn't currently do this either: we hard-code an alignment of 128 bytes right now, which could result in misaligned inputs (e.g., when taking a view and passing it to cuTENSOR).
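
Roughly, the two options look like this (runtime_alignment is a made-up helper, not something CUDA.jl provides today):

```julia
using CUDA

# minimal guaranteed alignment, derived from the element type alone
Base.datatype_alignment(Float32)  # 4 bytes

# hypothetical run-time check: the largest power of two dividing the
# actual buffer address, capped at the 128-byte allocation default
runtime_alignment(A) = 1 << min(trailing_zeros(UInt(pointer(A))), 7)
```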

Let me know if you still want to make changes to this PR, otherwise I'd be OK with merging it as is.

@lkdvos (Contributor, Author) commented May 3, 2024

For me this is good to go. I will have to experiment with the alignment anyway for the StridedView implementation, but that is now accessible through the CuTensorDescriptor constructor, so it should work out.
Thanks!

@maleadt maleadt merged commit 754bb6c into JuliaGPU:master May 3, 2024
1 check passed
@lkdvos lkdvos deleted the cutensor-descriptor branch May 3, 2024 08:53