
Change type restrictions in cuTENSOR operations #2356

Merged
merged 2 commits into JuliaGPU:master from lkdvos:cutensor-descriptor on May 3, 2024

Conversation

@lkdvos (Contributor) commented Apr 29, 2024

This PR migrates the type restrictions for the cuTENSOR operations from the definitions of *operation*_execute! and plan_*operation* to the constructor of CuTensorDescriptor.
This makes it possible for custom types that wrap CuArray objects to specialize just the CuTensorDescriptor constructor while still reusing most of the codebase, as sketched below.
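
As a rough sketch of the intended extension point (the wrapper type and method below are purely hypothetical, and the exact constructor signature may differ):

```julia
using CUDA, cuTENSOR

# hypothetical wrapper type, purely for illustration
struct MyWrappedArray{T,N}
    parent::CuArray{T,N}
end

# specialize only the descriptor constructor; the plan_*operation* and
# *operation*_execute! code paths are then reused unchanged
cuTENSOR.CuTensorDescriptor(a::MyWrappedArray; kwargs...) =
    cuTENSOR.CuTensorDescriptor(a.parent; kwargs...)
```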

Some questions/remarks I still have:

Note that the current implementation changes the CuTensorDescriptor function quite drastically in order to expose the internal constructor. An alternative way to achieve this goal would be to rename the inner constructor to _CuTensorDescriptor and have it handle only the finalizer etc.

This change also allows cuTENSOR to work almost trivially with StridedCuArray. I played around with this a bit, and the only issue that shows up when changing DenseCuArray to StridedCuArray is that the data alignment can no longer be guaranteed (see the illustration below). I admittedly do not have much experience with this, but simply changing alignment=sizeof(eltype(A)) seems to work. Does anyone know of issues with this approach? Should I be mindful of severe performance pitfalls?
For that reason I have not included this change here, but if anyone can reassure me that it is fine, I can open a separate PR implementing StridedCuArray support as well.
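
To make the alignment concern concrete, a small illustration (assuming pointer is defined for the view, as for CUDA.jl's other strided-array support):

```julia
using CUDA

A = CUDA.zeros(Float32, 16, 16)  # fresh allocations are strongly aligned
B = @view A[2:end, :]            # strided view, offset by one element

UInt(pointer(A)) % 128  # 0: allocation alignment holds
UInt(pointer(B)) % 128  # 4: only sizeof(Float32) can still be guaranteed
```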

Any suggestions are definitely more than welcome.

@maleadt (Member) commented May 3, 2024

> Note that the current implementation changes the CuTensorDescriptor function quite drastically, in order to expose the internal constructor.

It's a simple enough change that we should carry it to enable better integration with external packages.

> simply changing alignment=sizeof(eltype(A))

That's an underestimation, which could hurt performance. At the very least, it would be better to use Base.datatype_alignment for the minimal guaranteed alignment. For performance, though, it's better to track the alignment of the buffer from where it was allocated, or to determine it at run time. CUDA.jl doesn't currently do this either: we hard-code an alignment of 128 bytes right now, which could result in misaligned inputs (e.g., when taking a view and passing it to cuTENSOR).
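
Roughly, the two options look like this (runtime_alignment is a made-up helper, not something CUDA.jl provides today):

```julia
using CUDA

# minimal guaranteed alignment, derived from the element type alone
Base.datatype_alignment(Float32)  # 4 bytes

# hypothetical run-time check: the largest power of two dividing the
# actual buffer address, capped at the 128-byte allocation default
runtime_alignment(A) = 1 << min(trailing_zeros(UInt(pointer(A))), 7)
```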

Let me know if you still want to make changes to this PR, otherwise I'd be OK with merging it as is.

@lkdvos (Contributor, Author) commented May 3, 2024

For me this is good to go. I will have to experiment with the alignment anyway for the StridedView implementation, but that is now accessible through the CuTensorDescriptor constructor, so it should work out.
Thanks!

@maleadt maleadt merged commit 754bb6c into JuliaGPU:master May 3, 2024
1 check passed
@lkdvos lkdvos deleted the cutensor-descriptor branch May 3, 2024 08:53