
CuSparseArrayCSR (N dim array) with batched matmatmul (bmm) #1944

Merged (25 commits), Jan 17, 2024

Conversation


@nikopj nikopj commented Jun 10, 2023

I've implemented a small working example of a 3-dimensional sparse array, CuSparseArrayCSR, which can be thought of as multiple CuSparseMatrixCSR matrices stacked along a 3rd (batch) dimension. The only restriction I'm aware of is that the number of non-zeros of each matrix slice (batch element) must be the same. The benefit of this representation is that we can more easily make use of CUDA's batched sparse matmuls, e.g. Ci = Ai * Bi, which I've implemented for sparse * dense batched matmul in lib/cusparse/generic.jl (see bmm!). Example uses are in test/cusparse/bmm.jl.
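To make the semantics concrete, here is a plain-Julia CPU reference for the batched product Ci = Ai * Bi under the same storage convention (one rowPtr/colVal/nzVal column per batch slice, shared nnz count). The name bmm_ref is illustrative only, not part of the PR's API:

```julia
# CPU reference for batched sparse (CSR) * dense matmul, Cᵢ = Aᵢ * Bᵢ.
# rowPtr is (m+1, b), colVal and nzVal are (nnz, b), B is (k, n, b);
# every batch slice shares the same nnz count, 1-based indices.
function bmm_ref(rowPtr::Matrix{Int}, colVal::Matrix{Int}, nzVal::Matrix{T},
                 B::Array{T,3}) where {T}
    m = size(rowPtr, 1) - 1
    k, n, b = size(B)
    C = zeros(T, m, n, b)
    for i in 1:b, row in 1:m
        # nonzeros of row `row` in batch slice `i`
        for p in rowPtr[row, i]:(rowPtr[row+1, i] - 1)
            col = colVal[p, i]
            for j in 1:n
                C[row, j, i] += nzVal[p, i] * B[col, j, i]
            end
        end
    end
    return C
end
```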

I followed the cuSPARSE docs and the NVIDIA sample code for batched SpMM. Based on the docs, I think this can be extended to the CSC and COO representations, as well as to other matmul cases. First, I'd like to get some feedback to see if this implementation is sensible.

This would be helpful for some neural-network training cases. Similar capabilities are available in PyTorch, though I think their implementation is more restrictive in that it does not allow different sparsity patterns between batch elements.

Thanks,
Nikola

@maleadt added the "enhancement" and "cuda array" labels on Jun 13, 2023

maleadt commented Jun 13, 2023

Interesting! cc @amontoison

The only restriction I'm aware of is that the number of non-zeros of each matrix slice (batch element) must be the same. The benefit of this representation is that we can more easily make use of CUDA's batched sparse matmuls, e.g. Ci = Ai * Bi, which I've implemented for sparse * dense batched matmul in lib/cusparse/generic.jl (see bmm!).

If the representation isn't a fully generic 3D SparseArray, and the main use case is batched operations, why not a hypothetical Batched{CuSparseMatrix}? I'm concerned that users may think they can do more with a CuSparseArray type than they actually can.


nikopj commented Jun 13, 2023

If the representation isn't a fully generic 3D SparseArray, and the main use case is batched operations, why not a hypothetical Batched{CuSparseMatrix}? I'm concerned that users may think they can do more with a CuSparseArray type than they actually can.

Sure, that makes sense to me. Let me know if I'm understanding you correctly with the sketch below:

mutable struct Batched{A<:CuSparseMatrixCSR, Tv, Ti} <: AbstractCuSparseArray{Tv, Ti, 3}
    rowPtr::CuMatrix{Ti}
    colVal::CuMatrix{Ti}
    nzVal::CuMatrix{Tv}
    dims::NTuple{3,Int}
    nnz::Ti

    function Batched{A, Tv, Ti}(rowPtr::CuMatrix{<:Integer}, colVal::CuMatrix{<:Integer},
                                nzVal::CuMatrix, dims::NTuple{3,<:Integer}) where {A<:CuSparseMatrixCSR, Tv, Ti<:Integer}
        new{A, Tv, Ti}(rowPtr, colVal, nzVal, dims, length(nzVal))
    end
end


amontoison commented Jun 19, 2023

@nikopj Nice!
I like your proposal of the Batched structure.
We should be able to use it with different sparse matrix formats.

@nikopj nikopj changed the title CuSparseArrayCSR (3 dim array) with batched matmatmul (bmm) CuSparseArrayCSR (N dim array) with batched matmatmul (bmm) Nov 10, 2023

nikopj commented Nov 10, 2023

Getting this going again.

I've made things more general by allowing the "batch" dimension of CuSparseArrayCSR{Tv,Ti,N} to span several dimensions (N-2 batch dims). I'm motivated to do this by some deep-learning sparse-attention use cases, where we might have different sparse attention matrices per mini-batch element, and each mini-batch element might want to make use of several separate attention matrices (e.g. multi-head self-attention).

So N=3 is the same case as before, but N>3 is now also possible. I've made it such that batched sparse-dense matmul also works for N>3 when the sizes make sense. See the end of test/libraries/cusparse/bmm.jl for an example with a size-(m,n,2,3) CuSparseArrayCSR.
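The N>3 semantics can be illustrated with a dense CPU sketch: all trailing dimensions beyond the first two are treated as batch dims, and each batch slice is multiplied independently. The name batched_mul_ref is hypothetical; the actual kernel works on the CSR fields directly:

```julia
# Dense reference for the N=4 case: trailing dims are batch dims,
# e.g. (m, k, heads, batch) * (k, n, heads, batch) -> (m, n, heads, batch).
function batched_mul_ref(A::Array{T,4}, B::Array{T,4}) where {T}
    m, k, h, b = size(A)
    k2, n, h2, b2 = size(B)
    @assert (k, h, b) == (k2, h2, b2) "batch and inner dims must match"
    C = similar(A, m, n, h, b)
    for j in 1:b, i in 1:h
        # each (head, batch) slice is an independent matmul
        C[:, :, i, j] = A[:, :, i, j] * B[:, :, i, j]
    end
    return C
end
```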

Because of the extended dimensions, I'm in favor of keeping the original naming convention. It also makes sense to keep things specific to the sparsity type, as that dictates the fields of the struct. If we end up implementing a similar type for CSC, COO, etc., we could add a "Batched" union type over all of them.

The printing / showing / indexing of CuSparseArrayCSR is also working better, making testing in the REPL easier.

Base.cat and Base.reshape are also working for sensible arguments.

@nikopj nikopj marked this pull request as ready for review November 10, 2023 03:56

maleadt commented Jan 2, 2024

CI failures look related.


nikopj commented Jan 3, 2024

@maleadt

CI failures look related.

It's failing on CUDA 11.4 and 11.5, plus the Julia nightly build (the latter unrelated to cuSPARSE).
I'm not sure where the error lies in these versions, as the API documentation says they can handle batched sparse-dense multiplication.

A quick fix would be to only define bmm! for versions 11.6 and higher.


maleadt commented Jan 4, 2024

A quick fix would be to only define bmm! for versions 11.6 and higher.

At the very least, yes. You could make the tests check for the cuSPARSE version (see the start of the CI logs), and maybe even add an error to the function itself.

The nightly issue is unrelated indeed.

@maleadt maleadt merged commit 88ebe50 into JuliaGPU:master Jan 17, 2024
1 check passed

maleadt commented Jan 17, 2024

According to Aqua, this added a couple of ambiguities. Not sure why CI didn't spot those...

julia> ambs = Aqua.detect_ambiguities(CUDA; recursive=true)
 (kwcall(::NamedTuple, ::typeof(cat), As::CUDA.CUSPARSE.CuSparseMatrixCSR...) @ CUDA.CUSPARSE ~/Julia/pkg/CUDA/lib/cusparse/batched.jl:1, kwcall(::NamedTuple, ::typeof(cat), As::CUDA.CUSPARSE.CuSparseArrayCSR...) @ CUDA.CUSPARSE ~/Julia/pkg/CUDA/lib/cusparse/batched.jl:14)
 (cat(As::CUDA.CUSPARSE.CuSparseMatrixCSR...; dims) @ CUDA.CUSPARSE ~/Julia/pkg/CUDA/lib/cusparse/batched.jl:1, cat(As::CUDA.CUSPARSE.CuSparseArrayCSR...; dims) @ CUDA.CUSPARSE ~/Julia/pkg/CUDA/lib/cusparse/batched.jl:14)

Could you fix those?
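For context, the collision Aqua reports is the classic two-Vararg-method ambiguity: a zero-argument call matches both signatures. A toy reproduction with hypothetical functions (g shows the problem, h shows one standard fix, standing in for the two cat methods):

```julia
# Two Vararg methods over different element types:
g(xs::Int...) = 1
g(xs::String...) = 2
# g() matches both methods, so dispatch is ambiguous and throws.

# One standard fix: require a leading positional argument in each
# method, so no single call can match both signatures.
h(x::Int, xs::Int...) = 1
h(x::String, xs::String...) = 2
```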


nikopj commented Jan 17, 2024

OK, I'll take a look.
