CuSparseArrayCSR (N dim array) with batched matmatmul (bmm) #1944
Conversation
Interesting! cc @amontoison
If the representation isn't a fully generic 3D SparseArray, and the main use case is batched operations, why not a hypothetical |
Sure, that makes sense to me. Let me know if I'm understanding you correctly with this below:

mutable struct Batched{A <: CuSparseMatrixCSR, Tv, Ti} <: AbstractCuSparseArray{Tv, Ti, 3}
    rowPtr::CuMatrix{Ti}
    colVal::CuMatrix{Ti}
    nzVal::CuMatrix{Tv}
    dims::NTuple{3,Int}
    nnz::Ti
    function Batched{A, Tv, Ti}(rowPtr::CuMatrix{<:Integer}, colVal::CuMatrix{<:Integer}, nzVal::CuMatrix, dims::NTuple{3,<:Integer}) where {A, Tv, Ti<:Integer}
        new{A, Tv, Ti}(rowPtr, colVal, nzVal, dims, length(nzVal))
    end
end |
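The storage idea in the struct above can be illustrated on the CPU with plain SparseArrays: each batch slice contributes one column of pointer/value storage, which is why every slice must have the same number of nonzeros. This is only a sketch of the layout (using SparseMatrixCSC fields as a stand-in for CSR data); the names here are illustrative, not the PR's actual API.

```julia
using SparseArrays

# two 2x2 matrices with the same number of nonzeros (patterns may differ)
A1 = sparse([1, 2], [1, 2], [1.0, 2.0], 2, 2)
A2 = sparse([1, 2], [2, 1], [3.0, 4.0], 2, 2)

@assert nnz(A1) == nnz(A2)  # required: equal nnz across the batch

# stack per-slice storage into matrices, one column per batch element
# (SparseMatrixCSC is column-major, so colptr plays the role of rowPtr here)
batchptr = hcat(A1.colptr, A2.colptr)  # pointer arrays, 3x2
batchval = hcat(A1.nzval, A2.nzval)    # nonzero values, 2x2

@assert size(batchptr) == (3, 2)
@assert size(batchval) == (2, 2)
```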
@nikopj Nice! |
Getting this going again. I've made things more general by allowing the "batch" dimension of the CuSparseArrayCSR{Tv,Ti,N} to be several dimensions (N-2 batch dims). I'm motivated to do this by some deep-learning sparse-attention work, where we might have different sparse attention matrices per mini-batch element, and each mini-batch element might want to use several separate attention matrices (e.g. multi-head self-attention). So N=3 is the same case as before, but N>3 is now also possible. I've made it so that batched sparse-dense matmul also works for N>3 if the sizes make sense. See the end of Because of the extended dimensions, I'm in favor of keeping the original naming convention. It also makes sense to keep things specific to the sparsity type, as that dictates the fields of the struct. If we end up implementing a similar type for CSC, COO, etc., we could define a "Batched" union type over all of them. The printing/showing/indexing of CuSparseArrayCSR is also working better, making testing in the REPL easier.
|
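The N>3 case described above amounts to a shape check: the first two dimensions are the matrix dimensions and the trailing N-2 dimensions are batch dimensions that must agree between the two operands. The helper below is a hypothetical sketch of that check (not the PR's actual code):

```julia
# Hypothetical size check for batched A * B with N-dimensional arrays:
# dims 1-2 are the matrix dims, dims 3..N are batch dims.
function bmm_size_check(szA::NTuple{N,Int}, szB::NTuple{N,Int}) where {N}
    mA, kA = szA[1], szA[2]
    kB, nB = szB[1], szB[2]
    kA == kB || error("inner matrix dimensions must match")
    szA[3:end] == szB[3:end] || error("batch dimensions must match")
    return (mA, nB, szA[3:end]...)  # size of the result C
end

# a 4x3 times 3x6 matmul over a 2x5 grid of batch elements (N = 4)
@assert bmm_size_check((4, 3, 2, 5), (3, 6, 2, 5)) == (4, 6, 2, 5)
```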
CI failures look related. |
It's failing on CUDA 11.4 and 11.5, plus the Julia nightly build (unrelated to cuSPARSE). A quick fix would be to only define |
At the very least, yes. You could make the tests check for the cuSPARSE version (see the start of the CI logs), and maybe even add an error to the function itself. The nightly issue is unrelated indeed. |
According to Aqua, this added a couple of ambiguities. Not sure why CI didn't spot those...
Could you fix those? |
Ok I'll take a look. |
I've implemented a small working example of a 3-dimensional sparse array, CuSparseArrayCSR, which can be thought of as multiple CuSparseMatrixCSR stacked into a 3rd (batch) dimension. The only restriction I'm aware of is that the number of non-zeros of each matrix slice (batch element) must be the same. The benefit of this representation is that we can more easily make use of CUDA's batched sparse matmuls, e.g. Ci = Ai * Bi, which I've implemented for sparse * dense batched matmul in lib/cusparse/generic.jl (see bmm!). Example uses are in test/cusparse/bmm.jl.
I followed the cuSPARSE docs and the NVIDIA sample code for batched SpMM. Based on the docs, I think this can be extended to CSC and COO representations, as well as different matmul cases. First, I'd like to get some feedback to see if this implementation is sensible.
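The semantics of the batched sparse-dense matmul described above (Ci = Ai * Bi per batch slice) can be stated as a CPU reference using plain SparseArrays. This is only a reference sketch of what the GPU path computes; the actual implementation dispatches to cuSPARSE, and bmm_reference is an illustrative name, not part of the PR.

```julia
using SparseArrays

# CPU reference: multiply each sparse slice Ai by the matching dense slice Bi
function bmm_reference(As::Vector{<:SparseMatrixCSC}, B::Array{Float64,3})
    m = size(first(As), 1)
    n = size(B, 2)
    C = Array{Float64,3}(undef, m, n, length(As))
    for i in eachindex(As)
        C[:, :, i] = As[i] * B[:, :, i]  # Ci = Ai * Bi
    end
    return C
end

As = [sprand(4, 3, 0.5) for _ in 1:2]  # 2 sparse slices, patterns may differ
B  = rand(3, 2, 2)                     # matching dense batch
C  = bmm_reference(As, B)

@assert size(C) == (4, 2, 2)
@assert C[:, :, 1] ≈ Matrix(As[1]) * B[:, :, 1]
```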
This would be helpful for some neural network training cases. Similar capabilities are available in PyTorch, though I think their implementation is more restrictive in not allowing different sparsity patterns between batch elements.
Thanks,
Nikola