Cuda11 support #603

yhmtsai · 2020-07-30T09:10:33Z

This PR adds CUDA 11 support.

Summary:

Add generic SpMV/SpMM to the cuSPARSE SpMV/SpMM bindings.
Use cusp_coo/cusp_csr as aliases of cusp_gcoo/cusp_gcsr in benchmark/spmv.
Use the correct architecture flag in cuda/test.
Use CUDA 11's cooperative groups because they support complex shuffles, but remove the parent type in tiled_partition.
Add CUDA 10.2/11 GitLab CI jobs (Linux) and a CUDA 11 GitHub Actions job (Windows).
Replace the deprecated spgemm2 and gthr with the new cuSPARSE generic interface.

Fixes #613

cuda/base/cusparse_bindings.hpp

codecov · 2020-08-05T14:17:18Z

Codecov Report

No coverage uploaded for pull request base (develop@544f7ef).
The diff coverage is 60.00%.

@@            Coverage Diff             @@
##             develop     #603   +/-   ##
==========================================
  Coverage           ?   93.01%           
==========================================
  Files              ?      296           
  Lines              ?    20660           
  Branches           ?        0           
==========================================
  Hits               ?    19216           
  Misses             ?     1444           
  Partials           ?        0

Impacted Files	Coverage Δ
include/ginkgo/core/base/types.hpp	`92.59% <ø> (ø)`
omp/test/matrix/csr_kernels.cpp	`100.00% <ø> (ø)`
reference/test/stop/combined.cpp	`100.00% <ø> (ø)`
reference/test/stop/time.cpp	`100.00% <ø> (ø)`
include/ginkgo/core/matrix/csr.hpp	`47.72% <60.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 544f7ef...4c7c474.

tcojean

Some small comments.

.gitlab-ci.yml

cuda/base/cusparse_bindings.hpp

cuda/components/cooperative_groups.cuh

cuda/matrix/csr_kernels.cu

upsj

LGTM in general, only one important issue and a few nits

.gitlab-ci.yml

benchmark/utils/cuda_linops.hpp

cuda/matrix/csr_kernels.cu

yhmtsai · 2020-08-10T09:43:25Z

I also added a comment after #endif when the block only contains #if ... #endif (no #else/#elif):

#if condition
...
...
#endif // condition

pratikvn

LGTM!

cuda/base/cusparse_bindings.hpp

cuda/base/cusparse_handle.hpp

yhmtsai · 2020-08-10T12:40:47Z

cuda/matrix/csr_kernels.cu

- trans->get_col_idxs(), trans->get_row_ptrs(), copyValues, idxBase);
+ trans->get_row_ptrs(), trans->get_col_idxs(), copyValues, idxBase);

I also changed the argument ordering here.
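To illustrate why the order of these two arguments matters, here is a minimal host-side sketch of a CSR transpose. It is not the cuSPARSE call from the diff; the `Csr` struct and `transpose` function are hypothetical stand-ins whose member names only mirror the accessors above. The point is that the column pointers of the CSC form of A serve as the `row_ptrs` of the transposed CSR matrix, and the CSC row indices serve as its `col_idxs`, so a wrapper has to receive them in the order it expects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CSR matrix; not Ginkgo's matrix::Csr.
struct Csr {
    int num_rows;
    int num_cols;
    std::vector<int> row_ptrs;
    std::vector<int> col_idxs;
    std::vector<double> values;
};

// Builds the CSR form of A^T, which is the same data as the CSC form of A.
Csr transpose(const Csr& a)
{
    assert(static_cast<int>(a.row_ptrs.size()) == a.num_rows + 1);
    const int nnz = static_cast<int>(a.values.size());
    Csr t{a.num_cols, a.num_rows, std::vector<int>(a.num_cols + 1, 0),
          std::vector<int>(nnz), std::vector<double>(nnz)};
    // Count the entries in each column of A.
    for (int k = 0; k < nnz; ++k) {
        ++t.row_ptrs[a.col_idxs[k] + 1];
    }
    // A prefix sum turns the counts into the column pointers of A,
    // i.e. the row pointers of A^T.
    for (int c = 0; c < a.num_cols; ++c) {
        t.row_ptrs[c + 1] += t.row_ptrs[c];
    }
    // Scatter each entry of A into its column; the row index of A
    // becomes the column index of A^T.
    std::vector<int> next(t.row_ptrs.begin(), t.row_ptrs.end() - 1);
    for (int r = 0; r < a.num_rows; ++r) {
        for (int k = a.row_ptrs[r]; k < a.row_ptrs[r + 1]; ++k) {
            const int dst = next[a.col_idxs[k]]++;
            t.col_idxs[dst] = r;
            t.values[dst] = a.values[k];
        }
    }
    return t;
}
```

For the 2x3 matrix [[1, 0, 2], [0, 3, 0]], this yields `row_ptrs = {0, 1, 2, 3}` and `col_idxs = {0, 1, 0}` for the transpose. Swapping the two output arrays would hand pointers where indices are expected.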

thoasm

LGTM!
Just a small nit.

cuda/matrix/csr_kernels.cu

sonarcloud · 2020-08-10T21:48:41Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
0 Code Smells

0.0% Coverage
9.4% Duplication

The version of Java (1.8.0_121) you have used to run this analysis is deprecated and we will stop accepting it from October 2020. Please update to at least Java 11.

Release 1.3.0 of Ginkgo.

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. This release brings CUDA 11 support, changes the default C++ standard to be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for diagonal extraction, significantly improves the CMake configuration output format, adds the Ginkgo paper which got accepted into the Journal of Open Source Software (JOSS), and fixes multiple issues.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

Additions:
+ Add paper for Journal of Open Source Software (JOSS). [#479](#479)
+ Add a DiagonalExtractable interface. [#563](#563)
+ Add a new diagonal Matrix Format. [#580](#580)
+ Add Cuda11 support. [#603](#603)
+ Add information output after CMake configuration. [#610](#610)
+ Add a new preconditioner export example. [#595](#595)
+ Add a new cuda-memcheck CI job. [#592](#592)

Changes:
+ Use unified memory in CUDA debug builds. [#621](#621)
+ Improve `BENCHMARKING.md` with more detailed info. [#619](#619)
+ Use C++14 standard instead of C++11. [#611](#611)
+ Update the Ampere sm information and CudaArchitectureSelector. [#588](#588)

Fixes:
+ Fix documentation warnings and errors. [#624](#624)
+ Fix warnings for diagonal matrix format. [#622](#622)
+ Fix criterion factory parameters in CUDA. [#586](#586)
+ Fix the norm-type in the examples. [#612](#612)
+ Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617)
+ Fix the example's exec_map by creating the executor only if requested. [#602](#602)
+ Fix some CMake warnings. [#614](#614)
+ Fix Windows building documentation. [#601](#601)
+ Warn when CXX and CUDA host compiler do not match. [#607](#607)
+ Fix reduce_add, prefix_sum, and doc-build. [#593](#593)
+ Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591)
+ Fix allocator in sellp read. [#589](#589)
+ Fix the CAS with HIP and NVIDIA backends. [#585](#585)

Deletions:
+ Remove unused preconditioner parameter in LowerTrs. [#587](#587)

Related PR: #625 (a second commit with otherwise identical release notes references #627)

yhmtsai added the mod:cuda and 1:ST:WIP labels Jul 30, 2020

yhmtsai self-assigned this Jul 30, 2020

upsj reviewed Jul 30, 2020

cuda/base/cusparse_bindings.hpp (outdated)

upsj force-pushed the cuda11 branch from c7c49b9 to 9179a04 Compare August 4, 2020 12:53

thoasm mentioned this pull request Aug 4, 2020

[JOSS REVIEW] Use of deprecated function cusparseScsrmv in CUDA executor #613

Closed

adam-m-jcbs mentioned this pull request Aug 4, 2020

[JOSS REVIEW] Functionality Review #597

Closed

16 tasks

upsj added the 1:ST:ready-for-review label and removed the 1:ST:WIP label Aug 5, 2020

upsj requested review from tcojean, thoasm, fritzgoebel and pratikvn August 5, 2020 11:10

tcojean added the 1:ST:WIP label and removed the 1:ST:ready-for-review label Aug 5, 2020

yhmtsai force-pushed the cuda11 branch 6 times, most recently from 3b387f5 to b6c54b7 Compare August 6, 2020 23:43

yhmtsai added the 1:ST:ready-for-review, reg:build, and reg:ci-cd labels and removed the 1:ST:WIP label Aug 7, 2020

yhmtsai force-pushed the cuda11 branch from b6c54b7 to 880e1d1 Compare August 7, 2020 08:36

tcojean approved these changes Aug 7, 2020

upsj requested changes Aug 7, 2020

upsj approved these changes Aug 10, 2020

yhmtsai and others added 14 commits August 10, 2020 11:39

csr related function

eaeba21

fix spmv and the cuda test arch flag

e1fdce7

fix windows CUDA_VERISON and previous transpose

fe104d5

add alias cusp_coo/csr is generic from CUDA 11

323cf24

add native implementation and update csr parameters

acbeb01

use native thread_block_tile

0aa48d3

move csrsort gather to new cusparse interface

54396b1

Additionally, this commit
* templatizes the create_* functions to avoid the need for cuda_data_type everywhere
* fixes some issues with cusparse_bindings.hpp compiled in C++ code

add support for cusparse generic SpGEMM

ecb25df

workaround for missing cuSPARSE advanced SpGEMM

40fa364

cuSPARSE Generic can only do C = a*A*B, not C=a*A*B + b*C (this was confirmed by NVIDIA in a documentation bug report)
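One way to work around a product routine that only provides C = a*A*B is to split the advanced form into two steps: compute T = a*A*B, then form D = T + b*C with a separate sparse addition. The sketch below shows only that decomposition on a toy map-based sparse format. It is an assumption about the general shape of such a workaround, not the code from this commit, and every name in it (`Sparse`, `scaled_product`, `add_scaled`, `advanced_spgemm`) is hypothetical.

```cpp
#include <map>
#include <utility>

// Toy sparse matrix: (row, col) -> value. Only for illustration.
using Sparse = std::map<std::pair<int, int>, double>;

// Step 1: T = alpha * A * B, the only form the product routine offers.
Sparse scaled_product(double alpha, const Sparse& a, const Sparse& b)
{
    Sparse t;
    for (const auto& ea : a) {
        for (const auto& eb : b) {
            // A(i, k) * B(k, j) contributes to T(i, j).
            if (ea.first.second == eb.first.first) {
                t[{ea.first.first, eb.first.second}] +=
                    alpha * ea.second * eb.second;
            }
        }
    }
    return t;
}

// Step 2: D = T + beta * C, done as a separate sparse addition.
Sparse add_scaled(Sparse t, double beta, const Sparse& c)
{
    for (const auto& ec : c) {
        t[ec.first] += beta * ec.second;
    }
    return t;
}

// D = alpha * A * B + beta * C, composed from the two steps above.
Sparse advanced_spgemm(double alpha, const Sparse& a, const Sparse& b,
                       double beta, const Sparse& c)
{
    return add_scaled(scaled_product(alpha, a, b), beta, c);
}
```

The cost of this split is an intermediate matrix T and a second pass over the data, which a fused routine would avoid.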

remove parent group in cooperative group in cuda11

c2809f1

fix benchmark doc, don't use csrgemm2info cuda11

b07c6d7

add cuda10.2 and cuda11 pipeline

339647d

add latest windows cuda (11) in github action

afc90d9

move CMP0104 to the top, fix issue in win cuda11

8a235d9

* otherwise, Windows CMake still gives the CMP0104 warning
* the generic API is available from CUDA 10.1 (except on Windows) and from CUDA 11 on all platforms
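The availability rule in this commit note can be written as a small predicate. This is a sketch under the assumption that the version is given in the `CUDA_VERSION` encoding (major * 1000 + minor * 10, so 10.1 is 10010 and 11.0 is 11000); the function name `has_cusparse_generic_api` is hypothetical and does not appear in the PR.

```cpp
// Returns true when the cuSPARSE generic API can be used according to the
// commit note: CUDA 11 and later on every platform, CUDA 10.1 and later
// everywhere except Windows.
// cuda_version uses the CUDA_VERSION encoding: major * 1000 + minor * 10.
constexpr bool has_cusparse_generic_api(int cuda_version, bool is_windows)
{
    return cuda_version >= 11000 || (cuda_version >= 10010 && !is_windows);
}

// Compile-time checks of the rule.
static_assert(has_cusparse_generic_api(11000, true), "CUDA 11 on Windows");
static_assert(has_cusparse_generic_api(10010, false), "CUDA 10.1 on Linux");
static_assert(!has_cusparse_generic_api(10010, true), "CUDA 10.1 on Windows");
```

In the actual sources such a condition would more likely be a preprocessor guard on `CUDA_VERSION`, closed with a commented `#endif` as described earlier in the conversation.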

yhmtsai force-pushed the cuda11 branch from 6b27272 to 955a432 Compare August 10, 2020 09:39

pratikvn approved these changes Aug 10, 2020

cuda/base/cusparse_bindings.hpp

cuda/base/cusparse_handle.hpp

yhmtsai commented Aug 10, 2020

thoasm approved these changes Aug 10, 2020

cuda/matrix/csr_kernels.cu (outdated)

yhmtsai force-pushed the cuda11 branch from 955a432 to 0019bfa Compare August 10, 2020 14:25

review update

1f54d6c

* add #endif comment if #else/#elif is not in block
* use size_type

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Tobias Ribizel <[email protected]>
Co-authored-by: Thomas Grützmacher <[email protected]>

yhmtsai force-pushed the cuda11 branch from 0019bfa to 1f54d6c Compare August 10, 2020 14:38

yhmtsai added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Aug 10, 2020

keep the same interface of transpose in hip

4c7c474

yhmtsai merged commit bd043ef into develop Aug 10, 2020

yhmtsai deleted the cuda11 branch August 10, 2020 21:53
