Cuda11 support #603

Merged: yhmtsai merged 16 commits into develop from cuda11 on Aug 10, 2020.


@yhmtsai (Member) commented Jul 30, 2020

This PR adds CUDA 11 support.

Summary:

  1. Add the generic cuSPARSE SpMV/SpMM API to the cuSPARSE SpMV/SpMM code paths.
  2. Use `cusp_coo`/`cusp_csr` as `cusp_gcoo`/`cusp_gcsr` in `benchmark/spmv`.
  3. Use the correct architecture flag in `cuda/test`.
  4. Use CUDA 11's cooperative groups, because they support shuffles of complex values; however, we remove the parent type from `tiled_partition`.
  5. Add CUDA 10.2/11 GitLab CI jobs (Linux) and a CUDA 11 GitHub Action (Windows).
  6. Replace the deprecated `spgemm2` and `gthr` with the new cuSPARSE generic interface.
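Items 1 and 6 concern cuSPARSE's generic interface. For readers who want the semantics of the SpMV call in one place, here is a minimal host-side reference of CSR SpMV, y = alpha * A * x + beta * y (the non-transposed case of what `cusparseSpMV` computes). It is a sketch assuming zero-based CSR arrays; `csr_spmv` is a hypothetical name, and the code neither calls cuSPARSE nor reproduces Ginkgo's kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for y = alpha * A * x + beta * y,
// with A stored in zero-based CSR format (row_ptrs, col_idxs, values).
void csr_spmv(double alpha, const std::vector<int>& row_ptrs,
              const std::vector<int>& col_idxs,
              const std::vector<double>& values,
              const std::vector<double>& x, double beta,
              std::vector<double>& y)
{
    const std::size_t num_rows = row_ptrs.size() - 1;
    assert(y.size() == num_rows);
    for (std::size_t row = 0; row < num_rows; ++row) {
        double sum = 0.0;
        // The nonzeros of this row occupy [row_ptrs[row], row_ptrs[row + 1]).
        for (int k = row_ptrs[row]; k < row_ptrs[row + 1]; ++k) {
            sum += values[k] * x[col_idxs[k]];
        }
        // Scale the product and accumulate into the existing y entry.
        y[row] = alpha * sum + beta * y[row];
    }
}
```

For example, with A = [[1, 0, 2], [0, 3, 0]] in CSR form (`row_ptrs = {0, 2, 3}`, `col_idxs = {0, 2, 1}`, `values = {1, 2, 3}`), x = (1, 1, 1), y = (10, 10), alpha = 1 and beta = 0.5, the call leaves y = (8, 8).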

Fixes #613

@yhmtsai added the `mod:cuda` and `1:ST:WIP` labels Jul 30, 2020
@yhmtsai self-assigned this Jul 30, 2020
@adam-m-jcbs mentioned this pull request Aug 4, 2020
@upsj added the `1:ST:ready-for-review` label and removed the `1:ST:WIP` label Aug 5, 2020
codecov bot commented Aug 5, 2020

Codecov Report

❗ No coverage uploaded for pull request base (develop@544f7ef).
The diff coverage is 60.00%.


@@            Coverage Diff             @@
##             develop     #603   +/-   ##
==========================================
  Coverage           ?   93.01%           
==========================================
  Files              ?      296           
  Lines              ?    20660           
  Branches           ?        0           
==========================================
  Hits               ?    19216           
  Misses             ?     1444           
  Partials           ?        0           
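As a quick consistency check on the coverage diff, the coverage percentage is the ratio of hit lines to tracked lines (partials are 0 here), and the misses make up the remainder:

```math
\text{Coverage} = \frac{\text{Hits}}{\text{Lines}} = \frac{19216}{20660} \approx 93.01\%, \qquad \text{Misses} = 20660 - 19216 = 1444
```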
| Impacted Files | Coverage Δ |
|---|---|
| include/ginkgo/core/base/types.hpp | 92.59% <ø> (ø) |
| omp/test/matrix/csr_kernels.cpp | 100.00% <ø> (ø) |
| reference/test/stop/combined.cpp | 100.00% <ø> (ø) |
| reference/test/stop/time.cpp | 100.00% <ø> (ø) |
| include/ginkgo/core/matrix/csr.hpp | 47.72% <60.00%> (ø) |


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 544f7ef...4c7c474.

@tcojean added the `1:ST:WIP` label and removed the `1:ST:ready-for-review` label Aug 5, 2020
@yhmtsai force-pushed the cuda11 branch 6 times, most recently from 3b387f5 to b6c54b7, August 6, 2020 23:43
@yhmtsai added the `1:ST:ready-for-review`, `reg:build`, and `reg:ci-cd` labels and removed the `1:ST:WIP` label Aug 7, 2020
@tcojean (Member) left a comment

Some small comments.

.gitlab-ci.yml (outdated, resolved)
cuda/base/cusparse_bindings.hpp (outdated, resolved)
cuda/base/cusparse_bindings.hpp (outdated, resolved)
cuda/components/cooperative_groups.cuh (outdated, resolved)
cuda/components/cooperative_groups.cuh (outdated, resolved)
cuda/matrix/csr_kernels.cu (resolved)
@upsj (Member) left a comment

LGTM in general; only one important issue and a few nits.

.gitlab-ci.yml (outdated, resolved)
.gitlab-ci.yml (outdated, resolved)
.gitlab-ci.yml (outdated, resolved)
.gitlab-ci.yml (outdated, resolved)
benchmark/utils/cuda_linops.hpp (outdated, resolved)
cuda/matrix/csr_kernels.cu (outdated, resolved)
cuda/matrix/csr_kernels.cu (resolved)
@yhmtsai (Member, Author) commented Aug 10, 2020

I also added comments to `#endif` when the block contains only `#if`/`#endif` (no `#else` or `#elif`):

#if condition
...
...
#endif // condition
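A minimal sketch of the convention in C++. The guard on `CUDA_VERSION` (which `cuda.h` defines as 1000 * major + 10 * minor, so 11000 for CUDA 11.0) and the function name `uses_generic_spmv` are illustrative assumptions, not code from this PR. Because the macro is only tested, the snippet also compiles without CUDA, in which case it returns `false`.

```cpp
// Hypothetical helper illustrating the "#endif // condition" comment style.
// CUDA_VERSION is expected to come from cuda.h; when it is not defined, the
// defined() check below is false and the generic branch is skipped.
constexpr bool uses_generic_spmv()
{
    bool generic = false;
#if defined(CUDA_VERSION) && (CUDA_VERSION >= 11000)
    generic = true;
#endif  // defined(CUDA_VERSION) && (CUDA_VERSION >= 11000)
    return generic;
}
```

Repeating the condition after `#endif` lets a reader match the closing directive to its `#if` without scrolling back, which matters most in long kernel files.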

@pratikvn (Member) left a comment

LGTM!

cuda/base/cusparse_bindings.hpp (resolved)
cuda/base/cusparse_handle.hpp (resolved)
Comment on lines -871 to +1056
- trans->get_col_idxs(), trans->get_row_ptrs(), copyValues, idxBase);
+ trans->get_row_ptrs(), trans->get_col_idxs(), copyValues, idxBase);
@yhmtsai (Member, Author) commented:

I also changed the argument ordering here.
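The swap is consistent with how a transpose is built: converting A from CSR to CSC yields column pointers that are the row pointers of A^T, and row indices that are the column indices of A^T. A plausible reading, which is an assumption here, is that the newer `cusparseCsr2cscEx2` takes the CSC column pointers before the CSC row indices, whereas the legacy `cusparse<t>csr2csc` took the row indices first, so the `trans->get_row_ptrs()`/`trans->get_col_idxs()` arguments trade places. The host-side sketch below (the `Csr` struct and `transpose` function are hypothetical) shows the conversion itself and does not call cuSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR matrix container (zero-based indices).
struct Csr {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<int> row_ptrs;   // size num_rows + 1
    std::vector<int> col_idxs;   // size nnz
    std::vector<double> values;  // size nnz
};

// Builds A^T in CSR form, which holds the same data as A in CSC form:
// t.row_ptrs are A's column pointers, t.col_idxs are A's row indices.
Csr transpose(const Csr& a)
{
    const std::size_t nnz = a.values.size();
    Csr t{a.num_cols, a.num_rows, std::vector<int>(a.num_cols + 1, 0),
          std::vector<int>(nnz), std::vector<double>(nnz)};
    // Count nonzeros per column of A (= per row of A^T), shifted by one.
    for (std::size_t k = 0; k < nnz; ++k) {
        ++t.row_ptrs[a.col_idxs[k] + 1];
    }
    // A running sum over the shifted counts yields the row pointers of A^T.
    for (std::size_t c = 0; c < a.num_cols; ++c) {
        t.row_ptrs[c + 1] += t.row_ptrs[c];
    }
    // Scatter each entry; next[c] is the next free slot in row c of A^T.
    std::vector<int> next(t.row_ptrs.begin(), t.row_ptrs.end() - 1);
    for (std::size_t row = 0; row < a.num_rows; ++row) {
        for (int k = a.row_ptrs[row]; k < a.row_ptrs[row + 1]; ++k) {
            const int dst = next[a.col_idxs[k]]++;
            t.col_idxs[dst] = static_cast<int>(row);
            t.values[dst] = a.values[k];
        }
    }
    assert(t.row_ptrs.back() == static_cast<int>(nnz));
    return t;
}
```

For A = [[1, 0, 2], [0, 3, 0]], the result has `row_ptrs = {0, 1, 2, 3}`, `col_idxs = {0, 1, 0}` and `values = {1, 3, 2}`, i.e. A^T = [[1, 0], [0, 3], [2, 0]].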

@thoasm (Member) left a comment

LGTM!
Just a small nit.

cuda/matrix/csr_kernels.cu (outdated, resolved)
* add #endif comment if #else/#elif is not in block
* use size_type

Co-authored-by: Terry Cojean
Co-authored-by: Tobias Ribizel
Co-authored-by: Thomas Grützmacher
@yhmtsai added the `1:ST:ready-to-merge` label and removed the `1:ST:ready-for-review` label Aug 10, 2020
sonarcloud bot commented Aug 10, 2020

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities, 0 Security Hotspots to review)
Code Smells: A (0 Code Smells)

Coverage: 0.0%
Duplication: 9.4%

warning The version of Java (1.8.0_121) you have used to run this analysis is deprecated and we will stop accepting it from October 2020. Please update to at least Java 11.

@yhmtsai merged commit bd043ef into develop Aug 10, 2020
@yhmtsai deleted the cuda11 branch August 10, 2020 21:53
tcojean added a commit that referenced this pull request Aug 26, 2020
Release 1.3.0 of Ginkgo.

The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.3.0. This release brings CUDA 11 support, changes the default C++ standard to
be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for
diagonal extraction, significantly improves the CMake configuration output
format, adds the Ginkgo paper which got accepted into the Journal of Open Source
Software (JOSS), and fixes multiple issues.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, and all versions from 8.1 onward
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, and all versions from 8.1 onward
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).


Additions:
+ Add paper for Journal of Open Source Software (JOSS). [#479](#479)
+ Add a DiagonalExtractable interface. [#563](#563)
+ Add a new diagonal Matrix Format. [#580](#580)
+ Add Cuda11 support. [#603](#603)
+ Add information output after CMake configuration. [#610](#610)
+ Add a new preconditioner export example. [#595](#595)
+ Add a new cuda-memcheck CI job. [#592](#592)

Changes:
+ Use unified memory in CUDA debug builds. [#621](#621)
+ Improve `BENCHMARKING.md` with more detailed info. [#619](#619)
+ Use C++14 standard instead of C++11. [#611](#611)
+ Update the Ampere sm information and CudaArchitectureSelector. [#588](#588)

Fixes:
+ Fix documentation warnings and errors. [#624](#624)
+ Fix warnings for diagonal matrix format. [#622](#622)
+ Fix criterion factory parameters in CUDA. [#586](#586)
+ Fix the norm-type in the examples. [#612](#612)
+ Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617)
+ Fix the example's exec_map by creating the executor only if requested. [#602](#602)
+ Fix some CMake warnings. [#614](#614)
+ Fix Windows building documentation. [#601](#601)
+ Warn when CXX and CUDA host compiler do not match. [#607](#607)
+ Fix reduce_add, prefix_sum, and doc-build. [#593](#593)
+ Fix find_library(cublas) issue on machines with multiple CUDA installations. [#591](#591)
+ Fix allocator in sellp read. [#589](#589)
+ Fix the CAS with HIP and NVIDIA backends. [#585](#585)

Deletions:
+ Remove unused preconditioner parameter in LowerTrs. [#587](#587)

Related PR: #625
tcojean added a commit that referenced this pull request Aug 27, 2020
Related PR: #627
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:cuda This is related to the CUDA module. reg:build This is related to the build system. reg:ci-cd This is related to the continuous integration system.
Issues that merging this pull request may close:

[JOSS REVIEW] Use of deprecated function cusparseScsrmv in CUDA executor
5 participants