Use unified memory in CUDA debug builds #621

upsj · 2020-08-18T21:50:03Z

This PR uses unified memory instead of device memory in CUDA debug builds, allowing you to directly access device memory in cuda-gdb.

It additionally fixes a bug in multi-GPU `raw_copy_to`, which used the wrong device ordinals because of a misnamed parameter (`src`, now renamed to `dest`).
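To illustrate how a misnamed parameter can swap device ordinals in a peer copy, here is a minimal sketch. It is not Ginkgo's code: `Executor`, `copy_to`, `copy_to_buggy`, `PeerCopyCall`, and `fake_memcpy_peer` are hypothetical names. The stub only mirrors the parameter order of the CUDA runtime's `cudaMemcpyPeer(dst, dstDevice, src, srcDevice, count)` and records the ordinals, so it runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record of the ordinals a peer copy was asked to use.
struct PeerCopyCall {
    int dst_device;
    int src_device;
    std::size_t bytes;
};

// Stand-in with the same parameter order as cudaMemcpyPeer:
// (dst, dstDevice, src, srcDevice, count). It never touches a GPU,
// it only reports which ordinal landed in which slot.
PeerCopyCall fake_memcpy_peer(void* /*dst*/, int dst_device,
                              const void* /*src*/, int src_device,
                              std::size_t count)
{
    return {dst_device, src_device, count};
}

// Hypothetical executor: copying always goes from *this* device
// to the executor passed as an argument.
struct Executor {
    int device_id;

    // Buggy variant: the other executor is misnamed `src`, so its ordinal
    // is passed as the source device and ours as the destination device.
    PeerCopyCall copy_to_buggy(const Executor& src, std::size_t num_bytes,
                               const void* src_ptr, void* dest_ptr) const
    {
        return fake_memcpy_peer(dest_ptr, device_id, src_ptr,
                                src.device_id, num_bytes);
    }

    // Fixed variant: with the parameter named `dest`, the ordinals line up
    // with the pointers they belong to.
    PeerCopyCall copy_to(const Executor& dest, std::size_t num_bytes,
                         const void* src_ptr, void* dest_ptr) const
    {
        return fake_memcpy_peer(dest_ptr, dest.device_id, src_ptr,
                                device_id, num_bytes);
    }
};
```

With two executors on devices 0 and 1, `copy_to` reports destination 1 and source 0, while `copy_to_buggy` reports them swapped. On real hardware a swap like this would pair each pointer with the wrong device, which is consistent with the invalid device ordinal errors mentioned later in the thread.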

codecov · 2020-08-19T01:46:20Z

Codecov Report

Merging #621 into develop will decrease coverage by 0.00%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #621      +/-   ##
===========================================
- Coverage    92.86%   92.86%   -0.01%     
===========================================
  Files          303      303              
  Lines        21115    21115              
===========================================
- Hits         19609    19608       -1     
- Misses        1506     1507       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| core/base/extended_float.hpp | `91.26% <0.00%> (-0.98%)` | ⬇️ |


pratikvn · 2020-08-19T09:00:06Z

I definitely like the idea of using UVM in debug builds, but I am not sure we should maintain two different implementations for Release and Debug. To me, debug builds are meant to have the same implementation, perhaps with more checks and diagnostic information. Here, however, the implementation itself differs (`std::memcpy` vs. `cudaMemcpyPeer`), which violates that principle.

Nice job finding that bug in the multi-GPU `raw_copy_to`. I guess that was the cause of the invalid device ordinal errors I was seeing with my CudaUVM memspace.

Also, you don't seem to be doing this for the HIP device -> HIP device `raw_copy_to`.

upsj · 2020-08-19T09:06:57Z

That's a good point. It might also be that you can just use `cudaMemcpy` as usual with UVM; I will check.

pratikvn

LGTM!

yhmtsai

LGTM

dev_tools/scripts/gdb-ginkgo.py

also fix the incorrect peer memcpy device ID order

thoasm

LGTM!

sonarcloud · 2020-08-20T20:53:49Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
0 Code Smells

0.0% Coverage
0.0% Duplication

The version of Java (1.8.0_121) you have used to run this analysis is deprecated and we will stop accepting it from October 2020. Please update to at least Java 11.

Release 1.3.0 of Ginkgo.

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. This release brings CUDA 11 support, changes the default C++ standard to be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for diagonal extraction, significantly improves the CMake configuration output format, adds the Ginkgo paper which got accepted into the Journal of Open Source Software (JOSS), and fixes multiple issues.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

Additions:
+ Add paper for Journal of Open Source Software (JOSS). [#479](#479)
+ Add a DiagonalExtractable interface. [#563](#563)
+ Add a new diagonal Matrix Format. [#580](#580)
+ Add Cuda11 support. [#603](#603)
+ Add information output after CMake configuration. [#610](#610)
+ Add a new preconditioner export example. [#595](#595)
+ Add a new cuda-memcheck CI job. [#592](#592)

Changes:
+ Use unified memory in CUDA debug builds. [#621](#621)
+ Improve `BENCHMARKING.md` with more detailed info. [#619](#619)
+ Use C++14 standard instead of C++11. [#611](#611)
+ Update the Ampere sm information and CudaArchitectureSelector. [#588](#588)

Fixes:
+ Fix documentation warnings and errors. [#624](#624)
+ Fix warnings for diagonal matrix format. [#622](#622)
+ Fix criterion factory parameters in CUDA. [#586](#586)
+ Fix the norm-type in the examples. [#612](#612)
+ Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617)
+ Fix the example's exec_map by creating the executor only if requested. [#602](#602)
+ Fix some CMake warnings. [#614](#614)
+ Fix Windows building documentation. [#601](#601)
+ Warn when CXX and CUDA host compiler do not match. [#607](#607)
+ Fix reduce_add, prefix_sum, and doc-build. [#593](#593)
+ Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591)
+ Fix allocator in sellp read. [#589](#589)
+ Fix the CAS with HIP and NVIDIA backends. [#585](#585)

Deletions:
+ Remove unused preconditioner parameter in LowerTrs. [#587](#587)

Related PR: #625

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. (Release notes identical to the entry above.) Related PR: #627

upsj added the is:bug, is:help-wanted, mod:cuda, and 1:ST:ready-for-review labels Aug 18, 2020

upsj requested review from pratikvn, thoasm, yhmtsai, tcojean and fritzgoebel August 18, 2020 21:50

upsj self-assigned this Aug 18, 2020

upsj changed the title ~~Use unified memory on CUDA debug builds~~ Fix GPU raw_copy_to between devices Aug 18, 2020

upsj force-pushed the executor_debug_uvm branch from e1d1f8f to 6d90ba0 Compare August 18, 2020 22:35

upsj changed the title ~~Fix GPU raw_copy_to between devices~~ Use unified memory in CUDA debug builds Aug 19, 2020

upsj force-pushed the executor_debug_uvm branch 2 times, most recently from 2709bba to 50a6db6 Compare August 19, 2020 09:34

upsj removed the is:help-wanted label Aug 19, 2020

pratikvn approved these changes Aug 20, 2020


yhmtsai approved these changes Aug 20, 2020


upsj added 2 commits August 20, 2020 15:50

use UVM on CUDA debug builds

0684204

also fix the incorrect peer memcpy device ID order

disable unified memory on AMD GPUs

15faee3

upsj force-pushed the executor_debug_uvm branch from aef52b1 to 15faee3 Compare August 20, 2020 13:52

thoasm approved these changes Aug 20, 2020


upsj added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Aug 20, 2020

upsj merged commit 4a45c3f into develop Aug 20, 2020

upsj deleted the executor_debug_uvm branch August 20, 2020 21:02
