Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extract_diagonal for all matrix formats #563

Merged
merged 17 commits into from
Aug 24, 2020
Merged

Conversation

fritzgoebel
Copy link
Collaborator

@fritzgoebel fritzgoebel commented Jun 9, 2020

This PR introduces extraction of the diagonal (which is needed by openCARP) into a vector for all matrix formats.

TODO

@fritzgoebel fritzgoebel added is:new-feature A request or implementation of a feature that does not exist yet. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:openmp This is related to the OpenMP module. mod:reference This is related to the reference module. 1:ST:ready-for-review This PR is ready for review mod:hip This is related to the HIP module. labels Jun 9, 2020
@fritzgoebel fritzgoebel self-assigned this Jun 9, 2020
Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I mainly have a few nits and a request: For robustness, could you test with dense stride > 1 instead? This way, we can see that all the stride computations are correct.
Also for your CUDA/HIP code, you always write if (idx < ...) { ... }. It can be simpler to use if (idx >= ...) { return; } instead since this doesn't need indentation and makes sure that the kernel exits and no code is executed afterwards.

const auto orig_idx = i + orig_row_ptrs[row];
if (orig_idx < nnz) {
if (orig_col_idxs[orig_idx] == row) {
diag[row * diag_stride] = orig_values[orig_idx];
Copy link
Member

@upsj upsj Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can return early here

Suggested change
diag[row * diag_stride] = orig_values[orig_idx];
diag[row * diag_stride] = orig_values[orig_idx];
return;

(same with all similar kernels)

template <typename ValueType, typename IndexType>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just use the COO/ELL kernels here?

const auto tid_in_warp = warp_tile.thread_rank();

for (size_type sellp_ind =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
for (size_type sellp_ind =
for (auto sellp_ind =

Maybe you could also extract origin_slice_sets[slice_id(+1)] or the complete loop bounds into variables to simplify the loop header.

if (global_row < diag_size) {
if (orig_col_idxs[sellp_ind] == global_row &&
orig_values[sellp_ind] != zero<ValueType>()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency: Maybe compare to zero in both ELL/SELLP? Or leave the comparison out in both?

@@ -664,4 +664,21 @@ TEST_F(Csr, SortUnsortedMatrixIsEquivalentToRef)
}


TEST_F(Csr, ExtractDiagonalIsquivalentToRef)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TEST_F(Csr, ExtractDiagonalIsquivalentToRef)
TEST_F(Csr, ExtractDiagonalIsEquivalentToRef)

*
* @param diag the vector into which the diagonal will be written
*/
void extract_diagonal(Dense<value_type> *diag) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void extract_diagonal(Dense<value_type> *diag) const;
void extract_diagonal(Dense<ValueType> *diag) const;

Comment on lines 238 to 242
value_type val_at(size_type row, size_type slice_set,
size_type idx) const noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, clang-format did this. Should I leave it or change it back?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also use clang-format 8?

#pragma omp parallel for
for (size_type slice = 0; slice < slice_num; slice++) {
for (size_type row = 0;
row < slice_size && slice_size * slice + row < diag_size; row++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a rather complicated loop condition. Can it be simplified, maybe?

@@ -222,4 +222,21 @@ TEST_F(Coo, ConvertToCsrIsEquivalentToRef)
}


TEST_F(Coo, ExtractDiagonalIsquivalentToRef)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TEST_F(Coo, ExtractDiagonalIsquivalentToRef)
TEST_F(Coo, ExtractDiagonalIsEquivalentToRef)

Comment on lines 41 to 42
#include <iostream>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <iostream>

thoasm
thoasm previously requested changes Jun 10, 2020
Copy link
Member

@thoasm thoasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts on the Mixin

include/ginkgo/core/matrix/coo.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/lin_op.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/lin_op.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/lin_op.hpp Outdated Show resolved Hide resolved
@fritzgoebel fritzgoebel force-pushed the add_get_diag branch 4 times, most recently from 3a31aa5 to e0608a6 Compare June 15, 2020 08:54
common/matrix/ell_kernels.hpp.inc Outdated Show resolved Hide resolved


#include "common/matrix/coo_kernels.hpp.inc"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need this here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -51,6 +51,11 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

namespace gko {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines 54 to 60
namespace matrix {
template <typename ValueType>
class Dense;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace matrix {
template <typename ValueType>
class Dense;
}
namespace matrix {
template <typename ValueType>
class Dense;
}

reference/matrix/ell_kernels.cpp Show resolved Hide resolved
GKO_ASSERT_MTX_NEAR(diag,
l({{1.}, {2.}, {1.2}}), r<TypeParam>::value);
// clang-format on
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you add a non-square matrix test

@thoasm thoasm dismissed their stale review June 16, 2020 12:37

Stale review

@thoasm thoasm self-requested a review June 16, 2020 12:37
@fritzgoebel fritzgoebel force-pushed the add_get_diag branch 2 times, most recently from 8b308b1 to dd5d432 Compare June 29, 2020 15:40
@codecov
Copy link

codecov bot commented Jun 29, 2020

Codecov Report

Merging #563 into develop will increase coverage by 0.02%.
The diff coverage is 95.37%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #563      +/-   ##
===========================================
+ Coverage    92.86%   92.88%   +0.02%     
===========================================
  Files          303      303              
  Lines        21115    21331     +216     
===========================================
+ Hits         19608    19814     +206     
- Misses        1507     1517      +10     
Impacted Files Coverage Δ
core/device_hooks/common_kernels.inc.cpp 0.00% <0.00%> (ø)
include/ginkgo/core/matrix/coo.hpp 94.73% <ø> (ø)
include/ginkgo/core/matrix/csr.hpp 47.72% <ø> (ø)
include/ginkgo/core/matrix/dense.hpp 97.70% <ø> (ø)
include/ginkgo/core/matrix/ell.hpp 100.00% <ø> (ø)
include/ginkgo/core/matrix/hybrid.hpp 94.11% <ø> (ø)
include/ginkgo/core/matrix/sellp.hpp 89.74% <ø> (ø)
core/matrix/coo.cpp 97.61% <100.00%> (+0.32%) ⬆️
core/matrix/csr.cpp 98.55% <100.00%> (+0.07%) ⬆️
core/matrix/dense.cpp 99.21% <100.00%> (+0.02%) ⬆️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a45c3f...5c64ad1. Read the comment docs.

Copy link
Member

@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

size_type problem_size, const ValueType *__restrict__ orig,
size_type stride_orig, ValueType *__restrict__ diag, size_type stride_diag)
{
constexpr auto warps_per_block = default_block_size / config::warp_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warps_per_block is unused

if (orig_col_idxs[sellp_ind] == global_row &&
orig_values[sellp_ind] != zero<ValueType>()) {
diag[global_row * diag_stride] = orig_values[sellp_ind];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can also return earlier

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that because the slices are stored column-major and there is one warp per slice, it can't return early here. Depending on the slice size, it can happen that one thread finds multiple diagonal entries.

@@ -64,6 +64,7 @@ namespace hybrid {

constexpr int default_block_size = 512;
constexpr int warps_in_block = 4;
constexpr int spmv_block_size = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not needed, right?

include/ginkgo/core/base/lin_op.hpp Show resolved Hide resolved
Copy link
Member

@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -262,6 +265,7 @@ class Dense : public EnableLinOp<Dense<ValueType>>,
return values_.get_const_data();
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -242,6 +243,8 @@ class Dense : public EnableLinOp<Dense<ValueType>>,
std::unique_ptr<LinOp> inverse_column_permute(
const Array<int64> *inverse_permutation_indices) const override;

std::unique_ptr<Diagonal<ValueType>> extract_diagonal() const override;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines 238 to 242
value_type val_at(size_type row, size_type slice_set,
size_type idx) const noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also use clang-format 8?

reference/matrix/sellp_kernels.cpp Outdated Show resolved Hide resolved
omp/test/matrix/sellp_kernels.cpp Outdated Show resolved Hide resolved
reference/test/matrix/csr_kernels.cpp Outdated Show resolved Hide resolved
@sonarcloud
Copy link

sonarcloud bot commented Aug 21, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 22 Code Smells

63.5% 63.5% Coverage
4.5% 4.5% Duplication

warning The version of Java (1.8.0_121) you have used to run this analysis is deprecated and we will stop accepting it from October 2020. Please update to at least Java 11.
Read more here

@fritzgoebel fritzgoebel merged commit bc432c4 into develop Aug 24, 2020
@fritzgoebel fritzgoebel deleted the add_get_diag branch August 24, 2020 07:21
@tcojean tcojean added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Aug 24, 2020
tcojean added a commit that referenced this pull request Aug 26, 2020
Release 1.3.0 of Ginkgo.

The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.3.0. This release brings CUDA 11 support, changes the default C++ standard to
be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for
diagonal extraction, significantly improves the CMake configuration output
format, adds the Ginkgo paper which got accepted into the Journal of Open Source
Software (JOSS), and fixes multiple issues.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).


Additions:
+ Add paper for Journal of Open Source Software (JOSS). [#479](#479)
+ Add a DiagonalExtractable interface. [#563](#563)
+ Add a new diagonal Matrix Format. [#580](#580)
+ Add Cuda11 support. [#603](#603)
+ Add information output after CMake configuration. [#610](#610)
+ Add a new preconditioner export example. [#595](#595)
+ Add a new cuda-memcheck CI job. [#592](#592)

Changes:
+ Use unified memory in CUDA debug builds. [#621](#621)
+ Improve `BENCHMARKING.md` with more detailed info. [#619](#619)
+ Use C++14 standard instead of C++11. [#611](#611)
+ Update the Ampere sm information and CudaArchitectureSelector. [#588](#588)

Fixes:
+ Fix documentation warnings and errors. [#624](#624)
+ Fix warnings for diagonal matrix format. [#622](#622)
+ Fix criterion factory parameters in CUDA. [#586](#586)
+ Fix the norm-type in the examples. [#612](#612)
+ Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617)
+ Fix the example's exec_map by creating the executor only if requested. [#602](#602)
+ Fix some CMake warnings. [#614](#614)
+ Fix Windows building documentation. [#601](#601)
+ Warn when CXX and CUDA host compiler do not match. [#607](#607)
+ Fix reduce_add, prefix_sum, and doc-build. [#593](#593)
+ Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591)
+ Fix allocator in sellp read. [#589](#589)
+ Fix the CAS with HIP and NVIDIA backends. [#585](#585)

Deletions:
+ Remove unused preconditioner parameter in LowerTrs. [#587](#587)

Related PR: #625
tcojean added a commit that referenced this pull request Aug 27, 2020
The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.3.0. This release brings CUDA 11 support, changes the default C++ standard to
be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for
diagonal extraction, significantly improves the CMake configuration output
format, adds the Ginkgo paper which got accepted into the Journal of Open Source
Software (JOSS), and fixes multiple issues.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).


Additions:
+ Add paper for Journal of Open Source Software (JOSS). [#479](#479)
+ Add a DiagonalExtractable interface. [#563](#563)
+ Add a new diagonal Matrix Format. [#580](#580)
+ Add Cuda11 support. [#603](#603)
+ Add information output after CMake configuration. [#610](#610)
+ Add a new preconditioner export example. [#595](#595)
+ Add a new cuda-memcheck CI job. [#592](#592)

Changes:
+ Use unified memory in CUDA debug builds. [#621](#621)
+ Improve `BENCHMARKING.md` with more detailed info. [#619](#619)
+ Use C++14 standard instead of C++11. [#611](#611)
+ Update the Ampere sm information and CudaArchitectureSelector. [#588](#588)

Fixes:
+ Fix documentation warnings and errors. [#624](#624)
+ Fix warnings for diagonal matrix format. [#622](#622)
+ Fix criterion factory parameters in CUDA. [#586](#586)
+ Fix the norm-type in the examples. [#612](#612)
+ Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617)
+ Fix the example's exec_map by creating the executor only if requested. [#602](#602)
+ Fix some CMake warnings. [#614](#614)
+ Fix Windows building documentation. [#601](#601)
+ Warn when CXX and CUDA host compiler do not match. [#607](#607)
+ Fix reduce_add, prefix_sum, and doc-build. [#593](#593)
+ Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591)
+ Fix allocator in sellp read. [#589](#589)
+ Fix the CAS with HIP and NVIDIA backends. [#585](#585)

Deletions:
+ Remove unused preconditioner parameter in LowerTrs. [#587](#587)

Related PR: #627
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. is:new-feature A request or implementation of a feature that does not exist yet. mod:all This touches all Ginkgo modules. type:matrix-format This is related to the Matrix formats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants