Add absolute of matrix format #636

Merged
merged 15 commits into from
Sep 25, 2020

Conversation

@yhmtsai yhmtsai commented Sep 4, 2020

This PR adds inplace and outplace computation of the absolute value of a matrix. It is related to #634 and #528.

  • add to_complex/real (earlier name: make_complex)
  • make to_complex/real and remove_complex work for both classes and scalar/complex types
    Note: remove_complex<class> cannot be used in a friend class declaration; use class<remove_complex<V>, I> there instead. The type helper should work in other situations.
  • add inplace_absolute_array and outplace_absolute_array
  • format->compute_absolute_inplace() (earlier names: turn_absolute(), apply_absolute()) is the inplace absolute function: format -> format
  • auto result = format->compute_absolute() (earlier name: get_absolute()) is the outplace absolute function: format -> remove_complex (result type)
  • add the AbsoluteComputable interface; compute_absolute_linop can be used with LinOp.
    After this PR, /bigobj is needed for all MSVC builds.
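As a rough illustration of the type helpers and array functions listed above, here is a minimal standalone sketch. It is not Ginkgo's implementation: the namespace `sketch`, the mock `Csr` template, and the use of `std::vector` in place of Ginkgo arrays are assumptions made for the example; only the names remove_complex, to_complex, inplace_absolute_array and outplace_absolute_array come from the description.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical namespace, not gko

template <typename T>
struct is_complex : std::false_type {};
template <typename T>
struct is_complex<std::complex<T>> : std::true_type {};

// Scalar case: T stays T, std::complex<T> becomes T.
template <typename T, typename = void>
struct remove_complex_impl { using type = T; };
template <typename T>
struct remove_complex_impl<std::complex<T>, void> { using type = T; };
// Class case: Format<V, I...> becomes Format<remove_complex<V>, I...>.
template <template <typename...> class Format, typename V, typename... Rest>
struct remove_complex_impl<
    Format<V, Rest...>,
    std::enable_if_t<!is_complex<Format<V, Rest...>>::value>> {
    using type = Format<typename remove_complex_impl<V>::type, Rest...>;
};

// Same structure in the other direction.
template <typename T, typename = void>
struct to_complex_impl { using type = std::complex<T>; };
template <typename T>
struct to_complex_impl<std::complex<T>, void> { using type = std::complex<T>; };
template <template <typename...> class Format, typename V, typename... Rest>
struct to_complex_impl<
    Format<V, Rest...>,
    std::enable_if_t<!is_complex<Format<V, Rest...>>::value>> {
    using type = Format<typename to_complex_impl<V>::type, Rest...>;
};

template <typename T>
using remove_complex = typename remove_complex_impl<T>::type;
template <typename T>
using to_complex = typename to_complex_impl<T>::type;

// Mock matrix format used only to exercise the class case.
template <typename ValueType, typename IndexType>
struct Csr {};

static_assert(std::is_same<remove_complex<std::complex<float>>, float>::value,
              "scalar case");
static_assert(std::is_same<remove_complex<Csr<std::complex<double>, int>>,
                           Csr<double, int>>::value,
              "class case");
static_assert(std::is_same<to_complex<Csr<float, long>>,
                           Csr<std::complex<float>, long>>::value,
              "class case, other direction");

// Inplace: the value type is kept, each entry is replaced by its magnitude.
template <typename T>
void inplace_absolute_array(std::vector<T>& values)
{
    for (auto& v : values) {
        v = std::abs(v);
    }
}

// Outplace: the result uses the real type, the input is left untouched.
template <typename T>
std::vector<remove_complex<T>> outplace_absolute_array(
    const std::vector<T>& values)
{
    std::vector<remove_complex<T>> result;
    result.reserve(values.size());
    for (const auto& v : values) {
        result.push_back(std::abs(v));
    }
    return result;
}

inline void example()
{
    std::vector<std::complex<double>> data{std::complex<double>(3.0, 4.0)};
    auto abs_data = outplace_absolute_array(data);  // std::vector<double>
    assert(std::abs(abs_data[0] - 5.0) < 1e-12);
}

}  // namespace sketch
```

The class specialization is what the friend-declaration note is about: the alias resolves through a trait, so a friend declaration would need the spelled-out form class<remove_complex<V>, I>.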

@yhmtsai yhmtsai added is:new-feature A request or implementation of a feature that does not exist yet. type:matrix-format This is related to the Matrix formats mod:all This touches all Ginkgo modules. labels Sep 4, 2020
@yhmtsai yhmtsai self-assigned this Sep 4, 2020
codecov bot commented Sep 5, 2020

Codecov Report

Merging #636 into develop will increase coverage by 0.07%.
The diff coverage is 96.14%.

@@             Coverage Diff             @@
##           develop     #636      +/-   ##
===========================================
+ Coverage    92.88%   92.96%   +0.07%     
===========================================
  Files          303      307       +4     
  Lines        21331    21763     +432     
===========================================
+ Hits         19814    20232     +418     
- Misses        1517     1531      +14     
Impacted Files                               Coverage Δ
core/base/extended_float.hpp                 91.26% <ø> (ø)
core/device_hooks/common_kernels.inc.cpp     0.00% <0.00%> (ø)
include/ginkgo/core/matrix/coo.hpp           94.73% <ø> (ø)
include/ginkgo/core/matrix/csr.hpp           47.72% <ø> (ø)
include/ginkgo/core/matrix/dense.hpp         97.70% <ø> (ø)
include/ginkgo/core/matrix/diagonal.hpp      100.00% <ø> (ø)
include/ginkgo/core/matrix/ell.hpp           100.00% <ø> (ø)
include/ginkgo/core/matrix/sellp.hpp         89.74% <ø> (ø)
include/ginkgo/core/base/lin_op.hpp          95.50% <33.33%> (-2.17%) ⬇️
core/test/base/lin_op.cpp                    88.88% <79.31%> (-3.52%) ⬇️
... and 36 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fd98a83...e0b703a.

@pratikvn pratikvn left a comment

Maybe emplace is more suitable than inplace? See the definition and usage of std::vector::emplace_back().

And with this, maybe emplace_absolute is better than turn_absolute?

upsj commented Sep 7, 2020

I like inplace, since it is a pretty common term for what we are doing (compute the absolute and replace the values by it).
I would suggest some alternatives for the names, though, since outplace sounds a bit off to me:
LinOp compute_absolute() and void compute_absolute_inplace()

tcojean commented Sep 7, 2020

inplace is definitely the proper term here and it's actually used often in linear algebra in particular https://en.wikipedia.org/wiki/In-place_matrix_transposition (also similar, look for inplace decomposition).
As Tobias says, outplace does seem a bit off, although it's probably the proper antonym. I would rather go with Tobias's suggestion: the outplace version is the unqualified default, and the inplace version is explicitly qualified in the function name.
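The naming convention discussed here (unqualified name for the version that returns a new object, an `_inplace` suffix for the version that overwrites) could look roughly like the sketch below. The `Vec` class, the `real_of` trait and the namespace are hypothetical stand-ins for illustration, not Ginkgo types; only the two method names and absolute_type are taken from this conversation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <memory>
#include <utility>
#include <vector>

namespace naming_sketch {  // hypothetical namespace

// Maps std::complex<T> to T and leaves real types unchanged.
template <typename T>
struct real_of { using type = T; };
template <typename T>
struct real_of<std::complex<T>> { using type = T; };

// Hypothetical container standing in for a matrix format.
template <typename ValueType>
class Vec {
public:
    using real_type = typename real_of<ValueType>::type;
    using absolute_type = Vec<real_type>;

    explicit Vec(std::vector<ValueType> values) : values_(std::move(values)) {}

    // Unqualified name: returns a new object of the real-valued type.
    std::unique_ptr<absolute_type> compute_absolute() const
    {
        std::vector<real_type> abs_values;
        abs_values.reserve(values_.size());
        for (const auto& v : values_) {
            abs_values.push_back(std::abs(v));
        }
        return std::make_unique<absolute_type>(std::move(abs_values));
    }

    // Qualified name: overwrites the stored values, the type stays the same.
    void compute_absolute_inplace()
    {
        for (auto& v : values_) {
            v = std::abs(v);
        }
    }

    const std::vector<ValueType>& values() const { return values_; }

private:
    std::vector<ValueType> values_;
};

inline void example()
{
    std::vector<std::complex<double>> data{std::complex<double>(0.0, -2.0)};
    Vec<std::complex<double>> z(data);
    auto abs_z = z.compute_absolute();  // Vec<double>, z is unchanged
    assert(std::abs(abs_z->values()[0] - 2.0) < 1e-12);
    z.compute_absolute_inplace();       // z still holds complex values
    assert(z.values()[0].imag() == 0.0);
}

}  // namespace naming_sketch
```

One consequence of this split is visible in the types: the inplace call cannot change the value type, so a complex object stays complex with zero imaginary parts, while the returned object can use the real type.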

@thoasm thoasm mentioned this pull request Sep 8, 2020
9 tasks
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Sep 10, 2020
fritzgoebel previously approved these changes Sep 11, 2020
@fritzgoebel fritzgoebel left a comment

LGTM, only a few minor comments

common/matrix/dense_kernels.hpp.inc (outdated)
core/matrix/csr.cpp
core/test/base/lin_op.cpp
include/ginkgo/core/base/lin_op.hpp (outdated)
include/ginkgo/core/matrix/csr.hpp
reference/test/components/absolute_array.cpp (outdated)
reference/test/components/absolute_array.cpp (outdated)
reference/test/matrix/csr_kernels.cpp
upsj previously approved these changes Sep 11, 2020
@upsj upsj left a comment

LGTM! Only a few nits and two naming suggestions: make_complex -> add_complex and outplace_absolute_type -> just absolute_type. Outplace is not a commonly used term, so while it is perfectly fine in the implementation, in my opinion the public interface should use names that are as intuitive as possible. outplace_absolute_type would mainly make sense if there were also an inplace_absolute_type, but since that would just be the type of the object itself, we don't need it.

core/matrix/ell.cpp
mtx->compute_absolute_inplace();
dmtx->compute_absolute_inplace();

GKO_ASSERT_MTX_NEAR(mtx.get(), dmtx.get(), 1e-14);
(General comment for all tests) I believe you don't need the .get() here; as far as I know, gko::test::assertions::detail::plain_ptr takes care of it internally.

I only refined this in the parts modified by this PR.
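For readers unfamiliar with why .get() can be dropped, the idea of a helper that accepts raw and smart pointers alike can be sketched as follows. This is an assumption about the general technique, not a copy of gko::test::assertions::detail::plain_ptr; the overload set, the `values_equal` helper and the namespace are hypothetical.

```cpp
#include <cassert>
#include <memory>

namespace ptr_sketch {  // hypothetical namespace

// Raw pointers pass through unchanged.
template <typename T>
T* plain_ptr(T* ptr)
{
    return ptr;
}

// Smart pointers are unwrapped, so callers can skip .get().
template <typename T>
T* plain_ptr(const std::unique_ptr<T>& ptr)
{
    return ptr.get();
}

template <typename T>
T* plain_ptr(const std::shared_ptr<T>& ptr)
{
    return ptr.get();
}

// Hypothetical assertion-style helper: compares the pointed-to values and
// accepts any mix of raw and smart pointers.
template <typename First, typename Second>
bool values_equal(const First& first, const Second& second)
{
    return *plain_ptr(first) == *plain_ptr(second);
}

inline void example()
{
    auto mtx = std::make_unique<double>(2.5);
    auto dmtx = std::make_shared<double>(2.5);
    assert(values_equal(mtx, dmtx));              // no .get() needed
    assert(values_equal(mtx.get(), dmtx.get()));  // still accepted
}

}  // namespace ptr_sketch
```

With a helper like this behind the assertion macro, both spellings compile, which is why removing .get() from the tests is a pure readability change.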

include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/matrix/csr.hpp (outdated)
@tcojean tcojean left a comment

LGTM. Some small comments.

core/matrix/csr.cpp
core/matrix/hybrid.cpp (outdated)
core/test/base/lin_op.cpp
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/matrix/csr.hpp (outdated)
1. remove/add_complex
2. to_real/complex alias
3. using absolute_type not outplace_absolute_type
4. remove unneeded .get() in test

Co-authored-by: Tobias Ribizel <[email protected]>
tcojean previously approved these changes Sep 17, 2020
include/ginkgo/core/base/math.hpp
include/ginkgo/core/base/math.hpp
include/ginkgo/core/base/math.hpp
include/ginkgo/core/base/math.hpp
yhmtsai commented Sep 17, 2020

Update:

  1. add to_complex/real
  2. add get_strategy in Hybrid, which returns the same strategy for the new HybType
    When sizeof(VT)/sizeof(IT) differs, minimal_storage_limit is narrowed to imbalance_limit with the same config, so that
    the new Hyb distribution is the same as the one produced by the original strategy.
  3. change EnableAbsoluteComputation<ConcreteLinOp> to EnableAbsoluteComputation<AbsoluteLinOp> to make unusual cases easier to implement in the future
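Item 3 can be pictured with the following sketch, in which the mixin is parameterized by the result type rather than by the concrete class. The class names AbsoluteComputable, EnableAbsoluteComputation, compute_absolute_linop and LinOp come from this PR, but the bodies, the `Dense` mock, the `real_of` trait and the namespace are assumptions written for illustration, not the merged code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <memory>
#include <utility>
#include <vector>

namespace mixin_sketch {  // hypothetical namespace

struct LinOp {
    virtual ~LinOp() = default;
};

// Type-erased interface: usable when only a LinOp-like handle is known.
struct AbsoluteComputable {
    virtual ~AbsoluteComputable() = default;
    virtual std::unique_ptr<LinOp> compute_absolute_linop() const = 0;
    virtual void compute_absolute_inplace() = 0;
};

// Mixin parameterized by the RESULT type, so a class whose absolute has an
// unusual type only needs to name that type here.
template <typename AbsoluteLinOp>
struct EnableAbsoluteComputation : AbsoluteComputable {
    using absolute_type = AbsoluteLinOp;

    std::unique_ptr<LinOp> compute_absolute_linop() const override
    {
        return this->compute_absolute();
    }

    virtual std::unique_ptr<absolute_type> compute_absolute() const = 0;
};

template <typename T>
struct real_of { using type = T; };
template <typename T>
struct real_of<std::complex<T>> { using type = T; };

// Mock format: the absolute of Dense<complex<T>> is Dense<T>.
template <typename ValueType>
class Dense : public LinOp,
              public EnableAbsoluteComputation<
                  Dense<typename real_of<ValueType>::type>> {
public:
    using real_type = typename real_of<ValueType>::type;
    using absolute_type = Dense<real_type>;

    explicit Dense(std::vector<ValueType> v) : values(std::move(v)) {}

    std::unique_ptr<absolute_type> compute_absolute() const override
    {
        std::vector<real_type> abs_values;
        for (const auto& v : values) {
            abs_values.push_back(std::abs(v));
        }
        return std::make_unique<absolute_type>(std::move(abs_values));
    }

    void compute_absolute_inplace() override
    {
        for (auto& v : values) {
            v = std::abs(v);
        }
    }

    std::vector<ValueType> values;
};

inline void example()
{
    std::vector<std::complex<double>> data{std::complex<double>(3.0, -4.0)};
    Dense<std::complex<double>> z(data);
    const AbsoluteComputable& iface = z;
    auto op = iface.compute_absolute_linop();  // std::unique_ptr<LinOp>
    assert(dynamic_cast<Dense<double>*>(op.get()) != nullptr);
}

}  // namespace mixin_sketch
```

Passing the result type keeps the mixin independent of how that type is derived from the concrete class, which is presumably what makes the unusual cases mentioned above easier.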

tcojean commented Sep 23, 2020

Anyone to re-review this PR so that we can merge it? It was already reviewed previously several times anyway.

@thoasm thoasm left a comment

Part 1 / hopefully 2 of my review.
Mostly contains nits and naming suggestions.

cuda/test/components/absolute_array.cpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/base/math.hpp (outdated)
include/ginkgo/core/matrix/coo.hpp
include/ginkgo/core/matrix/csr.hpp
@thoasm thoasm left a comment

LGTM, I mostly have stylistic comments.

core/test/base/math.cpp (outdated)
core/test/base/lin_op.cpp (outdated)
core/test/base/lin_op.cpp (outdated)
hip/test/components/absolute_array.hip.cpp (outdated)
include/ginkgo/core/matrix/hybrid.hpp (outdated)
reference/test/components/absolute_array.cpp (outdated)
reference/test/matrix/dense_kernels.cpp (outdated)
reference/test/matrix/dense_kernels.cpp
reference/test/matrix/diagonal_kernels.cpp (outdated)
reference/test/matrix/diagonal_kernels.cpp (outdated)
- remove unneeded void_t on enable_if_t
- remove remove_complex_s and to_complex_s/real_s from the gko namespace
- refine the code style of the tests according to the AAA rule

Co-authored-by: Thomas Grützmacher <[email protected]>
@yhmtsai yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Sep 25, 2020
@yhmtsai yhmtsai merged commit 009ab80 into develop Sep 25, 2020
@yhmtsai yhmtsai deleted the handle_complex_of_class branch September 25, 2020 11:01
sonarcloud bot commented Sep 25, 2020

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A), Security Hotspots to review: 0
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

Warning: The version of Java (1.8.0_121) you have used to run this analysis is deprecated and we will stop accepting it from October 2020. Please update to at least Java 11.

tcojean added a commit that referenced this pull request Aug 20, 2021
Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extend ISAI from triangular to general and SPD matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO,
  an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)


Related PR: #857
tcojean added a commit that referenced this pull request Aug 23, 2021
Release 1.4.0 to master

Related PR: #866
Labels
1:ST:ready-to-merge This PR is ready to merge. is:new-feature A request or implementation of a feature that does not exist yet. mod:all This touches all Ginkgo modules. type:matrix-format This is related to the Matrix formats
6 participants