Add an implicit residual norm criterion. #702

Merged: 15 commits, Feb 18, 2021

Member

@pratikvn pratikvn commented Feb 6, 2021

This PR adds an implicit residual norm convergence criterion, which checks the reduction in the implicit residual calculated in a solver. Some libraries (for example, MFEM) use an implicit residual as their main criterion. This criterion class will allow an apples-to-apples comparison of our solvers with theirs.

The idea is that for solvers such as CG or extensions of CG (BiCG, BiCGSTAB, CGS, FCG...) we can use scalars which are representative of the residual norm of the solution. For example, in CG, the scalar rho is always computed in each iteration and closely follows the error in the solution.

This allows us to save computing the residual norm from the residual vector, which is what we do in most cases to check for convergence in the ResidualNormReduction stopping criterion.

Of course, for some solvers, there is no implicit residual norm available, and for those solvers, this stopping criterion will throw a NOT_SUPPORTED exception.
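
As a rough illustration of the idea (not Ginkgo's implementation), the sketch below runs unpreconditioned CG on a small dense SPD system and checks convergence on the reduction of sqrt(rho), where rho = r'*r is the scalar that CG computes for its recurrences anyway. The names `cg_implicit_stop`, `CgResult`, `Vec` and `Mat` are hypothetical and exist only for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major; assumed symmetric positive definite

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
    return y;
}

struct CgResult {
    Vec x;
    int iterations;
    bool converged;
};

// Unpreconditioned CG starting from x = 0. The stopping test reuses the
// scalar rho instead of computing a separate norm of the residual vector.
CgResult cg_implicit_stop(const Mat& A, const Vec& b, double reduction,
                          int max_iters)
{
    Vec x(b.size(), 0.0);
    Vec r = b;  // r = b - A * 0
    Vec p = r;
    double rho = dot(r, r);
    const double rho0 = rho;
    if (rho0 == 0.0) return {x, 0, true};
    for (int it = 0; it < max_iters; ++it) {
        // Implicit residual norm check: sqrt(rho) relative to sqrt(rho0).
        if (std::sqrt(rho) <= reduction * std::sqrt(rho0)) return {x, it, true};
        const Vec q = matvec(A, p);
        const double alpha = rho / dot(p, q);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        const double rho_new = dot(r, r);
        const double beta = rho_new / rho;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rho = rho_new;
    }
    return {x, max_iters, std::sqrt(rho) <= reduction * std::sqrt(rho0)};
}
```

Calling `cg_implicit_stop({{4, 1}, {1, 3}}, {1, 2}, 1e-10, 10)` should return a solution close to (1/11, 7/11). For solvers without such a scalar, no equivalent shortcut exists, which is the case the NOT_SUPPORTED exception covers.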

Edited

Updated modifications:

  1. The ResidualNorm class can now perform all three types of criterion checks: absolute, relative and reduction.
  2. The previous classes are marked with the [[deprecated("message")]] attribute. Some byproducts of this are:
    i. Whenever one of these classes is used, a warning is emitted at compile time.
    ii. As we still need to test the functionality of the older classes, this warning will always be emitted during compilation.

    A deprecation note is added to the deprecated classes.
  3. As we use ResidualNormReduction kernels everywhere, these are now replaced by ResidualNorm. The default behaviour of ResidualNorm should be the same as that of ResidualNormReduction, so the change in function name should not affect the places where ResidualNormReduction was previously called.
  4. ResidualNorm and ImplicitResidualNorm use the same class structure and functions, except that they call different check_impl kernels. In the future we might want to create a ResidualNormBase class and derive both from that. But I think that currently breaks the interface, so I more or less duplicated the code.
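
A minimal sketch of how one criterion can cover all three check types by switching only the reference value, with the old entry point kept as a deprecated wrapper. The names `baseline`, `residual_norm_check` and `make_reduction_check` are assumptions made for this example and are not Ginkgo's actual interface.

```cpp
#include <stdexcept>

// Which value the tolerance is measured against (hypothetical enum).
enum class baseline { absolute, rhs_norm, initial_resnorm };

struct residual_norm_check {
    baseline mode;
    double tolerance;
    double rhs_norm;         // norm of the right-hand side b
    double initial_resnorm;  // norm of the initial residual

    // Reference value selected by the baseline.
    double reference() const
    {
        switch (mode) {
        case baseline::absolute:
            return 1.0;
        case baseline::rhs_norm:
            return rhs_norm;
        case baseline::initial_resnorm:
            return initial_resnorm;
        }
        throw std::logic_error("unknown baseline");
    }

    // The same comparison serves absolute, relative and reduction checks.
    bool converged(double resnorm) const
    {
        return resnorm <= tolerance * reference();
    }
};

// Old entry point kept for compatibility; every use triggers a compile-time
// deprecation warning, while the definition itself does not.
[[deprecated("use residual_norm_check with baseline::initial_resnorm")]]
inline residual_norm_check make_reduction_check(double factor,
                                                double initial_resnorm)
{
    return {baseline::initial_resnorm, factor, 0.0, initial_resnorm};
}
```

With `tolerance = 1e-6`, `rhs_norm = 2` and `initial_resnorm = 10`, the same residual norm of `5e-6` fails the absolute and relative checks but passes the reduction check.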

Unrelated modifications:

  1. Fill-with-value functions were added for Array and Dense. I think these are really helpful for filling the entire values array with a single value. They just use the fill_array kernel that we previously had, so no new kernels are added.
  2. In IDR, we were computing the residual norm in each iteration, but were not updating the stopping criterion with this information, which is now fixed.

+ Saves on computing a 2 norm in each iteration in the
  stopping criterion.
+ Also makes the results comparable with other libraries
  that use this type of implicit residual for convergence
  checks.
+ Add criterion tests for solvers.
@pratikvn pratikvn added is:new-feature A request or implementation of a feature that does not exist yet. 1:ST:ready-for-review This PR is ready for review type:stopping-criteria This is related to the stopping criteria mod:all This touches all Ginkgo modules. labels Feb 6, 2021
@pratikvn pratikvn self-assigned this Feb 6, 2021
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. type:solver This is related to the solvers labels Feb 6, 2021
codecov bot commented Feb 6, 2021

Codecov Report

Merging #702 (2f74910) into develop (086cbab) will increase coverage by 0.09%.
The diff coverage is 97.21%.

@@             Coverage Diff             @@
##           develop     #702      +/-   ##
===========================================
+ Coverage    92.74%   92.83%   +0.09%     
===========================================
  Files          354      355       +1     
  Lines        25612    26046     +434     
===========================================
+ Hits         23753    24179     +426     
- Misses        1859     1867       +8     
| Impacted Files | Coverage | Δ |
|---|---|---|
| core/device_hooks/common_kernels.inc.cpp | 0.00% <0.00%> | (ø) |
| include/ginkgo/core/base/array.hpp | 89.56% <ø> | (ø) |
| include/ginkgo/core/base/types.hpp | 92.59% <ø> | (ø) |
| include/ginkgo/core/log/logger.hpp | 87.50% <0.00%> | (-4.61%) ⬇️ |
| include/ginkgo/core/matrix/dense.hpp | 98.13% <ø> | (ø) |
| include/ginkgo/core/preconditioner/ic.hpp | 76.34% <0.00%> | (ø) |
| omp/components/fill_array.cpp | 100.00% <ø> | (ø) |
| reference/components/fill_array.cpp | 100.00% <ø> | (ø) |
| include/ginkgo/core/stop/residual_norm.hpp | 90.12% <90.62%> | (+1.98%) ⬆️ |
| reference/test/stop/residual_norm_kernels.cpp | 97.63% <97.48%> | (+0.34%) ⬆️ |

... and 46 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@hartwiganzt
Collaborator

I think having this is very useful! However, I am unsure whether "implicit" is the best name, or whether "recurrence residual" is better.

@pratikvn
Member Author

pratikvn commented Feb 8, 2021

We always pass the recurrent residual (the explicitly computed residual vector) to the stopping criterion. This is what we check in the relative residual norm reduction stopping criterion. Even though this is also a recurrent residual, I think "implicit residual" might be better suited, because the residual norm is implicitly computed. For example, in the case of preconditioned CG, the implicit residual norm would be r'*P*r, while the recurrent residual vector would be r and its norm r'*r.
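
To make the difference between the two quantities concrete, here is a small sketch that evaluates both for a given residual vector. It assumes a diagonal preconditioner stored as a vector and takes square roots of r'*r and r'*P*r so that both values are norms; `residual_norms` and `ResidualNorms` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ResidualNorms {
    double recurrent;  // sqrt(r' * r)
    double implicit;   // sqrt(r' * P * r)
};

// r: residual vector; p_diag: diagonal of the preconditioner P
// (assumed positive, same length as r).
ResidualNorms residual_norms(const std::vector<double>& r,
                             const std::vector<double>& p_diag)
{
    double rr = 0.0;
    double rpr = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        rr += r[i] * r[i];
        rpr += r[i] * p_diag[i] * r[i];
    }
    return {std::sqrt(rr), std::sqrt(rpr)};
}
```

For r = (3, 4) and P = diag(0.25, 0.25), the recurrent norm is 5 while the implicit norm is 2.5, so a tolerance applied to one is not directly a tolerance on the other.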

@upsj
Member

upsj commented Feb 8, 2021

Here I would note the distinction between residual, residual norm and implicit residual norm:
(recurrent) residual is exact modulo rounding errors (every solver)
(recurrent) residual norm is exact modulo rounding errors (GMRES)
implicit residual norm is only correlated with the exact residual norm (CG, ...)

Collaborator

@hartwiganzt hartwiganzt left a comment

LGTM

Contributor

@Slaedr Slaedr left a comment

LGTM, though I do have a few comments. Thanks!

core/solver/idr.cpp (resolved)
hip/stop/residual_norm_kernels.hip.cpp (resolved)
reference/test/solver/bicg_kernels.cpp (outdated, resolved)
reference/test/stop/residual_norm_kernels.cpp (outdated, resolved)
@Slaedr
Contributor

Slaedr commented Feb 8, 2021

> Here I would note the distinction between residual, residual norm and implicit residual norm:
> (recurrent) residual is exact modulo rounding errors (every solver)
> (recurrent) residual norm is exact modulo rounding errors (GMRES)
> implicit residual norm is only correlated with the exact residual norm (CG, ...)

I would note that, in the case of left-preconditioned GMRES, the recurrent residual norm (from the Hessenberg least-squares RHS) is only the exact preconditioned residual norm. So when convergence is detected based on it, the true residual would typically still not be converged.
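
The point can be written out as a short worked statement in exact arithmetic. The symbols below are notation introduced here and do not come from the thread: $M$ is the preconditioner, $\bar{H}_k$ the $(k+1) \times k$ Hessenberg matrix from the Arnoldi process, $\beta$ the norm of the initial (preconditioned) residual, and $x_k$ the iterate after $k$ steps.

```latex
\begin{align*}
% Left preconditioning: Arnoldi on M^{-1} A, \beta = \| M^{-1} (b - A x_0) \|_2
\min_{y} \left\| \beta e_1 - \bar{H}_k y \right\|_2
  &= \left\| M^{-1} (b - A x_k) \right\|_2, \\
% Right preconditioning: Arnoldi on A M^{-1}, \beta = \| b - A x_0 \|_2
\min_{y} \left\| \beta e_1 - \bar{H}_k y \right\|_2
  &= \left\| b - A x_k \right\|_2, \\
% Relation between true and preconditioned residual in the left-preconditioned case
\left\| b - A x_k \right\|_2
  &\le \left\| M \right\|_2 \, \left\| M^{-1} (b - A x_k) \right\|_2 .
\end{align*}
```

In the left-preconditioned case the least-squares value is therefore the norm of the preconditioned residual, and the last line only bounds the true residual through $\| M \|_2$, which may be large.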

Collaborator

@nbeams nbeams left a comment

Thanks @pratikvn, it seems this will be useful for the MFEM integration, but also more generally for all Ginkgo users in the future.

include/ginkgo/core/stop/residual_norm.hpp (outdated, resolved)
+ Add a parameter relative to which the implicit residual is computed.
+ Update docs.

Co-authored-by: Natalie Beams
Member

@thoasm thoasm left a comment

Nice work!

My biggest concern is the usage of a string instead of an enum or boolean to decide between the initial guess or the right-hand side as the reference. Otherwise, I only have minor nits.

cuda/test/stop/residual_norm_kernels.cpp (outdated, resolved)
cuda/test/stop/residual_norm_kernels.cpp (outdated, resolved)
include/ginkgo/core/stop/residual_norm.hpp (outdated, resolved)
omp/test/stop/residual_norm_kernels.cpp (outdated, resolved)
hip/stop/residual_norm_kernels.hip.cpp (resolved)
hip/test/stop/residual_norm_kernels.cpp (outdated, resolved)
include/ginkgo/core/stop/residual_norm.hpp (outdated, resolved)
reference/test/stop/residual_norm_kernels.cpp (outdated, resolved)
reference/test/stop/residual_norm_kernels.cpp (resolved)
reference/test/stop/residual_norm_kernels.cpp (outdated, resolved)
upsj previously requested changes Feb 9, 2021
Member

@upsj upsj left a comment

LGTM in general, great idea! I would prefer if we could remove the compute_absolute calls, since they allocate memory in each iteration. Another question would be: do we want to report these implicit residuals to the loggers? Finally, in your stopping criterion, you currently represent relative residuals and residual reduction. What do you think about making this consistent with the regular residual stopping criteria and adding three separate stopping criteria based on the baseline value (absolute, relative, residual reduction)?

core/solver/bicgstab.cpp (outdated, resolved)
core/solver/cg.cpp (outdated, resolved)
core/solver/cgs.cpp (outdated, resolved)
core/solver/fcg.cpp (outdated, resolved)
core/stop/residual_norm_kernels.hpp (resolved)
cuda/stop/residual_norm_kernels.cu (outdated, resolved)
hip/stop/residual_norm_kernels.hip.cpp (outdated, resolved)
hip/stop/residual_norm_kernels.hip.cpp (resolved)
omp/stop/residual_norm_kernels.cpp (outdated, resolved)
reference/stop/residual_norm_kernels.cpp (outdated, resolved)
@pratikvn
Member Author

pratikvn commented Feb 9, 2021

Reporting the implicit residuals to the loggers is a good point, and I did have a look at it, but I think it breaks our interface, as we cannot add another parameter to the Updater class.

Regarding adding three separate criterion classes, I think that would be a lot of duplication and unnecessary code. But if everyone feels that they would prefer to have separate classes, I can do that.

@pratikvn
Member Author

@thoasm thank you for the updates and the fixes. I think your solution is good. In particular, it is good that we don't have to duplicate check_impl for the deprecated criteria.

I also added the strided kernels, which I had missed. I think this should be ready for reviewing again.

@pratikvn
Member Author

@tcojean, regarding the logger: maybe I misunderstand your comment. As I see it, I need to update criterion_check_started to also take the implicit_residual_norm from the updater, which is passed to the logger in include/.../stop/criterion.hpp line 157. I think I cannot add another GKO_REGISTER_EVENT with criterion_check_started, as that sets some variables with the same name, so overloading is not possible. Adding something like criterion_check_started_new is possible, I guess, without breaking the interface, but that again will be in the public interface and will cause naming issues in the future. I would prefer to modify criterion_check_started and break the interface for good in 2.0.

Member

@thoasm thoasm left a comment

LGTM!

@tcojean
Member

tcojean commented Feb 17, 2021

@pratikvn What I mean is that maybe it works if you add the implicit residual at the end and use an = nullptr default case, both for the logger event and the criterion updater? Both the current and new interface would work. Of course, that's maybe not optimal in terms of positioning etc.

@pratikvn pratikvn requested a review from upsj February 18, 2021 07:10
@pratikvn pratikvn dismissed upsj’s stale review February 18, 2021 07:10

Changes addressed.

Member

@upsj upsj left a comment

LGTM mostly, I have two ideas how we could improve and future-proof this approach:

  1. logging: You can add another parameter to a logger without breaking the interface as follows:

        GKO_LOGGER_REGISTER_EVENT(21, iteration_complete, ..., const LinOp *implicit_tau_sq = nullptr)
    protected:                                                           \
        virtual void on_iteration_complete(const LinOp *solver, const size_type &it,
                const LinOp *r, const LinOp *x, const LinOp *tau) const {
            this->on_iteration_complete(solver, it, r, x, tau, nullptr);
        }

  2. stopping criterion generation: We could avoid the with_baseline factory parameter call if we had more descriptive with_tolerance functions:

        Factory& with_reduction_factor(double) { baseline = initial_resnorm; ... }
        Factory& with_relative_residual(double) { baseline = rhs_norm; ... }
        Factory& with_absolute_residual(double) { baseline = absolute; ... }

     and added another value none to the baseline enum. When we try to generate a residual norm on that enum value, we throw an exception: "Baseline not specified" or something like that.
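
The second suggestion can be sketched as a small self-contained factory. This is only an illustration of the proposal under assumed names (`CriterionFactory`, `Criterion`, and the `baseline` enum with a `none` value); it is not the interface that was merged.

```cpp
#include <stdexcept>

// Hypothetical baseline enum with the extra "none" value from the proposal.
enum class baseline { none, absolute, rhs_norm, initial_resnorm };

struct Criterion {
    baseline base;
    double tolerance;
};

class CriterionFactory {
public:
    // Each setter fixes the baseline implied by its name.
    CriterionFactory& with_reduction_factor(double tol)
    {
        base_ = baseline::initial_resnorm;
        tol_ = tol;
        return *this;
    }
    CriterionFactory& with_relative_residual(double tol)
    {
        base_ = baseline::rhs_norm;
        tol_ = tol;
        return *this;
    }
    CriterionFactory& with_absolute_residual(double tol)
    {
        base_ = baseline::absolute;
        tol_ = tol;
        return *this;
    }

    // Generating without choosing a baseline is reported as an error.
    Criterion generate() const
    {
        if (base_ == baseline::none) {
            throw std::invalid_argument("Baseline not specified");
        }
        return {base_, tol_};
    }

private:
    baseline base_ = baseline::none;
    double tol_ = 0.0;
};
```

A call such as `CriterionFactory{}.with_relative_residual(1e-8).generate()` selects the right-hand-side baseline without a separate baseline parameter, while `CriterionFactory{}.generate()` throws.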

include/ginkgo/core/base/types.hpp (outdated, resolved)
include/ginkgo/core/base/types.hpp (outdated, resolved)
core/matrix/dense.cpp (outdated, resolved)
common/matrix/dense_kernels.hpp.inc (outdated, resolved)
@@ -169,6 +171,7 @@ void Bicgstab<ValueType>::apply_impl(const LinOp *b, LinOp *x) const
stop_criterion->update()
.num_iterations(iter)
.residual(s.get())
.implicit_sq_residual_norm(rho.get())
Member

Since we don't update rho between these two calls, does it make sense to provide the same value twice?

Member Author

good point

Member Author

I need to always report this so that check_impl can capture it. If it is nullptr, then the check_impl for ImplicitResidualNorm will fail.

Member

Do you see any downside to checking for nullptr in check_impl and, in that case, doing nothing? If the stopping status is not changed, the iteration should continue normally, right? The only drawback is that we then can't stop at half-iterations with BiCGSTAB.
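
A minimal sketch of the suggested behaviour, assuming a simplified update object that carries an optional pointer to the squared implicit residual norm. `CriterionUpdate` and `check_implicit_residual` are hypothetical names, not Ginkgo's check_impl.

```cpp
#include <cmath>

struct CriterionUpdate {
    // Squared implicit residual norm; nullptr if the solver reported none.
    const double* implicit_sq_resnorm = nullptr;
};

// Returns true if the iteration should stop. When nothing was reported,
// the check does nothing and the iteration simply continues.
bool check_implicit_residual(const CriterionUpdate& update,
                             double reference_norm, double tolerance)
{
    if (update.implicit_sq_resnorm == nullptr) {
        return false;  // no information: leave the stopping status unchanged
    }
    const double resnorm = std::sqrt(std::abs(*update.implicit_sq_resnorm));
    return resnorm <= tolerance * reference_norm;
}
```

With this variant a solver may skip reporting the value at a half-iteration, at the cost that the criterion cannot trigger there.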

core/base/array.cpp (outdated, resolved)
core/stop/residual_norm.cpp (outdated, resolved)
cuda/test/utils.hpp (outdated, resolved)
reference/test/matrix/dense_kernels.cpp (resolved)
@pratikvn
Member Author

@upsj, regarding the logger, I guess @tcojean was also talking about iteration_complete. I completely missed that; I was thinking of criterion_check_x. I will update that.

Regarding the factory parameters, I prefer with_baseline over separate parameters for the different baselines.

@pratikvn
Member Author

Actually, having another look at iteration_complete, which is a public function that is overridden in other loggers, I think adding parameters does break the interface.

@upsj
Member

upsj commented Feb 18, 2021

@pratikvn That's what the second on_iteration_complete overload is for. However, I actually got it the wrong way round: the current implementation might cause iteration events to get lost on old loggers. The correct solution would be

    GKO_LOGGER_REGISTER_EVENT(21, iteration_complete, ...)
protected:                                                           \
    virtual void on_iteration_complete(const LinOp *solver, const size_type &it,
            const LinOp *r, const LinOp *x, const LinOp *tau, const LinOp *implicit_tau_sq = nullptr) const {
        this->on_iteration_complete(solver, it, r, x, tau);
    }

and overriding the new on_iteration_complete where necessary

@pratikvn
Member Author

@upsj, I think the problem is that the call of this->on_iteration_complete(solver, it, r, x, tau) inside the new on_iteration_complete is ambiguous, because the compiler cannot choose between the two overloads when the call does not specify all parameters. I also think any call to on_iteration_complete will be ambiguous unless all the parameters are specified.

@upsj
Member

upsj commented Feb 18, 2021

@pratikvn Oh yeah, when you drop the default value for the last parameter, the ambiguity goes away. You only need to make sure when overriding the new interface that you override both functions, one with your actual implementation, and the other one with

void on_iteration_complete(const LinOp *solver, const size_type &it,
            const LinOp *r, const LinOp *x = nullptr, const LinOp *tau = nullptr) const {
        this->on_iteration_complete(solver, it, r, x, tau, nullptr);
    }

@pratikvn
Member Author

Maybe I am still misunderstanding what you mean here.

Consider a logging call from the gmres apply. This is essentially the log function, which calls the on_x function based on the Event template parameter, therefore calling on_iteration_complete with the parameters passed into the log function.

The question is, for example in the call to log in gmres, where you pass in 4 parameters in addition to the solver: the on function in the logger does not know which on_iteration_complete to call, because both on_iteration_complete overloads have the same parameters, with only one extra in the second overload.

Unless we specify all the parameters for iteration_complete in the log call, this will be ambiguous. And adding parameters to all log calls will break the interface, because in MFEM and deal.II, we derive from and use the previous loggers.

@upsj
Member

upsj commented Feb 18, 2021

The new overload with the additional parameter must not have any default parameters; then there is no ambiguity between them. If you specify all parameters, the new overload is used, otherwise the old one is called.
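
The overload scheme discussed in this exchange can be sketched with a reduced, self-contained logger hierarchy. Everything here is a simplified stand-in: `LinOp` is an empty placeholder, the hooks take fewer parameters than Ginkgo's, and `log_iteration`, `OldLogger` and `NewLogger` are hypothetical names.

```cpp
#include <string>
#include <vector>

struct LinOp {};  // placeholder for an operator type

class Logger {
public:
    virtual ~Logger() = default;

    // Used by solvers that do not report an implicit residual.
    void log_iteration(const LinOp* residual, const LinOp* tau) const
    {
        this->on_iteration_complete(residual, tau);
    }

    // Used by solvers that also report an implicit residual.
    void log_iteration(const LinOp* residual, const LinOp* tau,
                       const LinOp* implicit_tau_sq) const
    {
        this->on_iteration_complete(residual, tau, implicit_tau_sq);
    }

protected:
    // Old hook: loggers written before the change override only this one.
    virtual void on_iteration_complete(const LinOp*, const LinOp*) const {}

    // New hook: it has no default arguments, so a two-argument call can only
    // match the old hook. By default it forwards to the old hook, so old
    // loggers still receive events from updated solvers.
    virtual void on_iteration_complete(const LinOp* residual,
                                       const LinOp* tau, const LinOp*) const
    {
        this->on_iteration_complete(residual, tau);
    }
};

// A logger that only knows the old interface.
class OldLogger : public Logger {
public:
    mutable std::vector<std::string> events;

protected:
    void on_iteration_complete(const LinOp*, const LinOp*) const override
    {
        events.push_back("old hook");
    }
};

// A logger using the new interface overrides both hooks.
class NewLogger : public Logger {
public:
    mutable std::vector<std::string> events;

protected:
    void on_iteration_complete(const LinOp* residual,
                               const LinOp* tau) const override
    {
        this->on_iteration_complete(residual, tau, nullptr);
    }

    void on_iteration_complete(const LinOp*, const LinOp*,
                               const LinOp* implicit_tau_sq) const override
    {
        events.push_back(implicit_tau_sq ? "with implicit"
                                         : "without implicit");
    }
};
```

In this sketch an `OldLogger` records an event for both the two-argument and the three-argument `log_iteration` call, and a `NewLogger` sees the implicit value only when the solver supplies it.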

@pratikvn
Member Author

Okay, I think I finally get what you mean. :)

Member

@upsj upsj left a comment

LGTM! I like the unification of the interface, probably something we can finish with 2.0
Only the two overloads of iteration_complete need to be swapped, otherwise old loggers can miss new events.

+ Simplify fill kernel.
+ Add implicit res to logger.
+ Some doc fixes.

Co-authored-by: Tobias Ribizel
@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Feb 18, 2021
sonarcloud bot commented Feb 18, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 27 (rating A)

Coverage: 85.9%
Duplication: 3.7%

@pratikvn pratikvn merged commit 555b8b2 into develop Feb 18, 2021
@pratikvn pratikvn deleted the recur-res-stop-crit branch February 18, 2021 20:57
tcojean added a commit that referenced this pull request Aug 20, 2021
Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extend ISAI from triangular to general and SPD matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a GitHub bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for GitHub Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO,
an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)


Related PR: #857
tcojean added a commit that referenced this pull request Aug 23, 2021
Release 1.4.0 to master

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extends ISAI from triangular to general and spd matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non-intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a GitHub bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add SSH debugging for GitHub Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the performance of the OpenMP kernels, including COO,
  an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplifications and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)

Related PR: #866
Labels
1:ST:ready-to-merge This PR is ready to merge. is:new-feature A request or implementation of a feature that does not exist yet. mod:all This touches all Ginkgo modules. reg:testing This is related to testing. type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria
8 participants