Add an implicit residual norm criterion. #702

pratikvn · 2021-02-06T15:27:10Z

This PR adds an implicit residual norm convergence criterion, which checks the reduction in the implicit residual calculated in a solver. Some libraries (for example, MFEM) use an implicit residual as their main criterion. This criterion class will allow an apples-to-apples comparison of our solvers with theirs.

The idea is that for solvers such as CG or extensions of CG (BiCG, BiCGSTAB, CGS, FCG...) we can use scalars which are representative of the residual norm of the solution. For example, in CG, the scalar rho is always computed in each iteration and closely follows the error in the solution.

This lets us avoid computing the residual norm from the residual vector, which is what we do in most cases to check for convergence in the ResidualNormReduction stopping criterion.

Of course, for some solvers, there is no implicit residual norm available, and for those solvers, this stopping criterion will throw a NOT_SUPPORTED exception.
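To make the idea concrete, here is a minimal standalone sketch (plain C++, not Ginkgo code) of a Jacobi-preconditioned CG loop that stops on the scalar rho it already computes, instead of forming an extra 2-norm of the residual vector. The function name `cg_implicit_stop`, the dense `Mat`/`Vec` types and the reduction-style test against the initial rho are assumptions made for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense row-major matrix, illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
    return y;
}

// Jacobi-preconditioned CG for an SPD matrix A.
// Stops when sqrt(rho_k) <= reduction * sqrt(rho_0), where
// rho = r' * M^{-1} * r is the scalar CG needs anyway.
// No separate residual norm is computed. Returns the iteration count.
int cg_implicit_stop(const Mat& A, const Vec& b, Vec& x, double reduction,
                     int max_iters)
{
    const std::size_t n = b.size();
    Vec r = b;
    const Vec Ax = matvec(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] -= Ax[i];

    Vec z(n);
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
    Vec p = z;

    double rho = dot(r, z);
    const double rho0 = rho;
    int iter = 0;
    while (iter < max_iters &&
           std::sqrt(rho) > reduction * std::sqrt(rho0)) {
        const Vec q = matvec(A, p);
        const double alpha = rho / dot(p, q);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];  // recurrence update of the residual
        }
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
        const double rho_new = dot(r, z);
        const double beta = rho_new / rho;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rho = rho_new;
        ++iter;
    }
    return iter;
}
```

Calling `cg_implicit_stop(A, b, x, 1e-10, 50)` on a small SPD system overwrites `x` with the approximate solution. The stopping test costs one square root per iteration on top of what CG already does.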

Edited

Updated modifications:

1. The ResidualNorm class can now perform all three types of criterion checks: absolute, relative and reduction.
2. The previous classes are marked with the [[deprecated("message")]] attribute. Some byproducts of this are:
   i. Whenever one of these classes is used, a warning is emitted at compile time.
   ii. Since we still need to test the older classes, the warning will always be emitted during compilation.
3. A deprecation note is added to the deprecated classes.
4. As we use ResidualNormReduction kernels everywhere, these are now replaced by ResidualNorm. The default behaviour of ResidualNorm should be the same as ResidualNormReduction, so the change in function name should not affect the places where ResidualNormReduction was previously called.
5. ResidualNorm and ImplicitResidualNorm use the same class structure and functions, except that they call different check_impl kernels. In the future we might want to create a ResidualNormBase class and derive both from that. But I think that currently breaks the interface, so I more or less duplicated the code.
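As a rough illustration of what "all three types of criterion checks" means, the sketch below scales one tolerance by a selectable baseline. It is not the Ginkgo implementation: the enum `Baseline` and the function `residual_converged` are hypothetical names, and the mapping of "relative" to the right-hand-side norm and "reduction" to the initial residual norm follows the baseline names used later in this thread.

```cpp
#include <stdexcept>

// Hypothetical baseline selector (names chosen for this sketch).
enum class Baseline {
    absolute,        // compare against the tolerance itself
    rhs_norm,        // "relative": compare against tol * ||b||
    initial_resnorm  // "reduction": compare against tol * ||r_0||
};

// Returns true once the residual norm is small enough for the chosen baseline.
bool residual_converged(double resnorm, double tol, Baseline baseline,
                        double rhs_norm, double initial_resnorm)
{
    switch (baseline) {
    case Baseline::absolute:
        return resnorm <= tol;
    case Baseline::rhs_norm:
        return resnorm <= tol * rhs_norm;
    case Baseline::initial_resnorm:
        return resnorm <= tol * initial_resnorm;
    }
    throw std::invalid_argument("unknown baseline");
}
```

With a single check function like this, the implicit variant only needs to swap in the norm it is given, which is roughly why the two classes can share their structure.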

Unrelated modifications:

Fill-with-value functions were added for Array and Dense. I think these are really helpful for filling the entire values array with a single value. They just use the fill_array kernel that we previously had, so no new kernels are added.
In IDR, we were computing the residual norm in each iteration, but were not updating the stopping criterion with this information, which is now fixed.

+ Saves on computing a 2 norm in each iteration in the stopping criterion. + Also makes the results comparable with other libraries that use this type of implicit residual for convergence checks.

+ Add criterion tests for solvers.

codecov · 2021-02-06T21:52:18Z

Codecov Report

Merging #702 (2f74910) into develop (086cbab) will increase coverage by 0.09%.
The diff coverage is 97.21%.

@@             Coverage Diff             @@
##           develop     #702      +/-   ##
===========================================
+ Coverage    92.74%   92.83%   +0.09%     
===========================================
  Files          354      355       +1     
  Lines        25612    26046     +434     
===========================================
+ Hits         23753    24179     +426     
- Misses        1859     1867       +8

Impacted Files	Coverage Δ
core/device_hooks/common_kernels.inc.cpp	`0.00% <0.00%> (ø)`
include/ginkgo/core/base/array.hpp	`89.56% <ø> (ø)`
include/ginkgo/core/base/types.hpp	`92.59% <ø> (ø)`
include/ginkgo/core/log/logger.hpp	`87.50% <0.00%> (-4.61%)`	⬇️
include/ginkgo/core/matrix/dense.hpp	`98.13% <ø> (ø)`
include/ginkgo/core/preconditioner/ic.hpp	`76.34% <0.00%> (ø)`
omp/components/fill_array.cpp	`100.00% <ø> (ø)`
reference/components/fill_array.cpp	`100.00% <ø> (ø)`
include/ginkgo/core/stop/residual_norm.hpp	`90.12% <90.62%> (+1.98%)`	⬆️
reference/test/stop/residual_norm_kernels.cpp	`97.63% <97.48%> (+0.34%)`	⬆️
... and 46 more

hartwiganzt · 2021-02-07T15:09:28Z

I think having this is very useful! However, I am unsure whether "implicit" is the best name, or whether "recurrence residual" is better.

pratikvn · 2021-02-08T08:45:20Z

We always pass in the recurrent residual (the explicitly computed residual vector) to the stopping criterion. This is what we check in the relative residual norm reduction stopping criterion. Even though this is also a recurrent residual, I think "implicit residual" might be better suited because the residual norm is implicitly computed. For example, in the case of preconditioned CG, the implicit residual norm would be r'*P*r, while the recurrent residual vector would be r and its norm r'*r.

upsj · 2021-02-08T10:13:56Z

Here I would note the distinction between residual, residual norm and implicit residual norm:
(recurrent) residual is exact modulo rounding errors (every solver)
(recurrent) residual norm is exact modulo rounding errors (GMRES)
implicit residual norm is only correlated with the exact residual norm (CG, ...)
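The three quantities can be written out as follows. This is a sketch under assumptions: $P$ denotes the preconditioner application (as in r'*P*r above), $p_k$ and $\alpha_k$ are the usual CG search direction and step length, and the bounds in the last line hold when $P$ is symmetric positive definite.

```latex
\begin{aligned}
\text{true residual:} \quad & b - A x_k \\
\text{recurrence residual:} \quad & r_{k+1} = r_k - \alpha_k A p_k,
    \qquad r_0 = b - A x_0 \\
\text{residual norm:} \quad & \|r_k\|_2 = \sqrt{r_k^{T} r_k} \\
\text{implicit residual norm (PCG):} \quad & \sqrt{\rho_k},
    \qquad \rho_k = r_k^{T} P\, r_k \\
\text{bounds for SPD } P: \quad &
    \lambda_{\min}(P)\, \|r_k\|_2^2 \;\le\; \rho_k \;\le\;
    \lambda_{\max}(P)\, \|r_k\|_2^2
\end{aligned}
```

For unpreconditioned CG ($P = I$) the implicit value coincides with $\|r_k\|_2^2$. With a preconditioner it is only bracketed by the bounds above, which is one way to read "only correlated".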

hartwiganzt

LGTM

Slaedr

LGTM, though I do have a few comments. Thanks!

core/solver/idr.cpp

hip/stop/residual_norm_kernels.hip.cpp

reference/test/solver/bicg_kernels.cpp

reference/test/stop/residual_norm_kernels.cpp

Slaedr · 2021-02-08T15:07:18Z

Here I would note the distinction between residual, residual norm and implicit residual norm:
(recurrent) residual is exact modulo rounding errors (every solver)
(recurrent) residual norm is exact modulo rounding errors (GMRES)
implicit residual norm is only correlated with the exact residual norm (CG, ...)

I would note that, only in the case of left preconditioned GMRES, the recurrent residual norm (from the Hessenberg least-squares RHS) is the exact preconditioned residual norm. So when convergence is detected based on it, the true residual would typically still not be converged.

nbeams

Thanks @pratikvn, it seems this will be useful for the MFEM integration, but also more generally for all Ginkgo users in the future.

include/ginkgo/core/stop/residual_norm.hpp

+ Add a parameter relative to which the implicit residual is computed. + Update docs. Co-authored-by: Natalie Beams <[email protected]>

thoasm

Nice work!

My biggest concern is the usage of a string instead of an enum or boolean to decide between the initial guess or the right-hand side as the reference. Otherwise, I only have minor nits.

cuda/test/stop/residual_norm_kernels.cpp

include/ginkgo/core/stop/residual_norm.hpp

omp/test/stop/residual_norm_kernels.cpp

hip/stop/residual_norm_kernels.hip.cpp

hip/test/stop/residual_norm_kernels.cpp

include/ginkgo/core/stop/residual_norm.hpp

reference/test/stop/residual_norm_kernels.cpp

upsj

LGTM in general, great idea! I would prefer if we could remove the compute_absolute calls, since they allocate memory in each iteration. Another question would be: Do we want to report these implicit residuals to the loggers? Finally, in your stopping criterion, you currently represent relative residuals and residual reduction. What do you think about making this consistent with the regular residual stopping criteria and adding 3 separate stopping criteria based on the baseline value (absolute, relative, residual reduction)?

core/solver/bicgstab.cpp

core/solver/cg.cpp

core/solver/cgs.cpp

core/solver/fcg.cpp

core/stop/residual_norm_kernels.hpp

cuda/stop/residual_norm_kernels.cu

hip/stop/residual_norm_kernels.hip.cpp

omp/stop/residual_norm_kernels.cpp

reference/stop/residual_norm_kernels.cpp

pratikvn · 2021-02-09T12:47:44Z

Reporting the implicit residuals to the loggers is a good point, and I did have a look at it, but I think that breaks our interface, as we cannot add another parameter to the Updater class.

Regarding adding three separate criterion classes, I think that would be a lot of duplication and unnecessary code. But if everyone feels that they would prefer to have separate classes, I can do that.

pratikvn · 2021-02-17T17:16:35Z

@thoasm thank you for the updates and the fixes. I think your solution is good. In particular, it is good that we don't have to duplicate check_impl for the deprecated criteria.

I also added the strided kernels, which I had missed. I think this should be ready for reviewing again.

pratikvn · 2021-02-17T17:29:20Z

@tcojean, regarding the logger: maybe I misunderstand your comment. As I see it, I need to update criterion_check_started to also take the implicit_residual_norm from the updater, which is passed to the logger in include/.../stop/criterion.hpp line 157. I think I cannot add another GKO_REGISTER_EVENT with criterion_check_started, as that sets some variables with the same name, so overloading is not possible. Adding something like criterion_check_started_new is possible, I guess, without breaking the interface, but that again will be in the public interface and will cause naming issues in the future. I would prefer to modify criterion_check_started and break the interface for good in 2.0.

thoasm

LGTM!

tcojean · 2021-02-17T18:20:00Z

@pratikvn What I mean is that maybe it works if you add the implicit residual at the end and use an = nullptr default case, both for the logger event and the criterion updater? Both the current and new interface would work. Of course, that's maybe not optimal in terms of positioning etc.

Changes addressed.

upsj

LGTM mostly, I have two ideas how we could improve and future-proof this approach:

logging: You can add another parameter to a logger without breaking interface as follows:

    GKO_LOGGER_REGISTER_EVENT(21, iteration_complete, ...,
                              const LinOp *implicit_tau_sq = nullptr)
    protected:
        virtual void on_iteration_complete(const LinOp *solver, const size_type &it,
                                           const LinOp *r, const LinOp *x,
                                           const LinOp *tau) const
        {
            this->on_iteration_complete(solver, it, r, x, tau, nullptr);
        }

stopping criterion generation: We could avoid the with_baseline factory parameter call if we had more self-explanatory with_tolerance functions:

Factory& with_reduction_factor(double) { baseline = initial_resnorm; ... }
Factory& with_relative_residual(double) { baseline = rhs_norm; ... }
Factory& with_absolute_residual(double) { baseline = absolute; ... }

and adding another value none to the baseline enum. When we try to generate a residual norm on that enum value, we throw an exception: "Baseline not specified" or something like that.
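The proposal above could look roughly like the following standalone sketch. Everything here is hypothetical and only mirrors the three setter names from the comment: `BaselineChoice`, `Criterion` and this `Factory` class are illustration-only types, not the Ginkgo factory interface.

```cpp
#include <stdexcept>

// Hypothetical enum with the extra "none" value suggested above.
enum class BaselineChoice { none, absolute, rhs_norm, initial_resnorm };

// Hypothetical result of generate(): the settings a criterion would use.
struct Criterion {
    BaselineChoice baseline;
    double tolerance;
};

class Factory {
public:
    // Each setter fixes both the tolerance and the baseline it implies.
    Factory& with_reduction_factor(double tol)
    {
        return set(BaselineChoice::initial_resnorm, tol);
    }
    Factory& with_relative_residual(double tol)
    {
        return set(BaselineChoice::rhs_norm, tol);
    }
    Factory& with_absolute_residual(double tol)
    {
        return set(BaselineChoice::absolute, tol);
    }

    // Refuse to build a criterion when no setter was called.
    Criterion generate() const
    {
        if (baseline_ == BaselineChoice::none) {
            throw std::logic_error("Baseline not specified");
        }
        return {baseline_, tolerance_};
    }

private:
    Factory& set(BaselineChoice b, double tol)
    {
        baseline_ = b;
        tolerance_ = tol;
        return *this;
    }

    BaselineChoice baseline_ = BaselineChoice::none;
    double tolerance_ = 0.0;
};
```

A call such as `Factory{}.with_relative_residual(1e-8).generate()` then carries its baseline in the setter name, whereas `Factory{}.generate()` throws. The trade-off against a single with_baseline parameter is more setters in exchange for fewer ways to combine a tolerance with the wrong baseline.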

include/ginkgo/core/base/types.hpp

core/matrix/dense.cpp

common/matrix/dense_kernels.hpp.inc

upsj · 2021-02-18T07:32:56Z

core/solver/bicgstab.cpp

@@ -169,6 +171,7 @@ void Bicgstab<ValueType>::apply_impl(const LinOp *b, LinOp *x) const
 stop_criterion->update()
 .num_iterations(iter)
 .residual(s.get())
+ .implicit_sq_residual_norm(rho.get())


Since we don't update rho between these two calls, does it make sense to provide the same value twice?

I need to always report this so that check_impl can capture it. If it is nullptr, then the check_impl for ImplicitResidualNorm will fail.

Do you see any downside to checking for nullptr in check_impl and, in that case, doing nothing? If the stopping status is not changed, the iteration should continue normally, right? Only then we can't stop at half-iterations with BiCGSTAB
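The nullptr-tolerant check asked about here could be sketched as below. This is a standalone illustration with hypothetical names (`Update`, `check_implicit`), not the Ginkgo check_impl. Taking the absolute value before the square root is an assumption of this sketch, since the scalar is not guaranteed to be non-negative for every solver.

```cpp
#include <cmath>

// Hypothetical stand-in for what a solver hands to the stopping criterion.
struct Update {
    // Squared implicit residual norm; stays nullptr if the solver did not
    // report one for this (half-)iteration.
    const double* implicit_sq_resnorm = nullptr;
};

// Returns true if the iteration should stop.
// When no implicit value was reported, the check does nothing and the
// iteration simply continues.
bool check_implicit(const Update& update, double tol, double baseline_norm)
{
    if (update.implicit_sq_resnorm == nullptr) {
        return false;
    }
    const double implicit_norm =
        std::sqrt(std::fabs(*update.implicit_sq_resnorm));
    return implicit_norm <= tol * baseline_norm;
}
```

With this shape, a solver that only reports the scalar once per full iteration never stops at a half-iteration, which matches the limitation mentioned for BiCGSTAB.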

core/base/array.cpp

core/stop/residual_norm.cpp

cuda/test/utils.hpp

reference/test/matrix/dense_kernels.cpp

pratikvn · 2021-02-18T09:00:15Z

@upsj , Regarding the logger, I guess @tcojean was also talking about iteration_complete. I completely missed that. I was thinking of criterion_check_x. I will update that.

Regarding the factory parameters, I prefer with_baseline to separate parameters for the different baselines.

pratikvn · 2021-02-18T09:28:00Z

Actually, having had another look at iteration_complete: it is a public function which is overridden in other loggers, so I think adding parameters does break the interface.

upsj · 2021-02-18T09:40:42Z

@pratikvn That's what the second on_iteration_complete overload is for. Though I actually got it the wrong way round: the current implementation might cause iteration events to get lost on old loggers. The correct solution would be

    GKO_LOGGER_REGISTER_EVENT(21, iteration_complete, ...)
    protected:
        virtual void on_iteration_complete(const LinOp *solver, const size_type &it,
                                           const LinOp *r, const LinOp *x,
                                           const LinOp *tau,
                                           const LinOp *implicit_tau_sq = nullptr) const
        {
            this->on_iteration_complete(solver, it, r, x, tau);
        }

and overriding the new on_iteration_complete where necessary

pratikvn · 2021-02-18T09:55:41Z

@upsj , I think the problem is that the call of this->on_iteration_complete(solver, it, r, x, tau) inside the new on_iteration_complete is ambiguous because the compiler cannot resolve between the two when you call it with just non-specific parameters. Also I think any call to on_iteration_complete will be ambiguous unless all the parameters are specified.

upsj · 2021-02-18T09:59:16Z

@pratikvn Oh yeah, when you drop the default value for the last parameter, the ambiguity goes away. You only need to make sure when overriding the new interface that you override both functions, one with your actual implementation, and the other one with

    void on_iteration_complete(const LinOp *solver, const size_type &it,
                               const LinOp *r, const LinOp *x = nullptr,
                               const LinOp *tau = nullptr) const
    {
        this->on_iteration_complete(solver, it, r, x, tau, nullptr);
    }

include/ginkgo/core/stop/residual_norm.hpp

pratikvn · 2021-02-18T10:59:33Z

Maybe I am still misunderstanding what you mean here.

Consider a logging call from the gmres apply. This is essentially the log function, which calls the on_x function selected by the Event template parameter, and therefore calls on_iteration_complete with the parameters passed into the log function.

The problem is that, for example in the call to log in gmres, where you pass in 4 parameters in addition to the solver, the on function in the logger does not know which on_iteration_complete to call, because the two overloads take the same parameters and the second one only has one extra parameter.

Unless we specify all the parameters for iteration_complete in the log call, this will be ambiguous. And adding parameters to all log calls will break the interface, because in MFEM and deal.ii we derive from and use the previous loggers.

upsj · 2021-02-18T11:01:57Z

The new overload with the additional parameter must not have any default parameters; then there is no ambiguity between them. If you specify all parameters, the new overload is used, otherwise the old one is called.
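The overload arrangement discussed in this exchange can be tried out in isolation. The sketch below is a simplified stand-in, not the Ginkgo logger: `LinOp` is a dummy struct, `OldLogger`, `NewLogger` and `log_iteration` are hypothetical names, and the event macro is left out entirely. It shows the new six-argument overload without defaults forwarding to the old one by default, so a logger written against the old interface still sees the event.

```cpp
#include <cstddef>

struct LinOp {
    double value = 0.0;  // dummy payload for this sketch
};

class Logger {
public:
    virtual ~Logger() = default;

    // Old interface: existing loggers override this overload.
    virtual void on_iteration_complete(const LinOp*, const std::size_t&,
                                       const LinOp*, const LinOp* = nullptr,
                                       const LinOp* = nullptr) const
    {}

    // New interface: no default arguments, so calls with five or fewer
    // arguments can only match the old overload. By default it forwards
    // to the old overload, dropping the extra value.
    virtual void on_iteration_complete(const LinOp* solver,
                                       const std::size_t& it, const LinOp* r,
                                       const LinOp* x, const LinOp* tau,
                                       const LinOp*) const
    {
        this->on_iteration_complete(solver, it, r, x, tau);
    }
};

// A logger written before the extra parameter existed.
class OldLogger : public Logger {
public:
    mutable int events = 0;
    void on_iteration_complete(const LinOp*, const std::size_t&, const LinOp*,
                               const LinOp*, const LinOp*) const override
    {
        ++events;
    }
};

// A logger using the new parameter. It overrides both overloads, with the
// old one forwarding to the new one and passing nullptr.
class NewLogger : public Logger {
public:
    mutable double last_implicit = -1.0;
    void on_iteration_complete(const LinOp*, const std::size_t&, const LinOp*,
                               const LinOp*, const LinOp*,
                               const LinOp* implicit_tau_sq) const override
    {
        if (implicit_tau_sq != nullptr) last_implicit = implicit_tau_sq->value;
    }
    void on_iteration_complete(const LinOp* solver, const std::size_t& it,
                               const LinOp* r, const LinOp* x,
                               const LinOp* tau) const override
    {
        this->on_iteration_complete(solver, it, r, x, tau, nullptr);
    }
};

// Solver side: always call the full six-argument overload.
void log_iteration(const Logger& logger, std::size_t it, const LinOp* r,
                   const LinOp* implicit_tau_sq)
{
    logger.on_iteration_complete(nullptr, it, r, nullptr, nullptr,
                                 implicit_tau_sq);
}
```

Passing an `OldLogger` to `log_iteration` increments its `events` counter through the forwarding default, while a `NewLogger` receives the implicit value directly.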

pratikvn · 2021-02-18T12:01:48Z

Okay, I think I finally get what you mean. :)

upsj

LGTM! I like the unification of the interface, probably something we can finish with 2.0
Only the two overloads of iteration_complete need to be swapped, otherwise old loggers can miss new events.

+ Simplify fill kernel. + Add implicit res to logger. + Some doc fixes. Co-authored-by: Tobias Ribizel <[email protected]>

sonarcloud · 2021-02-18T20:19:13Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
27 Code Smells

85.9% Coverage
3.7% Duplication

Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem which enables Intel-GPU and CPU execution. The only Ginkgo features which have not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly memory compression, among other features. The accessor can be used as header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default, this is done through a temporary copy which may have a performance impact but already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost. The accessor is also leveraged in a new CB-GMRES solver which allows for performance improvements by compressing the Krylov basis vectors. Many other features have been added to Ginkgo, such as reordering support, a new IDR solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels, also make use of a shared kernel implementation for all executors (except Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things. [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++. [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor). [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extends ISAI from triangular to general and spd matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors. [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations. [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new "block column-major" accessor has been added. [#707](#707)
+ Add an heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks. [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management. [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)

Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO, an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation. [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)

Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup. [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup. [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC. [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)

Related PR: #857

Release 1.4.0 to master (same release notes as the Ginkgo release 1.4.0 entry). Related PR: #866

pratikvn added 3 commits February 6, 2021 16:03

Add an implicit resnorm stopping criterion

1079d45

+ Saves on computing a 2 norm in each iteration in the stopping criterion. + Also makes the results comparable with other libraries that use this type of implicit residual for convergence checks.

Add core and kernel tests.

b0f14bf

Update stop criterion for solvers.

1be1ec2

+ Add criterion tests for solvers.

pratikvn added the is:new-feature, 1:ST:ready-for-review, type:stopping-criteria and mod:all labels Feb 6, 2021

pratikvn self-assigned this Feb 6, 2021

ginkgo-bot added the reg:testing and type:solver labels Feb 6, 2021

pratikvn requested review from tcojean, fritzgoebel, yhmtsai, hartwiganzt, nbeams, Slaedr, thoasm and upsj February 6, 2021 15:28

hartwiganzt approved these changes Feb 8, 2021

Slaedr approved these changes Feb 8, 2021

nbeams reviewed Feb 8, 2021

include/ginkgo/core/stop/residual_norm.hpp Outdated

Review update.

04f0c74

+ Add a parameter relative to which the implicit residual is computed.
+ Update docs.

Co-authored-by: Natalie Beams <[email protected]>
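The parameter added in this commit selects what the implicit residual is measured against. The PR description names three kinds of check: absolute, relative, and reduction. A minimal standalone sketch of that selection follows; the names `Baseline`, `ImplicitCriterion`, `reference`, and `is_converged` are hypothetical and are not taken from Ginkgo's headers. Treating the solver scalar as a squared quantity and taking its square root before the comparison is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical selector for the quantity the implicit residual is
// compared against (illustration only, not Ginkgo's API).
enum class Baseline {
    absolute,        // compare against the tolerance itself
    rhs_norm,        // "relative": scale the tolerance by ||b||
    initial_resnorm  // "reduction": scale the tolerance by ||r_0||
};

struct ImplicitCriterion {
    Baseline baseline;
    double tolerance;
    double rhs_norm;         // ||b||, used only for Baseline::rhs_norm
    double initial_resnorm;  // ||r_0||, used only for Baseline::initial_resnorm

    // Scaling factor applied to the tolerance for the chosen baseline.
    double reference() const
    {
        switch (baseline) {
        case Baseline::rhs_norm:
            return rhs_norm;
        case Baseline::initial_resnorm:
            return initial_resnorm;
        case Baseline::absolute:
            break;
        }
        return 1.0;
    }

    // implicit_sq is the scalar the solver already holds, assumed here to
    // behave like a squared residual norm (for example rho in CG).
    bool is_converged(double implicit_sq) const
    {
        assert(implicit_sq >= 0.0);
        return std::sqrt(implicit_sq) <= tolerance * reference();
    }
};
```

With a tolerance of `1e-3` and `||b|| = 10`, the `rhs_norm` baseline accepts any implicit residual norm up to `1e-2`, while the `absolute` baseline only accepts values up to `1e-3`.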

thoasm reviewed Feb 9, 2021

upsj previously requested changes Feb 9, 2021

Add strided fill kernels for dense.

460b7ad

thoasm approved these changes Feb 17, 2021

pratikvn requested a review from upsj February 18, 2021 07:10

upsj reviewed Feb 18, 2021

include/ginkgo/core/stop/residual_norm.hpp Outdated

upsj approved these changes Feb 18, 2021

pratikvn force-pushed the recur-res-stop-crit branch from 9004179 to 4a577e4 Compare February 18, 2021 12:43

Review update.

2f74910

+ Simplify fill kernel.
+ Add implicit res to logger.
+ Some doc fixes.

Co-authored-by: Tobias Ribizel <[email protected]>

pratikvn force-pushed the recur-res-stop-crit branch from 4a577e4 to 2f74910 Compare February 18, 2021 12:51

pratikvn added the `1:ST:ready-to-merge` label and removed the `1:ST:ready-for-review` label Feb 18, 2021

pratikvn merged commit 555b8b2 into develop Feb 18, 2021

pratikvn deleted the recur-res-stop-crit branch February 18, 2021 20:57

upsj mentioned this pull request Mar 3, 2021

Add implicit residual log to solvers and benchmarks #714

Merged

upsj mentioned this pull request May 5, 2021

Adding Functionality needed by openCARP #555

Closed

pratikvn commented Feb 6, 2021 (edited)

codecov bot commented Feb 6, 2021 (edited)

hartwiganzt commented Feb 7, 2021

pratikvn commented Feb 8, 2021

upsj commented Feb 8, 2021 (edited)

hartwiganzt left a comment

Slaedr left a comment

Slaedr commented Feb 8, 2021

nbeams left a comment

thoasm left a comment

upsj left a comment

pratikvn commented Feb 9, 2021

pratikvn commented Feb 17, 2021

pratikvn commented Feb 17, 2021

thoasm left a comment

tcojean commented Feb 17, 2021

upsj left a comment

upsj Feb 18, 2021

pratikvn Feb 18, 2021

pratikvn Feb 18, 2021

upsj Feb 18, 2021

pratikvn commented Feb 18, 2021

pratikvn commented Feb 18, 2021

upsj commented Feb 18, 2021

pratikvn commented Feb 18, 2021

upsj commented Feb 18, 2021

pratikvn commented Feb 18, 2021

upsj commented Feb 18, 2021

pratikvn commented Feb 18, 2021

upsj left a comment

sonarcloud bot commented Feb 18, 2021
