Add a batch::Bicgstab solver class, core, ref and omp kernels #1438

pratikvn · 2023-10-21T19:33:15Z

This PR adds a batch::Bicgstab solver with only the reference and omp kernels for now. Another PR will add the cuda, hip and dpcpp kernels, to avoid making this PR too large.

In addition, some general solver, stopping criteria, logger and preconditioner framework code is also added. These pieces are fairly simple, and I think it helps to review them in the context of the solver itself.

Batch stopping criteria
Simple batch logger
Some batch matrix generation utilities
A basic BatchIdentity matrix class and a corresponding Identity preconditioner to enable unpreconditioned solves.
The batch dispatch mechanism that selects the correct matrix, solver, preconditioner and stopping criteria at runtime and dispatches the correct kernel on the device.
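The dispatch idea in the last item can be illustrated with a minimal sketch. It is not the code from this PR: the enums, tag types and `run_kernel` below are hypothetical stand-ins, and the only point is to show how runtime choices can be turned into template arguments so that the kernel itself is fully typed at compile time.

```cpp
#include <cassert>
#include <string>

// Hypothetical runtime tags; these are not Ginkgo types.
enum class PrecondKind { identity, diagonal };
enum class StopKind { absolute, relative };

// Hypothetical compile-time tag types matching the runtime tags.
struct IdentityPrec { static const char* name() { return "identity"; } };
struct DiagonalPrec { static const char* name() { return "diagonal"; } };
struct AbsStop { static const char* name() { return "absolute"; } };
struct RelStop { static const char* name() { return "relative"; } };

// Stand-in for the kernel launch: all types are known at compile time.
template <typename Prec, typename Stop>
std::string run_kernel()
{
    return std::string(Prec::name()) + "+" + Stop::name();
}

// Second dispatch level: the stopping criterion.
template <typename Prec>
std::string dispatch_on_stop(StopKind stop)
{
    switch (stop) {
    case StopKind::absolute:
        return run_kernel<Prec, AbsStop>();
    case StopKind::relative:
        return run_kernel<Prec, RelStop>();
    }
    assert(false && "unknown stopping criterion");
    return {};
}

// First dispatch level: the preconditioner.
std::string dispatch(PrecondKind prec, StopKind stop)
{
    switch (prec) {
    case PrecondKind::identity:
        return dispatch_on_stop<IdentityPrec>(stop);
    case PrecondKind::diagonal:
        return dispatch_on_stop<DiagonalPrec>(stop);
    }
    assert(false && "unknown preconditioner");
    return {};
}
```

Each additional runtime choice (matrix format, logger) would add one more level of the same pattern, so the number of instantiated kernels grows with the product of the options.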

MarcelKoch

I think we can use our unified kernels approach for some of these parts. In particular, the logger and stopping criteria don't use any backend specific stuff, except for some function attributes. Those could also be handled uniformly through macros, which we already have.
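As a rough illustration of handling the function attributes through a macro, here is a sketch under assumptions: `SKETCH_ATTRIBUTES` and `AbsResidualStop` are made-up names for this example and are not the macro or class used in the PR. The same struct compiles as plain C++ on the host and gains device attributes when built with a CUDA or HIP compiler.

```cpp
#include <cassert>

// Hypothetical macro: expands to device attributes only when a CUDA or HIP
// compiler is used, and to nothing for a plain host compiler.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_ATTRIBUTES __host__ __device__
#else
#define SKETCH_ATTRIBUTES
#endif

// A backend-agnostic absolute stopping check (illustrative only).
struct AbsResidualStop {
    double tol;

    SKETCH_ATTRIBUTES bool has_converged(double res_norm) const
    {
        return res_norm < tol;
    }
};
```

Because the body uses no backend-specific calls, one definition could serve every backend, which is the property the comment above relies on.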

I think even the identity preconditioner could be handled this way, although that would require some adjustments to our unified kernels, so I think we should postpone that.

yhmtsai

first part of my review

yhmtsai · 2023-10-23T08:46:15Z

common/cuda_hip/log/batch_logger.hpp.inc

+
+
+/**
+ * Logs the final residual and iteration count for a batch solver.


Suggested change

- * Logs the final residual and iteration count for a batch solver.
+ * Logs the final actual residual norm and iteration count for a batch solver.

It is for actual residual not implicit residual, right?

That depends on the solver, so I would not specify that here.

Is it also applied to the criterion?
If it is, it gives unexpected convergence behavior: the user sometimes gets an actual residual that is indeed below the requirement, but sometimes gets a higher residual reported as converged, because the check depends on the implicit one.

Yes, criterion checks are also always with whatever residual the solver provides.

Maybe I should clarify that we always check against the implicit residual within the solvers. In some cases, the implicit residual and the actual residual may be the same, but that depends on the solver.
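To make the distinction concrete: the implicit residual is whatever the solver's recurrence carries along, while the actual residual has to be recomputed as b - Ax from the current solution. A minimal sketch of the latter, assuming a dense row-major matrix and a hypothetical helper name (`actual_residual_norm` is not a Ginkgo function):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Actual residual norm ||b - A x||_2 for a dense, row-major n x n matrix.
// Unlike the implicit residual, this needs an extra matrix-vector product.
double actual_residual_norm(const std::vector<double>& a,
                            const std::vector<double>& b,
                            const std::vector<double>& x)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double r = b[i];
        for (std::size_t j = 0; j < n; ++j) {
            r -= a[i * n + j] * x[j];
        }
        sum += r * r;
    }
    return std::sqrt(sum);
}
```

A test can compare this value against the norm reported by the logger. In floating point the two generally differ slightly, which is one reason a check on the actual residual may need a looser bound than the solver tolerance.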

common/cuda_hip/log/batch_logger.hpp.inc

common/cuda_hip/preconditioner/batch_identity.hpp.inc

common/cuda_hip/stop/batch_criteria.hpp.inc

include/ginkgo/core/matrix/batch_identity.hpp

include/ginkgo/core/solver/batch_bicgstab.hpp

reference/preconditioner/batch_identity.hpp

reference/stop/batch_criteria.hpp

MarcelKoch

I think the code can use some of the new core developments. For example, the factory parameter can be unified, or maybe the workspace can be extended to also cover the batched case. But some of those changes (e.g. the workspace) could be done at a later time. So for now I'm focusing on the interface to allow for these changes.
Part 1/n

include/ginkgo/core/solver/batch_solver_base.hpp

include/ginkgo/core/log/batch_logger.hpp

include/ginkgo/core/solver/batch_solver_base.hpp

MarcelKoch

Part 2/n, mostly done with the interface and core stuff (except the test helpers). I think especially on the logger side there are some inconsistencies that I would like to see addressed.

include/ginkgo/core/matrix/batch_dense.hpp

core/matrix/batch_struct.hpp

include/ginkgo/core/log/batch_logger.hpp

core/solver/batch_dispatch.hpp

core/matrix/batch_struct.hpp

include/ginkgo/core/solver/batch_solver_base.hpp

core/solver/batch_dispatch.hpp

core/test/solver/batch_bicgstab.cpp

yhmtsai

second part

core/solver/batch_dispatch.hpp

dpcpp/log/batch_logger.hpp

include/ginkgo/core/log/logger.hpp

reference/preconditioner/batch_identity.hpp

yhmtsai · 2023-10-24T09:25:31Z

reference/preconditioner/batch_identity.hpp

+ * Sets the input and generates the identity preconditioner.(Nothing needs
+ * to be actually generated.)
+ */
+ void generate(size_type,


Does batch_identity need to be a preconditioner?
batch_identity will be passed through the generated_preconditioner or the default preconditioner, right?

Essentially, the solver will always have prec.generate(...) and prec.apply(...) calls. As it is templated, we need the identity preconditioner for the default case.
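A small sketch of why the templated solver needs an identity type: the solver body calls `generate` and `apply` unconditionally, so the unpreconditioned case must still supply both. The types and the `preconditioned_residual` helper below are hypothetical and only mimic the shape of the interface discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical identity preconditioner: nothing to generate, apply copies.
struct IdentityPrec {
    void generate(std::size_t /*n*/, const std::vector<double>& /*a*/) {}

    void apply(const std::vector<double>& r, std::vector<double>& z) const
    {
        z = r;
    }
};

// Hypothetical diagonal preconditioner, for contrast.
struct DiagonalPrec {
    std::vector<double> inv_diag;

    // Stores the inverse diagonal of a dense, row-major n x n matrix.
    void generate(std::size_t n, const std::vector<double>& a)
    {
        inv_diag.resize(n);
        for (std::size_t i = 0; i < n; ++i) {
            inv_diag[i] = 1.0 / a[i * n + i];
        }
    }

    void apply(const std::vector<double>& r, std::vector<double>& z) const
    {
        z.resize(r.size());
        for (std::size_t i = 0; i < r.size(); ++i) {
            z[i] = inv_diag[i] * r[i];
        }
    }
};

// Stand-in for the part of a solver that uses the preconditioner: the calls
// are identical for every Prec, with no branch for the default case.
template <typename Prec>
std::vector<double> preconditioned_residual(std::size_t n,
                                            const std::vector<double>& a,
                                            const std::vector<double>& r)
{
    Prec prec;
    prec.generate(n, a);
    std::vector<double> z;
    prec.apply(r, z);
    return z;
}
```

With the identity type, the calls compile down to a copy, so the default path pays no extra branch inside the kernel.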

yhmtsai · 2023-10-24T11:21:43Z

reference/solver/batch_bicgstab_kernels.hpp.inc

+ initialize(A_entry, b_entry, gko::batch::to_const(x_entry), rho_old_entry,
+ omega_entry, alpha_entry, r_entry, r_hat_entry, p_entry,
+ p_hat_entry, v_entry, rhs_norms_entry, res_norms_entry);


The function call is slightly different from core/solver/bicgstab. Is there any benefit to merging b - Ax and r_hat = r into initialize? Keeping them similar to core might make reviewing easier.

I withdraw my comment, because the other kernels already fuse the dot product in, unlike the core version.

reference/solver/batch_bicgstab_kernels.hpp.inc

yhmtsai · 2023-10-24T11:33:56Z

reference/solver/batch_bicgstab_kernels.hpp.inc

+
+template <typename StopType, typename PrecType, typename LogType,
+ typename BatchMatrixType, typename ValueType>
+inline void batch_entry_bicgstab_impl(


I also think the core part can be shared among backends, but I am not focusing on that now.
I assume the kernel is fused from the GPU perspective.

Yes, I think we can think about unifying this later.

omp/solver/batch_bicgstab_kernels.cpp

MarcelKoch

Part 3/3. This mostly concerns the reference/omp kernels and tests. There are only a few notes on the kernels (besides moving parts into common/unified). I think some easy generalizations are possible in the test helpers.

core/test/utils/batch_helpers.hpp

MarcelKoch · 2023-10-24T12:46:47Z

reference/test/solver/batch_bicgstab_kernels.cpp

+ for (size_t i = 0; i < this->num_batch_items; i++) {
+ ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0),
+ this->solver_settings.residual_tol);
+ ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i],


I'm not sure that this is a helpful test. IMO it would be better to compare the solver result to the true solution, or just leave it out. The test above might already be sufficient.

also, it should be equal not near, I think?

reference/stop/batch_criteria.hpp

reference/test/solver/batch_bicgstab_kernels.cpp

omp/solver/batch_bicgstab_kernels.cpp

include/ginkgo/core/log/batch_logger.hpp

include/ginkgo/core/solver/batch_bicgstab.hpp

yhmtsai · 2023-10-25T09:40:19Z

test/solver/batch_bicgstab_kernels.cpp

+ auto iter_array = res.log_data->iter_counts.get_const_data();
+ for (size_t i = 0; i < num_batch_items; i++) {
+ ASSERT_EQ(iter_array[i], ref_iters);
+ }


Does this leave the linear system unsolved? Otherwise, the iteration count might be less than ref_iters.

Yes, the tolerance of 0 is not achievable, so it should always hit ref_iters.

Using NaN is maybe more general; it would also work if we decide to use <= instead of <.

Will that work on device as well?

Yes, I think so. It should work if the compiler does not use fast math.

In this case, it is still not possible to achieve a tolerance of 0, so I think NaN is not necessary.
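The NaN idea rests on the IEEE rule that every ordered comparison with NaN is false, so neither `res < tol` nor `res <= tol` can ever signal convergence. A toy sketch, assuming a made-up loop whose residual simply halves each iteration (not the solver from this PR):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Counts iterations of a toy loop whose residual halves every step and
// which stops once res < tol holds or max_iters is reached.
int iterations_until_stop(double initial_res, double tol, int max_iters)
{
    double res = initial_res;
    int iter = 0;
    while (iter < max_iters && !(res < tol)) {
        res *= 0.5;
        ++iter;
    }
    return iter;
}
```

With `tol` set to `std::numeric_limits<double>::quiet_NaN()` or to `0.0`, the loop runs to `max_iters`. As noted in the thread, the NaN variant assumes the compiler is not using fast-math options, which may drop NaN semantics.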

yhmtsai · 2023-10-25T09:42:26Z

test/solver/batch_bicgstab_kernels.cpp

+ auto comp_res_norm =
+ exec->copy_val_to_host(res.res_norm->get_const_values() + i);
+ ASSERT_LE(iter_counts->get_const_data()[i], max_iters);
+ EXPECT_LE(res_norm->get_const_data()[i], comp_tol);


Why does this check need to use 100 * tol rather than tol if the criterion is the absolute residual norm?

I think there were issues only on some systems, particularly MSVC. Not sure why.

Might it be related to the optimization or to different random input?
The code confuses me about the criterion.
My first thought was that it checks the actual residual norm, which is why a residual norm that does not meet the required criterion does not make sense to me.

I think this code is a bit stale and has since been updated, so it should be correct now. In the updated code, comp_res_norm is the actual residual while res_norm is the residual from the logger, which in this case is the implicit residual.

yhmtsai · 2023-10-25T09:44:03Z

reference/test/solver/batch_bicgstab_kernels.cpp

+ for (size_t i = 0; i < this->num_batch_items; i++) {
+ ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0),
+ this->solver_settings.residual_tol);
+ ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i],


also, it should be equal not near, I think?

yhmtsai · 2023-10-25T09:46:28Z

reference/test/solver/batch_bicgstab_kernels.cpp

+ EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]);
+ ASSERT_LE(rel_res_norm, tol * 10);


Suggested change

- EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]);
- ASSERT_LE(rel_res_norm, tol * 10);
+ EXPECT_EQ(rel_res_norm, res_norm.get_const_data()[i]);
+ ASSERT_LE(rel_res_norm, tol);

yhmtsai · 2023-10-25T09:46:55Z

reference/test/solver/batch_bicgstab_kernels.cpp

+
+ GKO_ASSERT_BATCH_MTX_NEAR(res.x, linear_system.exact_sol, tol * 50);
+ for (size_t i = 0; i < num_batch_items; i++) {
+ ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50);


Suggested change

- ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50);
+ ASSERT_LE(res.res_norm->get_const_values()[i], tol);

Both MSVC and NVHPC seem to have issues with even 50.

MarcelKoch · 2023-10-25T14:15:02Z

@pratikvn Do you mind holding off on the rebasing until all reviews are done (unless necessary)? Github can't keep track of the new changes otherwise (and VS Code seems also unable to do so).

pratikvn · 2023-10-25T22:56:14Z

@yhmtsai, the tolerance issue is the same one we have had in other places. Some compilers always seem to need higher tolerance values, so the values of 50, 10 and 100 have been set empirically.

pratikvn · 2023-11-01T09:03:48Z

As the discussion of the experimental namespace is independent of this PR and this PR has been reviewed, I will go ahead and merge this now to simplify the other batch PR as our CI seems to be stuck.

Release 1.7.0 to master

The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:
- Complete GPU-resident sparse direct solvers feature set and interfaces,
- Improved Cholesky factorization performance,
- A new MC64 reordering,
- Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
- MPI support for the SYCL backend,
- Improved ParILU(T)/ParIC(T) preconditioner convergence,

and more!

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.16+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2019+
  + Apple Clang: 14.0 is tested. Earlier versions might also work.
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
  + HIP module: ROCm 4.5+
  + DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
  + MPI: standard version 3.1+, ideally GPU Aware, for best performance
+ Windows
  + MinGW: GCC 5.5+
  + Microsoft Visual Studio: VS 2019+
  + CUDA module: CUDA 10.1+, Microsoft Visual Studio
  + OpenMP module: MinGW.

### Version support changes
+ CUDA 9.2 is no longer supported and 10.0 is untested [#1382](#1382)
+ Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](#1368)

### Interface changes
+ `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](#1336) [#1439](#1439)

### New Deprecations
+ The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. [#1315](#1315)
+ The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](#1350)
+ The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. [#1418](#1418)
+ The Permutation class' `permute_mask` functionality. [#1415](#1415)
+ Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). [#1348](#1348)

### Summary of previous deprecations
+ `gko::lend()` is not necessary anymore.
+ The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`.
+ The class `AmgxPgm` is deprecated in favor of `Pgm`.
+ Default constructors for the CSR `load_balance` and `automatical` strategies
+ The PolymorphicObject's move-semantic `copy_from` variant
+ The templated `SolverBase` class.
+ The class `MachineTopology` is deprecated in favor of `machine_topology`.
+ Logger constructors and create functions with the `executor` parameter.
+ The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc.
+ Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter.
+ The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`.

### Added features
+ Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners [#1379](#1379)
+ Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors [#1371](#1371)
+ Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. [#1413](#1413)
+ Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. [#1416](#1416) [#1437](#1437)
+ Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems [#1438](#1438).
+ Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. [#1443](#1443).
+ New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibriation [#1120](#1120)
+ New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices [#1415](#1415)
+ LU and Cholesky Factorizations can now be separated into their factors [#1432](#1432)
+ New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern [#1445](#1445)
+ Sorting kernels for SparsityCsr on all backends [#1343](#1343)
+ Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner [#1426](#1426)
+ Add DPCPP kernels for Partition [#1034](#1034), and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality [#1436](#1436)
+ Adds a helper function to create a partition based on either local sizes, or local ranges [#1227](#1227)
+ Add function to compute arithmetic mean of dense and distributed vectors [#1275](#1275)
+ Adds `icpx` compiler supports [#1350](#1350)
+ All backends can be built simultaneously [#1333](#1333)
+ Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo [#1372](#1372)
+ Reordering algorithms in sparse_blas benchmark [#1354](#1354)
+ Benchmarks gained an `-allocator` parameter to specify device allocators [#1385](#1385)
+ Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename [#1387](#1387)
+ Benchmark inputs can now be reordered as a preprocessing step [#1408](#1408)

### Improvements
+ Significantly improve Cholesky factorization performance [#1366](#1366)
+ Improve parallel build performance [#1378](#1378)
+ Allow constrained parallel test execution using CTest resources [#1373](#1373)
+ Use arithmetic type more inside mixed precision ELL [#1414](#1414)
+ Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` [#1336](#1336) [#1439](#1439)
+ Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations [#1434](#1434)

### Fixes
+ Fix an over-allocation for OpenMP reductions [#1369](#1369)
+ Fix DPCPP's common-kernel reduction for empty input sizes [#1362](#1362)
+ Fix several typos in the API and documentation [#1348](#1348)
+ Fix inconsistent `Threads` between generations [#1388](#1388)
+ Fix benchmark median condition [#1398](#1398)
+ Fix HIP 5.6.0 compilation [#1411](#1411)
+ Fix missing destruction of rand_generator from cuda/hip [#1417](#1417)
+ Fix PAPI logger destruction order [#1419](#1419)
+ Fix TAU logger compilation [#1422](#1422)
+ Fix relative criterion to not iterate if the residual is already zero [#1079](#1079)
+ Fix memory_order invocations with C++20 changes [#1402](#1402)
+ Fix `check_diagonal_entries_exist` report correctly when only missing diagonal value in the last rows. [#1440](#1440)
+ Fix checking OpenMPI version in cross-compilation settings [#1446](#1446)
+ Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](#1444)

### Related PR: #1451

Release 1.7.0 to develop

The release notes are identical to those of "Release 1.7.0 to master" above.

### Related PR: #1454

pratikvn added the 1:ST:WIP and type:batched-functionality labels Oct 21, 2023

pratikvn added this to the Release 1.7.0 milestone Oct 21, 2023

pratikvn self-assigned this Oct 21, 2023

pratikvn force-pushed the batch-bicgstab branch 2 times, most recently from 25a894a to 26472b9 Compare October 23, 2023 05:36

MarcelKoch reviewed Oct 23, 2023

MarcelKoch self-requested a review October 23, 2023 09:13

yhmtsai reviewed Oct 23, 2023

MarcelKoch reviewed Oct 23, 2023

MarcelKoch self-requested a review October 24, 2023 08:04

MarcelKoch reviewed Oct 24, 2023

yhmtsai reviewed Oct 24, 2023

MarcelKoch reviewed Oct 24, 2023

pratikvn force-pushed the batch-bicgstab branch from 2611c7a to 5e282b5 Compare October 25, 2023 09:03

yhmtsai reviewed Oct 25, 2023

pratikvn force-pushed the batch-bicgstab branch 2 times, most recently from 82712a3 to e17e58d Compare October 25, 2023 22:54

pratikvn added the 1:ST:ready-for-review label and removed the 1:ST:WIP label Oct 25, 2023

pratikvn and others added 22 commits October 31, 2023 23:46

Add omp tests and gen improvements

201b3e0

Fix logger and update docs

9262bbf

re-template logger and logdata

8d55033

doc improvements and some restructuring

fb0b856

formatting and renames

d58a997

generic logdata improvements

6f61cd0

rename kernel namespaces

83b1fde

use workspace for logger

1fc68ce

use new factory setup, move crit to base

4f67841

Add batch identity test and fix apply

20419af

Review updates

a89b4af

Co-authored-by: Yu-Hsiang Tsai <[email protected]> Co-authored-by: Marcel Koch <[email protected]>

s/BicgstabSettings/settings

d5a55ad

Fix workspace issues and review updates

f6ae1a4

Co-authored-by: Yu-Hsiang Tsai <[email protected]>

Review updates

8f920ea

Co-authored-by: Marcel Koch <[email protected]>

rename crit getters and setters

f7bcbea

Format files

9b831a3

Co-authored-by: Pratik Nayak <[email protected]>

Update copy/move semantics

0a6e700

Review updates

f4f69ba

Co-authored-by: Yu-Hsian Tsai <[email protected]>

Review updates

9c1e139

Co-authored-by: Marcel Koch <[email protected]> Co-authored-by: Terry Cojean <[email protected]> Co-authored-by: Yu-Hsiang Tsai <[email protected]>

Fix cuda incom type and check defaults

b87f213

clarify implicit/actual res norm docs, MSVC fixes

adb8f97

review updates

2260c8f

Co-authored-by: Yu-Hsiang Tsai <[email protected]>

pratikvn force-pushed the batch-bicgstab branch from e21b275 to 2260c8f Compare October 31, 2023 22:47

pratikvn merged commit 3d8dc38 into develop Nov 1, 2023
10 of 15 checks passed

Batched Ginkgo automation moved this from In progress to Completed Nov 1, 2023

pratikvn deleted the batch-bicgstab branch November 1, 2023 09:06

tcojean mentioned this pull request Nov 6, 2023

Release 1.7.0 to master #1451

Merged


