Adds distributed support for several solvers #976

MarcelKoch · 2022-02-24T08:38:41Z

This PR will enable using distributed matrices and vectors (#971 and #961) in the following iterative solvers:

~~Bicg~~
Bicgstab
Cg
Cgs
Fcg
Ir

Currently not supported are:

Bicg
[cb_]Gmres
Idr
Multigrid
Lower/Upper_trs

With the changes in #861 it should be possible to also enable distributed systems for [cb_]gmres. I haven't looked into Idr much, but I guess the issue there is that reductions are merged with other operations into a single kernel launch and thus can't use global communication.
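To illustrate why a fused reduction is a problem, here is a minimal sketch of a distributed dot product, assuming the usual two-step pattern of a local reduction followed by an all-reduce. It does not use MPI or Ginkgo: the ranks are simulated by a vector of local parts, and `local_dot`, `all_reduce_sum`, and `distributed_dot` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Local part of the reduction: each process only sees its own rows.
inline double local_dot(const std::vector<double>& x,
                        const std::vector<double>& y)
{
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// Stand-in for an all-reduce across processes (with MPI this would be
// something like MPI_Allreduce with MPI_SUM). Here the "processes" are
// just the entries of a vector.
inline double all_reduce_sum(const std::vector<double>& partial_results)
{
    return std::accumulate(partial_results.begin(), partial_results.end(),
                           0.0);
}

// Global dot product over simulated ranks. The communication step sits
// between the local reduction and any use of the result, so a kernel that
// reduces and immediately consumes the value leaves no room for it.
inline double distributed_dot(const std::vector<std::vector<double>>& x_parts,
                              const std::vector<std::vector<double>>& y_parts)
{
    std::vector<double> partial(x_parts.size());
    for (std::size_t rank = 0; rank < x_parts.size(); ++rank) {
        partial[rank] = local_dot(x_parts[rank], y_parts[rank]);
    }
    return all_reduce_sum(partial);
}
```

In this picture, a solver step that needs the global value has to be split at the all-reduce, which is presumably what a merged kernel prevents.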

The handling of the distributed/non-distributed data is done via additional dispatch routines that expand on precision_dispatch_real_complex, and helper routines to extract the underlying dense matrix from either a distributed or dense vector. Also, the residual norm stopping criteria implementation has been changed to also use a similar dispatch approach.
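The dispatch idea described above can be sketched roughly as follows. This is not Ginkgo's implementation: `LinOp`, `DenseVector`, `DistributedVector`, `extract_local`, and `dispatch_vector` are simplified, hypothetical stand-ins, and the precision and real/complex dimensions of the real dispatch are left out entirely.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the operator hierarchy, not Ginkgo's classes.
struct LinOp {
    virtual ~LinOp() = default;
};

// A plain, non-distributed dense vector.
struct DenseVector : LinOp {
    std::vector<double> values;
    explicit DenseVector(std::vector<double> v) : values(std::move(v)) {}
};

// A distributed vector that stores only the rows local to this process.
struct DistributedVector : LinOp {
    DenseVector local;
    explicit DistributedVector(std::vector<double> v) : local(std::move(v)) {}
    DenseVector* get_local() { return &local; }
};

// Helper: return the underlying dense storage of either vector type.
inline DenseVector* extract_local(LinOp* op)
{
    if (auto dist = dynamic_cast<DistributedVector*>(op)) {
        return dist->get_local();
    }
    if (auto dense = dynamic_cast<DenseVector*>(op)) {
        return dense;
    }
    throw std::invalid_argument("unsupported vector type");
}

// Dispatch: call fn with the concrete vector type, so the same generic
// solver code is instantiated once per supported type.
template <typename Fn>
void dispatch_vector(LinOp* op, Fn&& fn)
{
    if (auto dist = dynamic_cast<DistributedVector*>(op)) {
        fn(dist);
    } else if (auto dense = dynamic_cast<DenseVector*>(op)) {
        fn(dense);
    } else {
        throw std::invalid_argument("unsupported vector type");
    }
}

// Example of type-agnostic code: scale the locally stored entries.
inline void scale(LinOp* x, double alpha)
{
    dispatch_vector(x, [alpha](auto* vec) {
        for (auto& v : extract_local(vec)->values) {
            v *= alpha;
        }
    });
}
```

With this layout, purely local operations such as `scale` work unchanged on both vector types, while anything that needs a global result (norms, dot products) has to be handled by the distributed type itself.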

This also contains some fixes regarding the doxygen documentation for the other distributed classes, which I will not add to the previous PRs.

Partially addresses #907.

Todos:

add solver tests
add [cb_]gmres (wait for Simplify GMRES kernels #861)
~~add Idr~~ postponed for now

Main contributions are from @upsj and @pratikvn.

tcojean

LGTM! I mostly have small comments.

examples/distributed-solver/doc/results.dox

examples/distributed-solver/doc/kind

include/ginkgo/core/base/precision_dispatch.hpp

core/stop/residual_norm.cpp

test/mpi/solver/solver.cpp

core/solver/bicg.cpp

test/mpi/solver/solver.cpp

pratikvn

LGTM!

core/stop/residual_norm.cpp

include/ginkgo/core/base/mpi.hpp

Co-authored-by: Terry Cojean <[email protected]>

- documentation
- simplified any_is_complex check
- move is_distributed

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Pratik Nayak <[email protected]>

MarcelKoch · 2022-08-26T15:45:25Z

format!

Co-authored-by: Marcel Koch <[email protected]>

previously this could lead to divergence between the processes and subsequent deadlocks

tcojean

LGTM

tcojean · 2022-09-23T14:34:21Z

core/solver/bicg.cpp

@@ -42,6 +42,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include <ginkgo/core/base/utils.hpp>


+#include "core/distributed/helpers.hpp"


tcojean · 2022-09-23T14:35:38Z

include/ginkgo/core/solver/bicg.hpp

+ void apply_dense_impl(const gko::matrix::Dense<ValueType>* b,
+ gko::matrix::Dense<ValueType>* x) const;


Also unneeded?

tcojean · 2022-09-23T14:38:51Z

include/ginkgo/ginkgo.hpp

-#include <ginkgo/core/solver/lower_trs.hpp>
 #include <ginkgo/core/solver/multigrid.hpp>
 #include <ginkgo/core/solver/solver_base.hpp>
 #include <ginkgo/core/solver/solver_traits.hpp>
-#include <ginkgo/core/solver/upper_trs.hpp>


Shouldn't these be kept?

These were moved to triangular.hpp in develop; after rebasing, that header should be included here.

I don't think I will rebase this PR or #1054, so I will fix it

fritzgoebel

LGTM, nice work!

fritzgoebel · 2022-09-26T07:00:19Z

core/solver/idr.cpp

@@ -42,6 +42,8 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include <ginkgo/core/solver/solver_base.hpp>


+#include "core/components/fill_array_kernels.hpp"


I think this is unused

fritzgoebel · 2022-09-26T07:13:23Z

include/ginkgo/ginkgo.hpp

-#include <ginkgo/core/solver/lower_trs.hpp>
 #include <ginkgo/core/solver/multigrid.hpp>
 #include <ginkgo/core/solver/solver_base.hpp>
 #include <ginkgo/core/solver/solver_traits.hpp>
-#include <ginkgo/core/solver/upper_trs.hpp>


These were moved to triangular.hpp in develop; after rebasing, that header should be included here.

examples/distributed-solver/distributed-solver.cpp

- remove unnecessary includes

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Fritz Goebel <[email protected]>

MarcelKoch · 2022-09-26T08:24:11Z

format!

Co-authored-by: Marcel Koch <[email protected]>

ginkgo-bot · 2022-09-27T07:36:58Z

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 912 Removed, 0 Changed (1 filtered out), 1018 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

This PR will enable using distributed matrices and vectors (#971 and #961) in the following iterative solvers:
- Bicgstab
- Cg
- Cgs
- Fcg
- Ir

Currently not supported are:
- Bicg
- [cb_]Gmres
- Idr
- Multigrid
- Lower/Upper_trs

The handling of the distributed/non-distributed data is done via additional dispatch routines that expand on precision_dispatch_real_complex, and helper routines to extract the underlying dense matrix from either a distributed or dense vector. Also, the residual norm stopping criteria implementation has been changed to also use a similar dispatch approach.

This also contains some fixes regarding the doxygen documentation for the other distributed classes.

Related PR: #976

This PR will add basic, distributed data structures (matrix and vector), and enable some solvers for these types. This PR contains the following PRs:
- #961
- #971
- #976
- #985
- #1007
- #1030
- #1054

# Additional Changes

- moves new types into experimental namespace
- moves existing Partition class into experimental namespace
- moves existing mpi namespace into experimental namespace
- makes generic_scoped_device_id_guard destructor noexcept by terminating if restoring the original device id fails
- switches to blocking communication in the SpMV if OpenMPI version 4.0.x is used
- disables Horeka mpi tests and uses nla-gpu instead

Related PR: #1133
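The noexcept-destructor change mentioned in that commit message can be pictured with a small sketch. It assumes a made-up runtime (`fake_runtime`) and a hypothetical guard class named `scoped_device_guard_sketch`; it is not the actual `generic_scoped_device_id_guard` and only shows the general pattern of terminating instead of throwing when the restore step fails.

```cpp
#include <cassert>
#include <exception>

// Hypothetical device runtime used only for this sketch.
namespace fake_runtime {

inline int& current_device()
{
    static int id = 0;
    return id;
}

// Returns true on success, false if the id is invalid.
inline bool set_device(int id)
{
    if (id < 0) {
        return false;
    }
    current_device() = id;
    return true;
}

}  // namespace fake_runtime

// Switches to a device on construction and restores the previous one on
// destruction.
class scoped_device_guard_sketch {
public:
    explicit scoped_device_guard_sketch(int new_id)
        : original_id_{fake_runtime::current_device()}
    {
        fake_runtime::set_device(new_id);
    }

    // The destructor must not throw: if restoring the original device
    // fails, the program state is unrecoverable, so terminate instead.
    ~scoped_device_guard_sketch() noexcept
    {
        if (!fake_runtime::set_device(original_id_)) {
            std::terminate();
        }
    }

    // A guard owns a unique "restore" obligation, so copying is disabled.
    scoped_device_guard_sketch(const scoped_device_guard_sketch&) = delete;
    scoped_device_guard_sketch& operator=(const scoped_device_guard_sketch&) =
        delete;

private:
    int original_id_;
};
```

The trade-off is that a failed restore can no longer be handled by the caller, but a throwing destructor during stack unwinding would end in termination anyway.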

Advertise release 1.5.0 and last changes

+ Add changelog,
+ Update third party libraries
+ A small fix to a CMake file

See PR: #1195

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:
- MPI-based multi-node support for all matrix formats and most solvers;
- full DPC++/SYCL support,
- functionality and interface for GPU-resident sparse direct solvers,
- an interface for wrapping solvers with scaling and reordering applied,
- a new algebraic Multigrid solver/preconditioner,
- improved mixed-precision support,
- support for device matrix assembly,

and much more.

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
  + HIP module: ROCm 4.0+
  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: GCC 5.5+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148))
+ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110), [#1142](#1142))
+ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082))
+ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059))
+ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982), [#986](#986))
+ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926))
+ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117))
+ Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965))
+ Add support for mixed real/complex BLAS operations ([#864](#864))
+ Add a FFT LinOp for all but DPC++/SYCL ([#701](#701))
+ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775))
+ Add CSR scaling ([#848](#848))
+ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890))
+ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901))
+ Add mixed precision SparsityCsr SpMV support ([#970](#970))
+ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964))
+ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942))

Deprecations and important changes:
+ Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)).
+ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101))
+ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052))
+ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020))
+ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)).
+ Drop official support for old CUDA < 9.2 ([#887](#887))

Improved performance additions:
+ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028))
+ Add HIP unsafe atomic option for AMD ([#1091](#1091))
+ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)).
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809))

Fixes:
+ Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189))
+ Fix issues with hwloc-related tests ([#1074](#1074))
+ Fix include headers for GCC 12 ([#1071](#1071))
+ Fix for simple-solver-logging example ([#1066](#1066))
+ Fix for potential memory leak in Logger ([#1056](#1056))
+ Fix logging of mixin classes ([#1037](#1037))
+ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753))
+ Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978))
+ Fix uninitialized data ([#958](#958))
+ Fix CUDA version requirement for cusparseSpSM ([#953](#953))
+ Fix several issues within bash-script ([#1016](#1016))
+ Fixes for `NVHPC` compiler support ([#1194](#1194))

Other additions:
+ Simplify and properly name GMRES kernels ([#861](#861))
+ Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109))
+ Improve gdb pretty printer ([#987](#987), [#1114](#1114))
+ Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035))
+ Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032))
+ Better CSR strategy defaults ([#969](#969))
+ Add `move_from` to `PolymorphicObject` ([#997](#997))
+ Remove unnecessary device_guard usage ([#956](#956))
+ Improvements to the generic accessor for mixed-precision ([#727](#727))
+ Add a naive lower triangular solver implementation for CUDA ([#764](#764))
+ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897))
+ Add a L1 norm implementation ([#900](#900))
+ Add reduce_add for arrays ([#831](#831))
+ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)).
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123))
+ Make IDR random initialization deterministic ([#1116](#1116))
+ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088))
+ Update CUDA archCoresPerSM ([#1175](#1116))
+ Add kernels for Csr sparsity pattern lookup ([#994](#994))
+ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027))
+ Add a binary IO format for matrix data ([#984](#984))
+ Add a tuple zip_iterator implementation ([#966](#966))
+ Simplify kernel stubs and declarations ([#888](#888))
+ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859))
+ Simplify copy to device in tests and examples ([#863](#863))
+ More verbose output to array assertions ([#858](#858))
+ Allow parallel compilation for Jacobi kernels ([#871](#871))
+ Change clang-format pointer alignment to left ([#872](#872))
+ Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137))
+ Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154))
+ Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903), [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))

MarcelKoch added this to the Ginkgo 1.5.0 milestone Feb 24, 2022

MarcelKoch added this to In progress in Distributed Ginkgo via automation Feb 24, 2022

MarcelKoch self-assigned this Feb 24, 2022

ginkgo-bot added the labels reg:build (This is related to the build system.), reg:documentation (This is related to documentation.), and reg:example (This is related to the examples.) Feb 24, 2022

MarcelKoch force-pushed the distributed-solvers branch from 9fb7ce5 to 1341500 Compare February 24, 2022 09:03

MarcelKoch added the labels 1:ST:need-feedback (The PR is somewhat ready but feedback on a blocking topic is required before a proper review.) and 1:ST:WIP (This PR is a work in progress. Not ready for review.) and removed the label 1:ST:ready-for-review (This PR is ready for review) Feb 24, 2022

MarcelKoch force-pushed the new-distributed-matrix branch from a9b37cb to 4acb343 Compare February 28, 2022 14:31

MarcelKoch force-pushed the distributed-solvers branch from dbffde7 to 9122166 Compare February 28, 2022 14:32

MarcelKoch force-pushed the new-distributed-matrix branch from 4acb343 to cdd5bf9 Compare March 1, 2022 08:33

MarcelKoch force-pushed the distributed-solvers branch from 9122166 to 4f09d10 Compare March 1, 2022 08:34

upsj mentioned this pull request Mar 4, 2022

Create a clean distributed-ginkgo branch #907

Open

8 tasks

MarcelKoch force-pushed the new-distributed-matrix branch from cdd5bf9 to d98f4b1 Compare March 10, 2022 14:31

MarcelKoch force-pushed the new-distributed-matrix branch 2 times, most recently from dfbfb8a to d050278 Compare April 1, 2022 07:16

MarcelKoch force-pushed the new-distributed-matrix branch from cd61a1d to a293a5a Compare April 21, 2022 12:49

MarcelKoch force-pushed the distributed-solvers branch from 1dc6d94 to d923dbf Compare April 21, 2022 15:51

MarcelKoch force-pushed the new-distributed-matrix branch from a26e06d to 77a0638 Compare April 22, 2022 09:54

MarcelKoch force-pushed the distributed-solvers branch 3 times, most recently from b432578 to 3b13196 Compare April 22, 2022 13:50

tcojean approved these changes Aug 26, 2022

Distributed Ginkgo automation moved this from In progress to Reviewer approved Aug 26, 2022

pratikvn approved these changes Aug 26, 2022

MarcelKoch and others added 3 commits August 26, 2022 10:55

adds distributed example kind

1d8d148

Co-authored-by: Terry Cojean <[email protected]>

removes template apply_impl of Bicg

af1e7ca

Co-authored-by: Terry Cojean <[email protected]>

review updates:

c05f6a1

- documentation
- simplified any_is_complex check
- move is_distributed

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Pratik Nayak <[email protected]>

ginkgo-bot and others added 5 commits August 26, 2022 15:46

Format files

0f58c62

Co-authored-by: Marcel Koch <[email protected]>

fixes residual_norm precision dispatch for non-mpi

f504af1

adds test with different partition types

6dc6667

removes special case if no non-local matrix

33fd976

previously this could lead to divergence between the processes and subsequent deadlocks

frees mpi request and makes it move-only

be0983f

tcojean approved these changes Sep 23, 2022

fritzgoebel approved these changes Sep 26, 2022

review updates:

06f9221

- remove unnecessary includes

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Fritz Goebel <[email protected]>

Format files

cae4b88

Co-authored-by: Marcel Koch <[email protected]>

MarcelKoch added the label 1:ST:ready-to-merge (This PR is ready to merge.) and removed the label 1:ST:ready-for-review (This PR is ready for review) Sep 26, 2022

Merge branch 'distributed-develop' into distributed-solvers

8113be4

MarcelKoch merged commit 8b5785d into distributed-develop Sep 28, 2022

Distributed Ginkgo automation moved this from Reviewer approved to Done Sep 28, 2022

MarcelKoch deleted the distributed-solvers branch September 28, 2022 15:20

MarcelKoch mentioned this pull request Oct 5, 2022

Add distributed capabilities #1133

Merged
