Add compact factor storage support to triangular solvers #1072

upsj · 2022-07-10T21:02:23Z

Some factorization algorithms, such as cuSPARSE's ILU(0) and our exact LU, store the factors not as a composition of two separate matrices L * U, but as a single combined factor L + U - I, where L has a unit diagonal. To support this without performance overhead, a triangular solver needs to ignore any entries in its system matrix that don't belong to the triangle, and it needs to provide a switch to ignore the stored diagonal values and assume a unit diagonal instead.

TODO:

Test custom CUDA triangular solvers

cc @lksriemer

lksriemer · 2022-07-11T09:32:48Z

LGTM, I'll run some tests later today. I will also modify the other unpublished solvers to include this capability. It may make sense to template on this parameter for some of them; I'll check that out.

Edit: I ran those tests, and no irregularities showed up for any matrices with the modified naive caching and naive legacy algorithms, so this should be good from my side. The only thing I am not sure about is the case where a unit diagonal is assumed but the diagonal entry is structurally zero. In that case, of course, we can't rely on row == col as the diag/exit condition.
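One way to picture the concern: a row scan that only stops when it sees `col == row` has no exit if the diagonal entry is not stored. The sketch below is a hypothetical helper, not the kernels discussed here. It assumes the column indices within each row are sorted in ascending order, and it bounds the scan by the end of the row in addition to the `col < row` test.

```cpp
#include <cstddef>
#include <vector>

// Returns the index one past the last strictly-lower entry of `row` in a
// CSR matrix. Hypothetical helper for illustration only.
// Assumption: column indices within each row are sorted ascending.
// The scan stops at the first entry with col >= row or at the end of the
// row, so it terminates even if no diagonal entry is stored.
std::size_t strictly_lower_end(const std::vector<std::size_t>& row_ptrs,
                               const std::vector<std::size_t>& col_idxs,
                               std::size_t row)
{
    auto nz = row_ptrs[row];
    const auto row_end = row_ptrs[row + 1];
    while (nz < row_end && col_idxs[nz] < row) {
        ++nz;
    }
    return nz;
}
```

With a unit diagonal assumed, the entries in `[row_ptrs[row], strictly_lower_end(...))` are all a lower solve needs for that row, whether or not a diagonal entry exists. For unsorted rows this shortcut does not apply, and every entry of the row has to be inspected.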

yhmtsai

LGTM for the current state. When you finish the TODO, I will review the addition.

omp/solver/lower_trs_kernels.cpp

omp/solver/upper_trs_kernels.cpp

This may happen if the matrix is unsorted or doesn't have a diagonal entry.

ginkgo-bot · 2022-07-19T08:24:18Z

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 64 Changed (1424 filtered out), 0 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details, check the full ABI diff under Artifacts.

fritzgoebel

LGTM!

sonarcloud · 2022-07-19T12:14:02Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
2 Security Hotspots
72 Code Smells

100.0% Coverage
34.3% Duplication

Advertise release 1.5.0 and last changes

+ Add changelog,
+ Update third party libraries
+ A small fix to a CMake file

See PR: #1195

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:
- MPI-based multi-node support for all matrix formats and most solvers;
- full DPC++/SYCL support,
- functionality and interface for GPU-resident sparse direct solvers,
- an interface for wrapping solvers with scaling and reordering applied,
- a new algebraic Multigrid solver/preconditioner,
- improved mixed-precision support,
- support for device matrix assembly,

and much more.

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
  + HIP module: ROCm 4.0+
  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: GCC 5.5+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148))
+ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110), [#1142](#1142))
+ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082))
+ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059))
+ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982), [#986](#986))
+ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926))
+ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117))
+ Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965))
+ Add support for mixed real/complex BLAS operations ([#864](#864))
+ Add a FFT LinOp for all but DPC++/SYCL ([#701](#701))
+ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775))
+ Add CSR scaling ([#848](#848))
+ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890))
+ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901))
+ Add mixed precision SparsityCsr SpMV support ([#970](#970))
+ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964))
+ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942))

Deprecations and important changes:
+ Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)).
+ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101))
+ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052))
+ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020))
+ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)).
+ Drop official support for old CUDA < 9.2 ([#887](#887))

Improved performance additions:
+ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028))
+ Add HIP unsafe atomic option for AMD ([#1091](#1091))
+ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)).
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809))

Fixes:
+ Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189))
+ Fix issues with hwloc-related tests ([#1074](#1074))
+ Fix include headers for GCC 12 ([#1071](#1071))
+ Fix for simple-solver-logging example ([#1066](#1066))
+ Fix for potential memory leak in Logger ([#1056](#1056))
+ Fix logging of mixin classes ([#1037](#1037))
+ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753))
+ Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978))
+ Fix uninitialized data ([#958](#958))
+ Fix CUDA version requirement for cusparseSpSM ([#953](#953))
+ Fix several issues within bash-script ([#1016](#1016))
+ Fixes for `NVHPC` compiler support ([#1194](#1194))

Other additions:
+ Simplify and properly name GMRES kernels ([#861](#861))
+ Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109))
+ Improve gdb pretty printer ([#987](#987), [#1114](#1114))
+ Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035))
+ Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032))
+ Better CSR strategy defaults ([#969](#969))
+ Add `move_from` to `PolymorphicObject` ([#997](#997))
+ Remove unnecessary device_guard usage ([#956](#956))
+ Improvements to the generic accessor for mixed-precision ([#727](#727))
+ Add a naive lower triangular solver implementation for CUDA ([#764](#764))
+ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897))
+ Add a L1 norm implementation ([#900](#900))
+ Add reduce_add for arrays ([#831](#831))
+ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)).
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123))
+ Make IDR random initialization deterministic ([#1116](#1116))
+ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088))
+ Update CUDA archCoresPerSM ([#1175](#1116))
+ Add kernels for Csr sparsity pattern lookup ([#994](#994))
+ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027))
+ Add a binary IO format for matrix data ([#984](#984))
+ Add a tuple zip_iterator implementation ([#966](#966))
+ Simplify kernel stubs and declarations ([#888](#888))
+ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859))
+ Simplify copy to device in tests and examples ([#863](#863))
+ More verbose output to array assertions ([#858](#858))
+ Allow parallel compilation for Jacobi kernels ([#871](#871))
+ Change clang-format pointer alignment to left ([#872](#872))
+ Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137))
+ Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154))
+ Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903), [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))

upsj added the 1:ST:ready-for-review label Jul 10, 2022

upsj added this to the Ginkgo 1.5.0 milestone Jul 10, 2022

upsj self-assigned this Jul 10, 2022

ginkgo-bot added the mod:all, reg:build, reg:testing, and type:solver labels Jul 10, 2022

upsj requested review from a team July 10, 2022 21:07

upsj force-pushed the compact_triangular_solve branch from 8212b9b to 0b7e097 Compare July 11, 2022 16:45

yhmtsai approved these changes Jul 14, 2022

upsj added 3 commits July 19, 2022 10:01

add compact factor storage support to trisolvers

9c1f95b

guard triangular solver against infinite loops

71ca18d

This may happen if the matrix is unsorted or doesn't have a diagonal entry.

add tests for custom CUDA triangular solvers

758f9fa

upsj force-pushed the compact_triangular_solve branch from 0b7e097 to 758f9fa Compare July 19, 2022 08:01

fix formatting config for triangular solver tests

5229ac0

upsj added the 1:ST:run-full-test label Jul 19, 2022

fritzgoebel approved these changes Jul 19, 2022

upsj added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Jul 19, 2022

vasilisge0 approved these changes Jul 19, 2022

upsj merged commit 3aba0f7 into develop Jul 19, 2022

upsj deleted the compact_triangular_solve branch July 19, 2022 12:09
