Enable an ensemble build #1333

upsj · 2023-05-05T11:30:10Z

With these fixes, it's possible to build Ginkgo for all backends simultaneously on the ginkgodev/unified container image. Includes a few test fixes.

Corresponding CMake invocation:

cmake -GNinja -DCMAKE_CXX_COMPILER=dpcpp -DOpenMP_CXX_FLAGS=-qopenmp -DOpenMP_CXX_LIB_NAMES=libiomp5 -DOpenMP_libiomp5_LIBRARY=/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so ..

TODO:

- ~~maybe fix deprecation warnings~~
- ~~add a CI job for this~~ (that will happen in Add development container #1267)

upsj · 2023-07-04T11:38:46Z

rebase!

MarcelKoch

LGTM, except for the test folder structure.

test/mpi/distributed/solver/solver.cpp

test/mpi/distributed/CMakeLists.txt

thoasm

LGTM

thoasm · 2023-07-04T12:29:42Z

test/solver/idr_kernels.cpp

+#ifdef GKO_COMPILING_DPCPP
+#include <CL/sycl.hpp>
+#endif


What is in this header?
It seems you are moving this include from include/ginkgo/core/base/math.hpp to this file.

The queue is an incomplete class in executor.hpp, so the test needs the full SYCL header to call member functions on it:

    if (exec->get_queue()->get_device().is_gpu()) {
        GTEST_SKIP() << "skip the test because oneMKL GEMM on gpu may give NaN "
                        "(under investigation)";
    }
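To illustrate the point about the incomplete class, here is a minimal sketch of the same pattern outside Ginkgo. Everything in the `demo` namespace is hypothetical and only stands in for the real types: the forward declaration plays the role of what executor.hpp exposes, and the later class definitions play the role of what `<CL/sycl.hpp>` provides. Handing around a pointer to the incomplete type is fine, but calling a member function through it only compiles once the full definition is visible.

```cpp
#include <cassert>
#include <memory>

namespace demo {

// "General" header part (hypothetical): only a forward declaration,
// so `queue` is an incomplete type at this point.
class queue;

class executor {
public:
    explicit executor(bool on_gpu);
    ~executor();  // defined later, where `queue` is complete

    // Returning a pointer to an incomplete type is allowed.
    queue* get_queue() const;

private:
    std::unique_ptr<queue> queue_;
};

// "Backend" header part (hypothetical stand-in for the SYCL header):
// the full definitions that make member calls possible.
class device {
public:
    explicit device(bool gpu) : gpu_{gpu} {}
    bool is_gpu() const { return gpu_; }

private:
    bool gpu_;
};

class queue {
public:
    explicit queue(bool gpu) : device_{gpu} {}
    const device& get_device() const { return device_; }

private:
    device device_;
};

executor::executor(bool on_gpu) : queue_{std::make_unique<queue>(on_gpu)} {}
executor::~executor() = default;
queue* executor::get_queue() const { return queue_.get(); }

}  // namespace demo

// Test-side code: this would not compile with only the forward declaration,
// because `->get_device()` needs the complete definition of `queue`.
bool should_skip_gemm_test(const demo::executor& exec)
{
    assert(exec.get_queue() != nullptr);
    return exec.get_queue()->get_device().is_gpu();
}
```

With this layout, only the files that actually dereference the queue pay for the heavy backend include, which seems to be the motivation for moving the include out of the general header and into the test.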

yhmtsai · 2023-07-04T21:14:01Z

test/matrix/fft_kernels.cpp

+#if defined(GKO_COMPILING_HIP) && GINKGO_HIP_PLATFORM_HCC
+ GTEST_SKIP() << "rocFFT has a bug related to strided 1D FFT";
+#endif


Is this related to ROCm/rocFFT#353?
It should pass everything in the current CI setup, though.

No, this is a different bug (it causes incorrect results, not an error) that I haven't bothered creating a reproducer for yet.
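The guard in the diff above can be sketched without GoogleTest as follows. The macro names `DEMO_COMPILING_HIP` and `DEMO_HIP_PLATFORM_HCC` are hypothetical stand-ins for the ones used in the PR, and the functions are illustrative only. One detail worth noting: in an `#if` expression, an identifier that is not defined as a macro evaluates to 0, so the second operand is safe even in builds that never define it.

```cpp
// Returns the skip reason for this build configuration, or nullptr if the
// test should run. The macros are hypothetical; a build system would
// define them for the affected backend.
constexpr const char* strided_fft_skip_reason()
{
#if defined(DEMO_COMPILING_HIP) && DEMO_HIP_PLATFORM_HCC
    return "rocFFT has a bug related to strided 1D FFT";
#else
    return nullptr;
#endif
}

constexpr bool skip_strided_fft = strided_fft_skip_reason() != nullptr;

enum class outcome { ran, skipped };

// Mirrors the shape of a test body: bail out early when the guard is
// active (a GoogleTest test would call GTEST_SKIP() << reason here),
// otherwise fall through to the actual checks.
constexpr outcome run_strided_fft_test()
{
    if (skip_strided_fft) {
        return outcome::skipped;
    }
    // ... the real FFT checks would go here ...
    return outcome::ran;
}
```

Because the decision is made by the preprocessor, the skipped configuration never executes the affected code path, while every other backend keeps running the test unchanged.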

upsj · 2023-07-05T13:58:57Z

rebase!

- remove SYCL-specific headers from general builds
- disable failing tests for rocFFT
- disable DPC++ distributed tests

sonarcloud · 2023-07-07T00:17:24Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
1 Security Hotspot
45 Code Smells

99.1% Coverage
2.9% Duplication

Release 1.7.0 to master

The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:
- Complete GPU-resident sparse direct solvers feature set and interfaces,
- Improved Cholesky factorization performance,
- A new MC64 reordering,
- Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
- MPI support for the SYCL backend,
- Improved ParILU(T)/ParIC(T) preconditioner convergence,

and more!

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [GitHub discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.16+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2019+
  + Apple Clang: 14.0 is tested. Earlier versions might also work.
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
  + HIP module: ROCm 4.5+
  + DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
  + MPI: standard version 3.1+, ideally GPU Aware, for best performance
+ Windows
  + MinGW: GCC 5.5+
  + Microsoft Visual Studio: VS 2019+
  + CUDA module: CUDA 10.1+, Microsoft Visual Studio
  + OpenMP module: MinGW.

### Version support changes
+ CUDA 9.2 is no longer supported and 10.0 is untested [#1382](#1382)
+ Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](#1368)

### Interface changes
+ `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](#1336) [#1439](#1439)

### New Deprecations
+ The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. [#1315](#1315)
+ The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](#1350)
+ The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. [#1418](#1418)
+ The Permutation class' `permute_mask` functionality. [#1415](#1415)
+ Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). [#1348](#1348)

### Summary of previous deprecations
+ `gko::lend()` is not necessary anymore.
+ The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`.
+ The class `AmgxPgm` is deprecated in favor of `Pgm`.
+ Default constructors for the CSR `load_balance` and `automatical` strategies
+ The PolymorphicObject's move-semantic `copy_from` variant
+ The templated `SolverBase` class.
+ The class `MachineTopology` is deprecated in favor of `machine_topology`.
+ Logger constructors and create functions with the `executor` parameter.
+ The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc.
+ Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter.
+ The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`.

### Added features
+ Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solvers and preconditioners [#1379](#1379)
+ Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors [#1371](#1371)
+ Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. [#1413](#1413)
+ Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. [#1416](#1416) [#1437](#1437)
+ Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems [#1438](#1438).
+ Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. [#1443](#1443).
+ New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibration [#1120](#1120)
+ New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices [#1415](#1415)
+ LU and Cholesky Factorizations can now be separated into their factors [#1432](#1432)
+ New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern [#1445](#1445)
+ Sorting kernels for SparsityCsr on all backends [#1343](#1343)
+ Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner [#1426](#1426)
+ Add DPCPP kernels for Partition [#1034](#1034), and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality [#1436](#1436)
+ Adds a helper function to create a partition based on either local sizes, or local ranges [#1227](#1227)
+ Add function to compute arithmetic mean of dense and distributed vectors [#1275](#1275)
+ Adds `icpx` compiler support [#1350](#1350)
+ All backends can be built simultaneously [#1333](#1333)
+ Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo [#1372](#1372)
+ Reordering algorithms in sparse_blas benchmark [#1354](#1354)
+ Benchmarks gained an `-allocator` parameter to specify device allocators [#1385](#1385)
+ Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename [#1387](#1387)
+ Benchmark inputs can now be reordered as a preprocessing step [#1408](#1408)

### Improvements
+ Significantly improve Cholesky factorization performance [#1366](#1366)
+ Improve parallel build performance [#1378](#1378)
+ Allow constrained parallel test execution using CTest resources [#1373](#1373)
+ Use arithmetic type more inside mixed precision ELL [#1414](#1414)
+ Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` [#1336](#1336) [#1439](#1439)
+ Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations [#1434](#1434)

### Fixes
+ Fix an over-allocation for OpenMP reductions [#1369](#1369)
+ Fix DPCPP's common-kernel reduction for empty input sizes [#1362](#1362)
+ Fix several typos in the API and documentation [#1348](#1348)
+ Fix inconsistent `Threads` between generations [#1388](#1388)
+ Fix benchmark median condition [#1398](#1398)
+ Fix HIP 5.6.0 compilation [#1411](#1411)
+ Fix missing destruction of rand_generator from cuda/hip [#1417](#1417)
+ Fix PAPI logger destruction order [#1419](#1419)
+ Fix TAU logger compilation [#1422](#1422)
+ Fix relative criterion to not iterate if the residual is already zero [#1079](#1079)
+ Fix memory_order invocations with C++20 changes [#1402](#1402)
+ Fix `check_diagonal_entries_exist` to report correctly when only the last rows are missing diagonal values. [#1440](#1440)
+ Fix checking OpenMPI version in cross-compilation settings [#1446](#1446)
+ Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](#1444)

### Related PR: #1451

Release 1.7.0 to develop

Same release notes as "Release 1.7.0 to master" above.

### Related PR: #1454

upsj added the 1:ST:WIP label May 5, 2023

upsj self-assigned this May 5, 2023

upsj changed the title ~~Enable a unified build~~ Enable an ensemble build May 5, 2023

upsj added 1:ST:ready-for-review and removed 1:ST:WIP labels Jul 2, 2023

upsj requested a review from a team July 2, 2023 18:15

MarcelKoch mentioned this pull request Jul 3, 2023

Add development container #1267

Draft

1 task

ginkgo-bot force-pushed the unified_build_fixes branch from a2e8fc6 to 8546b48 July 4, 2023 11:39

MarcelKoch approved these changes Jul 4, 2023

thoasm approved these changes Jul 4, 2023

yhmtsai reviewed Jul 4, 2023

upsj added 1:ST:ready-to-merge and 1:ST:no-changelog-entry and removed 1:ST:ready-for-review and 1:ST:no-changelog-entry labels Jul 5, 2023

ginkgo-bot force-pushed the unified_build_fixes branch from 074e3e0 to f4f5661 July 5, 2023 13:59

upsj added 4 commits July 6, 2023 15:08

enable ensemble builds

27d34b0

- remove SYCL-specific headers from general builds
- disable failing tests for rocFFT
- disable DPC++ distributed tests

uniform distributed test folder structure

60e7a38

reenable distributed vector test for dpcpp

7fcf2cb

move test/mpi/distributed to test/mpi

4755c23

upsj force-pushed the unified_build_fixes branch from f4f5661 to 4755c23 July 6, 2023 13:13

upsj merged commit d60b02c into develop Jul 6, 2023
11 checks passed

upsj deleted the unified_build_fixes branch July 6, 2023 18:14

tcojean mentioned this pull request Nov 6, 2023

Release 1.7.0 to master #1451

Merged
