Release 1.7.0

@tcojean tcojean released this 10 Nov 18:53
49242ff

The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:

  • Complete GPU-resident sparse direct solvers feature set and interfaces,
  • Improved Cholesky factorization performance,
  • A new MC64 reordering,
  • Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
  • MPI support for the SYCL backend,
  • Improved ParILU(T)/ParIC(T) preconditioner convergence,
    and more!

If you face an issue, please first check our known issues page and the open issues list. If you do not find a solution, feel free to open a new issue or ask a question using GitHub Discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2019+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
    • HIP module: ROCm 4.5+
    • DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU-aware for best performance
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 10.1+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version support changes

  • CUDA 9.2 is no longer supported and 10.0 is untested #1382
  • Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) #1368

Interface changes

  • const Factory parameters can no longer be modified through with_* functions, as this breaks const-correctness #1336 #1439
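The const-correctness rule above can be illustrated with a small standalone sketch. It does not use Ginkgo itself: `solver_parameters`, `with_max_iters`, `with_tolerance`, and `customized_copy` are hypothetical names chosen for illustration. The assumption is only that `with_*` setters behave like non-const member functions, so a const parameter object has to be copied before it can be customized.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a factory's parameter set. These names are
// illustrative only and are not Ginkgo's actual types or functions.
struct solver_parameters {
    std::size_t max_iters = 100;
    double tolerance = 1e-6;

    // Setters are non-const member functions, so they can only be called on
    // a mutable parameter object. They return *this to allow chaining.
    solver_parameters& with_max_iters(std::size_t value)
    {
        max_iters = value;
        return *this;
    }

    solver_parameters& with_tolerance(double value)
    {
        assert(value >= 0.0);
        tolerance = value;
        return *this;
    }
};

// To customize a const parameter set, copy it first and modify the copy.
inline solver_parameters customized_copy(const solver_parameters& base,
                                         std::size_t max_iters)
{
    solver_parameters copy = base;  // the original stays untouched
    copy.with_max_iters(max_iters);
    return copy;
}

// The following would be rejected by the compiler, because calling a
// non-const member function on a const object discards the qualifier:
//
//     const solver_parameters fixed{};
//     fixed.with_max_iters(10);
```

Code that previously modified const parameters in place would, under this assumption, need to take a mutable copy as `customized_copy` does.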

New Deprecations

  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and their allocation_mode parameters have been deprecated in favor of the Allocator interface. #1315
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL. #1350
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation. #1418
  • The Permutation class' permute_mask functionality has been deprecated. #1415
  • Multiple functions with typos in their names (set_complex_subpsace(), range functions such as conj_operaton, etc.) have been deprecated. #1348

Summary of previous deprecations

  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.

Added features

  • Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solvers and preconditioners #1379
  • Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors #1371
  • Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. #1413
  • Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. #1416 #1437
  • Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems #1438.
  • Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. #1443.
  • New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibration #1120
  • New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices #1415
  • LU and Cholesky Factorizations can now be separated into their factors #1432
  • New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern #1445
  • Sorting kernels for SparsityCsr on all backends #1343
  • Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner #1426
  • Add DPCPP kernels for Partition #1034, and CSR's check_diagonal_entries and add_scaled_identity functionality #1436
  • Adds a helper function to create a partition based on either local sizes, or local ranges #1227
  • Add function to compute arithmetic mean of dense and distributed vectors #1275
  • Adds icpx compiler support #1350
  • All backends can be built simultaneously #1333
  • Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo #1372
  • Reordering algorithms in sparse_blas benchmark #1354
  • Benchmarks gained an -allocator parameter to specify device allocators #1385
  • Benchmarks gained an -input_matrix parameter that initializes the input JSON based on the filename #1387
  • Benchmark inputs can now be reordered as a preprocessing step #1408
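To make the idea behind the batched dense format and its gemv operation more concrete, here is a minimal standalone sketch of a batched matrix-vector product. It does not use Ginkgo: the `batch_dense` struct, the `batch_gemv` function, and the back-to-back row-major storage layout are assumptions made for illustration, not a description of how batch::Dense is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative batch of equally sized dense matrices, stored back to back in
// row-major order. This layout is an assumption for the sketch, not a
// description of Ginkgo's batch::Dense internals.
struct batch_dense {
    std::size_t num_batch_items;
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;  // size: num_batch_items * rows * cols
};

// Computes y_b = A_b * x_b independently for every batch item b.
// x holds num_batch_items vectors of length cols, stored back to back;
// the result holds num_batch_items vectors of length rows.
inline std::vector<double> batch_gemv(const batch_dense& a,
                                      const std::vector<double>& x)
{
    assert(a.values.size() == a.num_batch_items * a.rows * a.cols);
    assert(x.size() == a.num_batch_items * a.cols);
    std::vector<double> y(a.num_batch_items * a.rows, 0.0);
    for (std::size_t b = 0; b < a.num_batch_items; ++b) {
        // Pointers to the start of the b-th matrix and the b-th vector
        const double* mat = a.values.data() + b * a.rows * a.cols;
        const double* vec = x.data() + b * a.cols;
        for (std::size_t i = 0; i < a.rows; ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < a.cols; ++j) {
                sum += mat[i * a.cols + j] * vec[j];
            }
            y[b * a.rows + i] = sum;
        }
    }
    return y;
}
```

Because the batch items are independent, the outer loop over `b` is the natural unit of parallelism, which is what makes batched formats attractive for many small systems.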

Improvements

  • Significantly improve Cholesky factorization performance #1366
  • Improve parallel build performance #1378
  • Allow constrained parallel test execution using CTest resources #1373
  • Use arithmetic type more inside mixed precision ELL #1414
  • Most factory parameters of factory type no longer need to be constructed explicitly via .on(exec) #1336 #1439
  • Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations #1434

Fixes

  • Fix an over-allocation for OpenMP reductions #1369
  • Fix DPCPP's common-kernel reduction for empty input sizes #1362
  • Fix several typos in the API and documentation #1348
  • Fix inconsistent Threads between generations #1388
  • Fix benchmark median condition #1398
  • Fix HIP 5.6.0 compilation #1411
  • Fix missing destruction of rand_generator from cuda/hip #1417
  • Fix PAPI logger destruction order #1419
  • Fix TAU logger compilation #1422
  • Fix relative criterion to not iterate if the residual is already zero #1079
  • Fix memory_order invocations with C++20 changes #1402
  • Fix check_diagonal_entries_exist to report correctly when the only missing diagonal values are in the last rows. #1440
  • Fix checking OpenMPI version in cross-compilation settings #1446
  • Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) #1444
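As an illustration of what the diagonal check mentioned above is expected to report, the following standalone sketch tests whether a CSR matrix stores an entry on every diagonal position, including the last rows. It does not use Ginkgo and does not reproduce the fixed bug, whose cause is not described in these notes; `has_all_diagonal_entries` and its parameters are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every row i < min(num_rows, num_cols) of a CSR matrix
// stores an entry in column i. row_ptrs must have num_rows + 1 entries.
inline bool has_all_diagonal_entries(std::size_t num_rows,
                                     std::size_t num_cols,
                                     const std::vector<std::size_t>& row_ptrs,
                                     const std::vector<std::size_t>& col_idxs)
{
    assert(row_ptrs.size() == num_rows + 1);
    const std::size_t diag_size = num_rows < num_cols ? num_rows : num_cols;
    // Every row on the diagonal is inspected, including the last ones.
    for (std::size_t row = 0; row < diag_size; ++row) {
        bool found = false;
        for (std::size_t k = row_ptrs[row]; k < row_ptrs[row + 1]; ++k) {
            if (col_idxs[k] == row) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;  // this row has no stored diagonal entry
        }
    }
    return true;
}
```

A 3x3 matrix whose last row stores only an off-diagonal entry, or stores nothing at all, should make this function return false.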