Add Compressed Basis GMRES (CB-GMRES) #693

* Only core and reference executors.
* Test files don't compile, due to a problem related to the gtest macros (TEST_F -> TYPED_TEST).
* MGS, MGS with reorthogonalization and CGS with reorthogonalization are considered.
* Norms are still created in the internal routines.
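For readers unfamiliar with the orthogonalization variants named above, here is a minimal host-side sketch of CGS with one reorthogonalization pass. It is not the kernel code from this PR: the names `Vec`, `dot`, and `cgs2` are hypothetical, and it assumes real-valued vectors and an already orthonormal basis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product of two vectors of equal length.
double dot(const Vec& a, const Vec& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Orthogonalizes w against the orthonormal vectors in `basis` with classical
// Gram-Schmidt applied twice. Returns the accumulated projection
// coefficients (one Hessenberg column without its final norm entry).
Vec cgs2(const std::vector<Vec>& basis, Vec& w)
{
    Vec h(basis.size(), 0.0);
    for (int pass = 0; pass < 2; ++pass) {
        // CGS: compute all dot products against the same, unmodified w ...
        Vec coeff(basis.size());
        for (std::size_t j = 0; j < basis.size(); ++j) {
            coeff[j] = dot(basis[j], w);
        }
        // ... and only then subtract all projections.
        for (std::size_t j = 0; j < basis.size(); ++j) {
            for (std::size_t i = 0; i < w.size(); ++i) {
                w[i] -= coeff[j] * basis[j][i];
            }
            h[j] += coeff[j];
        }
    }
    return h;
}
```

MGS would instead subtract each projection before computing the next dot product, which serializes the dot products; CGS keeps them independent, which is presumably why it maps well to a batch of kernels.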

* Now, the norms are properly created in the main class.
* The test files are not repaired yet.

* For CGS, a loop of kernels is used instead of a kernel with a loop.
* The test files are not repaired yet.

* The messages have to be removed.
* For CGS, a loop of kernels is used instead of a kernel with a loop.

* For CGS, a loop of kernels is used instead of a kernel with a loop.
* Consider other base_types for ValueTypeKrylovBases.

…some errors which were detected during the testing process in the repository. The previous value was float, whereas the original results were computed with default_precision, and this mismatch is the reason for the errors. Now, the default value is also default_precision.

…nd cuda executors, as a first step in the optimization process. The timing calls are also included.

…cuda executors.
* For omp, an attempt is made to move the omp parallelization to the outer loop.
* For cuda, the loop of kernels is changed to a kernel with a loop.
* The main routines (loop of dots and loop of axpy) are still too expensive.

…done. Timing instructions are also included; they are controlled by some `#define`s. The next step will be to improve the update kernels.

Added an accessor header file (name might have to change in the future) and used it in all mixed precision kernels (but for now only for the reduced precision accesses). Also adds some minor fixes:
- removed unused code in the example in hopes that it compiles on Windows
- added HIP stubs to allow HIP compilation

… close to 75s for 6221 iters. Next steps should be:
* Add the computation of the inf-norm for the next_krylov_basis.
* Merge updating and norms computations.

The specialization is currently set to only work with float storage type to test the pipeline, but it can easily be modified to work with all integer types. The Accessor was also moved from a shared header to a gmres_mixed exclusive header.

Also did some name changes.

…is used because the second one needs the correct management of norms as noncomplex values. The next step will be to use remove_complex for norms, and then, to use norminf.

…inf_kernel has been created. Also, the last 2-norm computation is used as the hessenberg component, avoiding the use of update_hessenberg_2_kernel. Next steps:
* Solve the problems which have been found when the set_scale method is used.
* Find out why the reference tests don't work properly.

Previously, the full krylov_bases were always used, regardless of whether a view was used. Now, the dimensions for krylov_bases are derived from hessenberg to take a potential view into account.

This allows it to compile with the writing of the scale.

I have also tested the <double, long int> case and it also works.

Previously, the Krylov bases b1, b2, b3 were stored interleaved: [b1[0], b2[0], b3[0], b1[1], b2[1], b3[1], ...]. Now, they are stored in transposed fashion: [b1; b2; b3], which is equivalent to [b1[0], b1[1], ..., b2[0], b2[1], ..., b3[0], b3[1], ...]. Note that if multiple right-hand sides are present, the vectors are still stored in row-major order. To make that happen, the accessor is now 3D. Its indices (x, y, z) have the following meaning:
- x: which Krylov vector you want to address
- y: which row element you want to use
- z: which column element, i.e. which right-hand side, you want to use

The accessor is stored row-major.
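To make the 3D row-major indexing concrete, here is a small sketch of how (x, y, z) could map to a linear offset. `KrylovLayout` and its members are hypothetical names for illustration and are not the accessor added in this PR; the sketch also assumes no padding between rows.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper illustrating a dense 3D row-major layout.
struct KrylovLayout {
    std::size_t num_vectors;  // extent in x: number of Krylov vectors
    std::size_t num_rows;     // extent in y: rows per vector
    std::size_t num_rhs;      // extent in z: number of right-hand sides

    // Row-major: z is the fastest-running index, x the slowest.
    std::size_t linear_index(std::size_t x, std::size_t y,
                             std::size_t z) const
    {
        assert(x < num_vectors && y < num_rows && z < num_rhs);
        return (x * num_rows + y) * num_rhs + z;
    }
};
```

With a single right-hand side (`num_rhs == 1`), vector x occupies the contiguous range starting at `x * num_rows`, which matches the [b1; b2; b3] picture above.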

This is only done for finish_arnoldi_CGS2 by changing the kernels to adapt to the new storage format of krylov_bases.

Transposing is supposed to avoid write conflicts during the atomic_add at the end. Obviously, the kernel was changed accordingly, so the behavior has not changed. Also contains:
- Scaling is now only added for integer types (not for float anymore)
- Add benchmarks for GmresMixed integer
- Some cleanup for debug output in CUDA
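As an illustration of why integer storage needs a scale while float storage does not, here is a rough sketch of storing a double-precision vector as 16-bit integers. This is not the accessor from this PR: `ScaledInt16Vector`, `compress`, and `read` are hypothetical names, and deriving the scale from the inf-norm of the vector is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical container: integer data plus one scale for the whole vector.
struct ScaledInt16Vector {
    double scale;
    std::vector<std::int16_t> data;
};

// Stores v as 16-bit integers so that the largest entry (in magnitude)
// maps to the largest representable integer.
ScaledInt16Vector compress(const std::vector<double>& v)
{
    double inf_norm = 0.0;
    for (double x : v) {
        inf_norm = std::max(inf_norm, std::abs(x));
    }
    constexpr double max_int = std::numeric_limits<std::int16_t>::max();
    // Avoid a zero scale for an all-zero vector.
    const double scale = inf_norm > 0.0 ? inf_norm / max_int : 1.0;
    ScaledInt16Vector out{scale, {}};
    out.data.reserve(v.size());
    for (double x : v) {
        out.data.push_back(static_cast<std::int16_t>(std::lround(x / scale)));
    }
    return out;
}

// Reads entry i back in double precision.
double read(const ScaledInt16Vector& s, std::size_t i)
{
    assert(i < s.data.size());
    return s.scale * static_cast<double>(s.data[i]);
}
```

A float storage type carries its own exponent, so no separate scale is required there; an integer type has a fixed range, so values must be mapped into it first.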

The transposing needs further investigation, but a single test on a V100 led to worse results than before, so I am undoing this change for now.

Additionally, removed the debug output and an unused kernel parameter.

- The benchmark script now has the option to change the initial guess
- Add option to generate the RHS with: b = A * (s / |s|) with s(i) = sin(i)
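The RHS formula above can be sketched as follows. This is only an illustration, not the benchmark code: the function name `make_sin_rhs` is hypothetical, A is taken as a dense square matrix, i is assumed to start at 0, and |s| is assumed to be the Euclidean norm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Computes b = A * (s / |s|) with s(i) = sin(i) for a dense square matrix A.
std::vector<double> make_sin_rhs(const std::vector<std::vector<double>>& a)
{
    const std::size_t n = a.size();
    std::vector<double> s(n);
    double norm_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = std::sin(static_cast<double>(i));
        norm_sq += s[i] * s[i];
    }
    const double norm = std::sqrt(norm_sq);
    // With 0-based indexing, s(0) = 0, so n must be at least 2.
    assert(norm > 0.0);

    std::vector<double> b(n, 0.0);
    for (std::size_t row = 0; row < n; ++row) {
        assert(a[row].size() == n);
        for (std::size_t col = 0; col < n; ++col) {
            b[row] += a[row][col] * (s[col] / norm);
        }
    }
    return b;
}
```

Constructing b this way means the exact solution s / |s| is known and has unit norm, which makes the solver error easy to measure.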

If the stopping criterion is met, perform a reset of GMRES and check the residual again. If the criterion is still met, exit; otherwise, keep calculating. Other changes:
- Renamed krylov_dim_mixed to krylov_dim in GmresMixed (to be consistent with Gmres)

Adds the parameter in the benchmark script (and documentation).

Reference executor now compiles, with the downside that an Accessor can no longer be created from const pointers.

It compiles and works, but it does not yet use `at` or similar.

Also add instantiation macro for ConstAccessors

Currently, only core is adapted with the reference test started (not all precision combinations are tested properly).

CUDA and OpenMP now also compile with the new accessor layout. Benchmarks are still TODO.

Also add instantiation for single precision floating point

Also adjust test precision to be more accurate.

Make GmresMixed reference test work on CI.

Of course also use scaled_reduced_row_major. Both are from the newly-added Accessors from the private headers.

Co-authored-by: Thomas Grützmacher <[email protected]>

TODO: Rename cb-gmres benchmark strings

Forced iterations on small matrices can lead to NaN values during CB-GMRES. To prevent that, forced iterations now only happen after the 10th total iteration. Before that, a vector is declared converged as soon as the stopping criterion says so. Also done:
- Removed useless debug output from CB-GMRES
- Changed the tolerances for the reference test to always pass
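The decision logic described above can be sketched as a small pure function. This is only an illustration, not the solver code: `Action`, `next_action`, and `forced_iteration_threshold` are hypothetical names, and reading "after the 10th total iteration" as `total_iterations > 10` is an assumption.

```cpp
#include <cassert>

// Hypothetical outcome of evaluating the stopping criterion once.
enum class Action { keep_iterating, restart_and_recheck, finish };

// Forced iterations are only considered after this many total iterations.
constexpr int forced_iteration_threshold = 10;

// criterion_met:           the stopping criterion reports convergence
// total_iterations:        iterations performed so far, across restarts
// rechecked_after_restart: the residual was already re-evaluated after a
//                          forced reset of GMRES
Action next_action(bool criterion_met, int total_iterations,
                   bool rechecked_after_restart)
{
    assert(total_iterations >= 0);
    if (!criterion_met) {
        return Action::keep_iterating;
    }
    // Early on, trust the stopping criterion immediately.
    if (total_iterations <= forced_iteration_threshold) {
        return Action::finish;
    }
    // Later, reset once and only finish if the criterion still holds.
    return rechecked_after_restart ? Action::finish
                                   : Action::restart_and_recheck;
}
```

For example, `next_action(true, 3, false)` finishes right away, while `next_action(true, 20, false)` asks for a reset and a second residual check.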

This is necessary for the Intel compiler to pass the test `SolvesStencilSystem2`.

- Remove unnecessary code
- Add wrapper function to `atomic_helper` in order to make it easier to implement other atomic operations (atomic_add and atomic_max use this wrapper)
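The idea of one wrapper shared by several atomic operations can be sketched on the host with `std::atomic` and a compare-and-swap loop. This is an analogue for illustration only, not the device-side `atomic_helper` from this PR; `atomic_apply` and the two functions built on it are hypothetical.

```cpp
#include <algorithm>
#include <atomic>

// Generic read-modify-write: atomically replaces the target with
// op(current, operand) and returns the value seen before the update.
template <typename T, typename Op>
T atomic_apply(std::atomic<T>& target, T operand, Op op)
{
    T old_value = target.load();
    // On failure, compare_exchange_weak reloads old_value, so the new
    // value is recomputed from the latest state on every retry.
    while (!target.compare_exchange_weak(old_value,
                                         op(old_value, operand))) {
    }
    return old_value;
}

// Atomic addition expressed through the wrapper.
template <typename T>
T atomic_add(std::atomic<T>& target, T value)
{
    return atomic_apply(target, value, [](T a, T b) { return a + b; });
}

// Atomic maximum expressed through the wrapper.
template <typename T>
T atomic_max(std::atomic<T>& target, T value)
{
    return atomic_apply(target, value,
                        [](T a, T b) { return std::max(a, b); });
}
```

Adding another operation, for example an atomic minimum, then only requires a new one-line lambda rather than a second retry loop.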

in CB-GMRES

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Yuhsiang M. Tsai <[email protected]>

- Add arnoldi_norm documentation and add it in the tests (was not part of it previously, and needs a fix)
- Add documentation to benchmarking
- Rename namespace and helper for range helper

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Yuhsiang M. Tsai <[email protected]>

Co-authored-by: Terry Cojean <[email protected]>

Both examples are removed because their functionality is so similar to simple-solver that they do not add much value.

- Add more documentation to CB-GMRES
- Add CB-GMRES to test-install

Co-authored-by: Terry Cojean <[email protected]>
Co-authored-by: Yuhsiang M. Tsai <[email protected]>

`--expt-relaxed-constexpr` is now used for every CUDA version

Co-authored-by: Pratik Nayak <[email protected]>

The following modifications were done for CB-GMRES:
- Remove unused kernel parameters `num_reorth_steps` and `num_reorth_vectors`
- Remove unused `b_norm`
- Make unused kernel parameters unnamed
- Add some explicit casts to prevent warnings

- Update documentation of GMRES to mention the usage of MGS
- Use reduced precision in CB-GMRES by default

Co-authored-by: Pratik Nayak <[email protected]>

- Extract GPU kernels for CB-GMRES and GMRES into a new file to avoid duplication.
- Adopt the updated GMRES functionality for these kernels for CPU and GPU

Co-authored-by: Pratik Nayak <[email protected]>



Commits on Feb 19, 2021