Using next_krylov_basis instead of krylov_bases #481

fritzgoebel · 2020-03-12T11:25:47Z

... to save unnecessary padding in the preconditioned vector. This made it necessary to choose a different spmv strategy when using plain GMRES or with ilu preconditioner, since the default choice doesn't support multiple vectors.

pratikvn

LGTM!

thoasm

The code looks good to me.

However, it looks like krylov_bases[:, i] is not necessarily next_krylov_bases (for i != 0) since step_1 seems to manipulate both differently. So, for now, it will not be an Approve until I figured it out (or you show me that they actually are equivalent).

fritzgoebel · 2020-03-12T17:22:47Z

The code looks good to me.

However, it looks like krylov_bases[:, i] is not necessarily next_krylov_bases (for i != 0) since step_1 seems to manipulate both differently. So, for now, it will not be an Approve until I figured it out (or you show me that they actually are equivalent).

Thomas, step 1 orthonormalizes next_krylov_basis against krylov_bases, iin which the previous krylov vectors are stored and afterwards writes next_krylov_basis into the first free column of krylov_bases. The rest of the algorithm doesn't change krylov_bases, it only stores the already obtained basis vectors.

Not sure if that is comprehensible, let me know if not :)

codecov · 2020-03-12T17:59:24Z

Codecov Report

Merging #481 into develop will decrease coverage by 0.12%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop     #481      +/-   ##
===========================================
- Coverage    88.82%   88.69%   -0.13%     
===========================================
  Files          256      256              
  Lines        16449    16450       +1     
===========================================
- Hits         14611    14591      -20     
- Misses        1838     1859      +21

Impacted Files	Coverage Δ
omp/test/solver/gmres_kernels.cpp	`100% <ø> (ø)`	⬆️
omp/solver/gmres_kernels.cpp	`0% <ø> (ø)`	⬆️
reference/solver/gmres_kernels.cpp	`95.68% <100%> (+0.07%)`	⬆️
core/solver/gmres.cpp	`93.02% <100%> (+0.31%)`	⬆️
omp/components/format_conversion.hpp	`40% <0%> (-60%)`	⬇️
include/ginkgo/core/matrix/coo.hpp	`70.27% <0%> (-29.73%)`	⬇️
include/ginkgo/core/matrix/hybrid.hpp	`76.92% <0%> (-12.8%)`	⬇️
include/ginkgo/core/matrix/csr.hpp	`70.7% <0%> (+2.07%)`	⬆️
include/ginkgo/core/base/polymorphic_object.hpp	`98.43% <0%> (+3.19%)`	⬆️
include/ginkgo/core/matrix/sellp.hpp	`90.24% <0%> (+8.42%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 793a8a8...34e4460. Read the comment docs.

upsj

LGTM! I guess we can ignore the Sonarqube warning

yhmtsai

LGTM.
Could you also do the test parabolic_fem and cz20468 like this PR?

I guess it requires higher copy time without preconditioner but lower apply time with preconditioner

thoasm

@fritzgoebel Thanks for the explanation, LGTM!
I just had limited time on Thursday and was not able to check it then (was just reading the comment and it looked like these two are computed differently).
Now, I looked at the implementation of step_1 and it does exactly what you said. Sorry for the hold up.

fritzgoebel · 2020-03-17T19:34:34Z

LGTM.
Could you also do the test parabolic_fem and cz20468 like this PR?

I guess it requires higher copy time without preconditioner but lower apply time with preconditioner

You're right, higher copy time without preconditioner and lower spmv time.
The results look as follows:

parabolic_fem gmres	develop	this PR
allocate	10489278	13615607
copy	733152	38226370
csr::advanced_spmv#5	1420452	1417904
csr::spmv#3	619128915	140911837
dense::add_scaled#3	305478	302069
dense::compute_norm2#2	730683	720691
free	3504299	3508288
gmres::initialize_1#7	151245	133480
gmres::initialize_2#7	7535975	7623398
gmres::step_1#11	9105763327	9174571261
gmres::step_2#6	60425878	48736218
residual_norm_reduction::residual_norm_reduction#9	38484834	37633714
set_all_statuses::set_all_statuses#3	16087	16142
total time	9807684867.0	9406512922.0

cz20468 gmres	develop	this PR
allocate	1097204	1109565
copy	471172	22697740
csr::advanced_spmv#5	232840	229640
csr::spmv#3	22618884	22106416
dense::add_scaled#3	163565	161941
dense::compute_norm2#2	624088	624450
free	788451	783874
gmres::initialize_1#7	26303	25861
gmres::initialize_2#7	507573	512541
gmres::step_1#11	836047682	831280415
gmres::step_2#6	4858147	4278588
residual_norm_reduction::residual_norm_reduction#9	36507831	36719686
set_all_statuses::set_all_statuses#3	14444	14529
total time	866834322.0	873171915.0

parabolic_fem jacobi-gmres	develop	this PR
allocate	10844859	13787887
copy	381019	382907
csr::advanced_spmv#5	1419300	1418164
csr::spmv#3	141036683	140871510
dense::add_scaled#3	283208	284785
dense::compute_norm2#2	735460	737568
free	3497251	3528137
gmres::initialize_1#7	37432	36918
gmres::initialize_2#7	7533181	7602786
gmres::step_1#11	9091034882	9031006389
gmres::step_2#6	60536508	48509333
residual_norm_reduction::residual_norm_reduction#9	37389200	37624787
set_all_statuses::set_all_statuses#3	16370	15593
total time	9807684867.0	9406512922.0

cz20468 jacobi-gmres	develop	this PR
allocate	1155547	1161187
copy	243903	244202
csr::advanced_spmv#5	235821	235956
csr::spmv#3	23194311	23254954
dense::add_scaled#3	161600	163309
dense::compute_norm2#2	635196	631974
free	792385	802053
gmres::initialize_1#7	28055	27730
gmres::initialize_2#7	507887	513957
gmres::step_1#11	834597414	830933371
gmres::step_2#6	4869268	4285080
residual_norm_reduction::residual_norm_reduction#9	36549182	36562928
set_all_statuses::set_all_statuses#3	15265	15998
total time	866834322.0	873171915.0

parabolic_fem adaptive-jacobi-gmres	develop	this PR
allocate	10857625	10922360
copy	376434	380551
csr::advanced_spmv#5	1420420	1424990
csr::spmv#3	140949268	141389777
dense::add_scaled#3	301866	304616
dense::compute_norm2#2	739890	739050
free	3509403	3521299
gmres::initialize_1#7	38011	37919
gmres::initialize_2#7	7544718	7606785
gmres::step_1#11	9106508986	9120126109
gmres::step_2#6	60212094	48773732
residual_norm_reduction::residual_norm_reduction#9	37586223	37614267
set_all_statuses::set_all_statuses#3	16429	15924
total time	9807684867.0	9406512922.0

cz20468 adaptive-jacobi-gmres	develop	this PR
allocate	1151909	1150076
copy	242711	244436
csr::advanced_spmv#5	234358	235956
csr::spmv#3	22861060	22806567
dense::add_scaled#3	170263	163035
dense::compute_norm2#2	637091	636317
free	802065	796795
gmres::initialize_1#7	28468	27857
gmres::initialize_2#7	507957	511603
gmres::step_1#11	836117938	831367861
gmres::step_2#6	4863637	4290163
residual_norm_reduction::residual_norm_reduction#9	36552544	36510519
set_all_statuses::set_all_statuses#3	15171	15760
total time	866834322.0	873171915.0

hartwiganzt · 2020-03-17T19:44:22Z

I appreciate the very detailed analysis! I assume the "sparser" a matrix is (large, few nnz), the higher is the "negative" impact. But we are talking about 1-3% here, so I am happy with it!

yhmtsai · 2020-03-18T04:29:09Z

LGTM. Also, total time seems not to change a lot.

…adding in the preconditioned vector

sonarcloud · 2020-03-18T14:24:18Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
2 Code Smells

40.0% Coverage
4.9% Duplication

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.2.0. This release brings full HIP support to Ginkgo, new preconditioners (ParILUT, ISAI), conversion between double and float for all LinOps, and many more features and fixes. Supported systems and requirements: + For all platforms, cmake 3.9+ + Linux and MacOS + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+ + clang: 3.9+ + Intel compiler: 2017+ + Apple LLVM: 8.0+ + CUDA module: CUDA 9.0+ + HIP module: ROCm 2.8+ + Windows + MinGW and CygWin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+ + Microsoft Visual Studio: VS 2017 15.7+ + CUDA module: CUDA 9.0+, Microsoft Visual Studio + OpenMP module: MinGW or CygWin. The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues). # Additions Here are the main additions to the Ginkgo library. Other thematic additions are listed below. + Add full HIP support to Ginkgo [#344](#344), [#357](#357), [#384](#384), [#373](#373), [#391](#391), [#396](#396), [#395](#395), [#393](#393), [#404](#404), [#439](#439), [#443](#443), [#567](#567) + Add a new ISAI preconditioner [#489](#489), [#502](#502), [#512](#512), [#508](#508), [#520](#520) + Add support for ParILUT and ParICT factorization with ILU preconditioners [#400](#400) + Add a new BiCG solver [#438](#438) + Add a new permutation matrix format [#352](#352), [#469](#469) + Add CSR SpGEMM support [#386](#386), [#398](#398), [#418](#418), [#457](#457) + Add CSR SpGEAM support [#556](#556) + Make all solvers and preconditioners transposable [#535](#535) + Add CsrBuilder and CooBuilder for intrusive access to matrix arrays [#437](#437) + Add a standard-compliant allocator based on the Executors [#504](#504) + Support conversions for all LinOp between double and float [#521](#521) + Add a new boolean to the CUDA and HIP executors to control DeviceReset (default off) [#557](#557) + Add a relaxation factor to IR to represent Richardson Relaxation [#574](#574) + Add two new stopping criteria, for relative (to `norm(b)`) and absolute residual norm [#577](#577) ### Example additions + Templatize all examples to simplify changing the precision [#513](#513) + Add a new adaptive precision block-Jacobi example [#507](#507) + Add a new IR example [#522](#522) + Add a new Mixed Precision Iterative Refinement example [#525](#525) + Add a new example on iterative trisolves in ILU preconditioning [#526](#526), [#536](#536), [#550](#550) ### Compilation and library changes + Auto-detect compilation settings based on environment [#435](#435), [#537](#537) + Add SONAME to shared libraries [#524](#524) + Add clang-cuda support [#543](#543) ### Other additions + Add sorting, searching and merging kernels for GPUs [#403](#403), [#428](#428), [#417](#417), [#455](#455) + Add `gko::as` support for smart pointers [#493](#493) + Add setters and getters for criterion factories [#527](#527) + Add a new method to check whether a solver uses `x` as an initial guess [#531](#531) + Add contribution guidelines [#549](#549) # Fixes ### Algorithms + Improve the classical CSR strategy's performance [#401](#401) + Improve the CSR automatical strategy [#407](#407), [#559](#559) + Memory, speed improvements to the ELL kernel [#411](#411) + Multiple improvements and fixes to ParILU [#419](#419), [#427](#427), [#429](#429), [#456](#456), [#544](#544) + Fix multiple issues with GMRES [#481](#481), [#523](#523), [#575](#575) + Optimize OpenMP matrix conversions [#505](#505) + Ensure the linearity of the ILU preconditioner [#506](#506) + Fix IR's use of the advanced apply [#522](#522) + Fix empty matrices conversions and add tests [#560](#560) ### Other core functionalities + Fix complex number support in our math header [#410](#410) + Fix CUDA compatibility of the main ginkgo header [#450](#450) + Fix isfinite issues [#465](#465) + Fix the Array::view memory leak and the array/view copy/move [#485](#485) + Fix typos preventing use of some interface functions [#496](#496) + Fix the `gko::dim` to abide to the C++ standard [#498](#498) + Simplify the executor copy interface [#516](#516) + Optimize intermediate storage for Composition [#540](#540) + Provide an initial guess for relevant Compositions [#561](#561) + Better management of nullptr as criterion [#562](#562) + Fix the norm calculations for complex support [#564](#564) ### CUDA and HIP specific + Use the return value of the atomic operations in our wrappers [#405](#405) + Improve the portability of warp lane masks [#422](#422) + Extract thread ID computation into a separate function [#464](#464) + Reorder kernel parameters for consistency [#474](#474) + Fix the use of `pragma unroll` in HIP [#492](#492) ### Other + Fix the Ginkgo CMake installation files [#414](#414), [#553](#553) + Fix the Windows compilation [#415](#415) + Always use demangled types in error messages [#434](#434), [#486](#486) + Add CUDA header dependency to appropriate tests [#452](#452) + Fix several sonarqube or compilation warnings [#453](#453), [#463](#463), [#532](#532), [#569](#569) + Add shuffle tests [#460](#460) + Fix MSVC C2398 error [#490](#490) + Fix missing interface tests in test install [#558](#558) # Tools and ecosystem ### Benchmarks + Add better norm support in the benchmarks [#377](#377) + Add CUDA 10.1 generic SpMV support in benchmarks [#468](#468), [#473](#473) + Add sparse library ILU in benchmarks [#487](#487) + Add overhead benchmarking capacities [#501](#501) + Allow benchmarking from a matrix list file [#503](#503) + Fix benchmarking issue with JSON and non-finite numbers [#514](#514) + Fix benchmark logger crashers with OpenMP [#565](#565) ### CI related + Improvements to the CI setup with HIP compilation [#421](#421), [#466](#466) + Add MacOSX CI support [#470](#470), [#488](#488) + Add Windows CI support [#471](#471), [#488](#488), [#510](#510), [#566](#566) + Use sanitizers instead of valgrind [#476](#476) + Add automatic container generation and update facilities [#499](#499) + Fix the CI parallelism settings [#517](#517), [#538](#538), [#539](#539) + Make the codecov patch check informational [#519](#519) + Add support for LLVM sanitizers with improved thread sanitizer support [#578](#578) ### Test suite + Add an assertion for sparsity pattern equality [#416](#416) + Add core and reference multiprecision tests support [#448](#448) + Speed up GPU tests by avoiding device reset [#467](#467) + Change test matrix location string [#494](#494) ### Other + Add Ginkgo badges from our tools [#413](#413) + Update the `create_new_algorithm.sh` script [#420](#420) + Bump copyright and improve license management [#436](#436), [#433](#433) + Set clang-format minimum requirement [#441](#441), [#484](#484) + Update git-cmake-format [#446](#446), [#484](#484) + Disable the development tools by default [#442](#442) + Add a script for automatic header formatting [#447](#447) + Add GDB pretty printer for `gko::Array` [#509](#509) + Improve compilation speed [#533](#533) + Add editorconfig support [#546](#546) + Add a compile-time check for header self-sufficiency [#552](#552) # Related PR: #583

fritzgoebel changed the title ~~Gmres fix~~ Using next_krylov_basis instead of krylov_bases Mar 12, 2020

fritzgoebel requested review from thoasm, yhmtsai, upsj, pratikvn and tcojean March 12, 2020 11:28

tcojean added mod:core This is related to the core module. is:enhancement An improvement of an existing feature. 1:ST:ready-for-review This PR is ready for review type:solver This is related to the solvers labels Mar 12, 2020

tcojean assigned fritzgoebel Mar 12, 2020

pratikvn added the is:affects-performance This is related to something which affects performance. label Mar 12, 2020

fritzgoebel mentioned this pull request Mar 12, 2020

Using next_krylov_basis instead of krylov_bases to save unnecessary p… #480

Closed

pratikvn approved these changes Mar 12, 2020

View reviewed changes

thoasm reviewed Mar 12, 2020

View reviewed changes

upsj approved these changes Mar 13, 2020

View reviewed changes

yhmtsai approved these changes Mar 13, 2020

View reviewed changes

thoasm approved these changes Mar 16, 2020

View reviewed changes

thoasm added 1:ST:ready-to-merge This PR is ready to merge. 1:ST:ready-for-review This PR is ready for review and removed 1:ST:ready-for-review This PR is ready for review 1:ST:ready-to-merge This PR is ready to merge. labels Mar 16, 2020

pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Mar 17, 2020

tcojean approved these changes Mar 18, 2020

View reviewed changes

fritzgoebel added 3 commits March 18, 2020 11:40

Using next_krylov_basis instead of krylov_bases to save unnecessary p…

398969a

…adding in the preconditioned vector

updated comments

50ccc2e

fix error in hip test

34e4460

fritzgoebel force-pushed the gmres_fix branch from 96e42db to 34e4460 Compare March 18, 2020 10:46

fritzgoebel merged commit 4b43ad2 into develop Mar 18, 2020

fritzgoebel deleted the gmres_fix branch March 18, 2020 15:37

yhmtsai mentioned this pull request Apr 30, 2020

Reorder krylov bases of gmres #523

Merged

1 task

tcojean mentioned this pull request Jun 23, 2020

Release/1.2.0 #576

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using next_krylov_basis instead of krylov_bases #481

Using next_krylov_basis instead of krylov_bases #481

fritzgoebel commented Mar 12, 2020 •

edited

Loading

pratikvn left a comment

thoasm left a comment

fritzgoebel commented Mar 12, 2020

codecov bot commented Mar 12, 2020 •

edited

Loading

upsj left a comment

yhmtsai left a comment

thoasm left a comment

fritzgoebel commented Mar 17, 2020

hartwiganzt commented Mar 17, 2020

yhmtsai commented Mar 18, 2020

sonarcloud bot commented Mar 18, 2020

Using next_krylov_basis instead of krylov_bases #481

Using next_krylov_basis instead of krylov_bases #481

Conversation

fritzgoebel commented Mar 12, 2020 • edited Loading

pratikvn left a comment

Choose a reason for hiding this comment

thoasm left a comment

Choose a reason for hiding this comment

fritzgoebel commented Mar 12, 2020

codecov bot commented Mar 12, 2020 • edited Loading

Codecov Report

upsj left a comment

Choose a reason for hiding this comment

yhmtsai left a comment

Choose a reason for hiding this comment

thoasm left a comment

Choose a reason for hiding this comment

fritzgoebel commented Mar 17, 2020

hartwiganzt commented Mar 17, 2020

yhmtsai commented Mar 18, 2020

sonarcloud bot commented Mar 18, 2020

fritzgoebel commented Mar 12, 2020 •

edited

Loading

codecov bot commented Mar 12, 2020 •

edited

Loading