Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using next_krylov_basis instead of krylov_bases #481

Merged
merged 3 commits into from
Mar 18, 2020
Merged

Conversation

fritzgoebel
Copy link
Collaborator

@fritzgoebel fritzgoebel commented Mar 12, 2020

... to save unnecessary padding in the preconditioned vector. This made it necessary to choose a different spmv strategy when using plain GMRES or with ilu preconditioner, since the default choice doesn't support multiple vectors.

@fritzgoebel fritzgoebel changed the title Gmres fix Using next_krylov_basis instead of krylov_bases Mar 12, 2020
@tcojean tcojean added mod:core This is related to the core module. is:enhancement An improvement of an existing feature. 1:ST:ready-for-review This PR is ready for review type:solver This is related to the solvers labels Mar 12, 2020
@pratikvn pratikvn added the is:affects-performance This is related to something which affects performance. label Mar 12, 2020
Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@thoasm thoasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me.

However, it looks like krylov_bases[:, i] is not necessarily next_krylov_bases (for i != 0) since step_1 seems to manipulate both differently. So, for now, it will not be an Approve until I figured it out (or you show me that they actually are equivalent).

@fritzgoebel
Copy link
Collaborator Author

The code looks good to me.

However, it looks like krylov_bases[:, i] is not necessarily next_krylov_bases (for i != 0) since step_1 seems to manipulate both differently. So, for now, it will not be an Approve until I figured it out (or you show me that they actually are equivalent).

Thomas, step 1 orthonormalizes next_krylov_basis against krylov_bases, iin which the previous krylov vectors are stored and afterwards writes next_krylov_basis into the first free column of krylov_bases. The rest of the algorithm doesn't change krylov_bases, it only stores the already obtained basis vectors.

Not sure if that is comprehensible, let me know if not :)

@codecov
Copy link

codecov bot commented Mar 12, 2020

Codecov Report

Merging #481 into develop will decrease coverage by 0.12%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #481      +/-   ##
===========================================
- Coverage    88.82%   88.69%   -0.13%     
===========================================
  Files          256      256              
  Lines        16449    16450       +1     
===========================================
- Hits         14611    14591      -20     
- Misses        1838     1859      +21
Impacted Files Coverage Δ
omp/test/solver/gmres_kernels.cpp 100% <ø> (ø) ⬆️
omp/solver/gmres_kernels.cpp 0% <ø> (ø) ⬆️
reference/solver/gmres_kernels.cpp 95.68% <100%> (+0.07%) ⬆️
core/solver/gmres.cpp 93.02% <100%> (+0.31%) ⬆️
omp/components/format_conversion.hpp 40% <0%> (-60%) ⬇️
include/ginkgo/core/matrix/coo.hpp 70.27% <0%> (-29.73%) ⬇️
include/ginkgo/core/matrix/hybrid.hpp 76.92% <0%> (-12.8%) ⬇️
include/ginkgo/core/matrix/csr.hpp 70.7% <0%> (+2.07%) ⬆️
include/ginkgo/core/base/polymorphic_object.hpp 98.43% <0%> (+3.19%) ⬆️
include/ginkgo/core/matrix/sellp.hpp 90.24% <0%> (+8.42%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 793a8a8...34e4460. Read the comment docs.

Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I guess we can ignore the Sonarqube warning

Copy link
Member

@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Could you also do the test parabolic_fem and cz20468 like this PR?

I guess it requires higher copy time without preconditioner but lower apply time with preconditioner

Copy link
Member

@thoasm thoasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fritzgoebel Thanks for the explanation, LGTM!
I just had limited time on Thursday and was not able to check it then (was just reading the comment and it looked like these two are computed differently).
Now, I looked at the implementation of step_1 and it does exactly what you said. Sorry for the hold up.

@thoasm thoasm added 1:ST:ready-to-merge This PR is ready to merge. 1:ST:ready-for-review This PR is ready for review and removed 1:ST:ready-for-review This PR is ready for review 1:ST:ready-to-merge This PR is ready to merge. labels Mar 16, 2020
@fritzgoebel
Copy link
Collaborator Author

LGTM.
Could you also do the test parabolic_fem and cz20468 like this PR?

I guess it requires higher copy time without preconditioner but lower apply time with preconditioner

You're right, higher copy time without preconditioner and lower spmv time.
The results look as follows:

parabolic_fem gmres develop this PR
allocate 10489278 13615607
copy 733152 38226370
csr::advanced_spmv#5 1420452 1417904
csr::spmv#3 619128915 140911837
dense::add_scaled#3 305478 302069
dense::compute_norm2#2 730683 720691
free 3504299 3508288
gmres::initialize_1#7 151245 133480
gmres::initialize_2#7 7535975 7623398
gmres::step_1#11 9105763327 9174571261
gmres::step_2#6 60425878 48736218
residual_norm_reduction::residual_norm_reduction#9 38484834 37633714
set_all_statuses::set_all_statuses#3 16087 16142
total time 9807684867.0 9406512922.0
cz20468 gmres develop this PR
allocate 1097204 1109565
copy 471172 22697740
csr::advanced_spmv#5 232840 229640
csr::spmv#3 22618884 22106416
dense::add_scaled#3 163565 161941
dense::compute_norm2#2 624088 624450
free 788451 783874
gmres::initialize_1#7 26303 25861
gmres::initialize_2#7 507573 512541
gmres::step_1#11 836047682 831280415
gmres::step_2#6 4858147 4278588
residual_norm_reduction::residual_norm_reduction#9 36507831 36719686
set_all_statuses::set_all_statuses#3 14444 14529
total time 866834322.0 873171915.0
parabolic_fem jacobi-gmres develop this PR
allocate 10844859 13787887
copy 381019 382907
csr::advanced_spmv#5 1419300 1418164
csr::spmv#3 141036683 140871510
dense::add_scaled#3 283208 284785
dense::compute_norm2#2 735460 737568
free 3497251 3528137
gmres::initialize_1#7 37432 36918
gmres::initialize_2#7 7533181 7602786
gmres::step_1#11 9091034882 9031006389
gmres::step_2#6 60536508 48509333
residual_norm_reduction::residual_norm_reduction#9 37389200 37624787
set_all_statuses::set_all_statuses#3 16370 15593
total time 9807684867.0 9406512922.0
cz20468 jacobi-gmres develop this PR
allocate 1155547 1161187
copy 243903 244202
csr::advanced_spmv#5 235821 235956
csr::spmv#3 23194311 23254954
dense::add_scaled#3 161600 163309
dense::compute_norm2#2 635196 631974
free 792385 802053
gmres::initialize_1#7 28055 27730
gmres::initialize_2#7 507887 513957
gmres::step_1#11 834597414 830933371
gmres::step_2#6 4869268 4285080
residual_norm_reduction::residual_norm_reduction#9 36549182 36562928
set_all_statuses::set_all_statuses#3 15265 15998
total time 866834322.0 873171915.0
parabolic_fem adaptive-jacobi-gmres develop this PR
allocate 10857625 10922360
copy 376434 380551
csr::advanced_spmv#5 1420420 1424990
csr::spmv#3 140949268 141389777
dense::add_scaled#3 301866 304616
dense::compute_norm2#2 739890 739050
free 3509403 3521299
gmres::initialize_1#7 38011 37919
gmres::initialize_2#7 7544718 7606785
gmres::step_1#11 9106508986 9120126109
gmres::step_2#6 60212094 48773732
residual_norm_reduction::residual_norm_reduction#9 37586223 37614267
set_all_statuses::set_all_statuses#3 16429 15924
total time 9807684867.0 9406512922.0
cz20468 adaptive-jacobi-gmres develop this PR
allocate 1151909 1150076
copy 242711 244436
csr::advanced_spmv#5 234358 235956
csr::spmv#3 22861060 22806567
dense::add_scaled#3 170263 163035
dense::compute_norm2#2 637091 636317
free 802065 796795
gmres::initialize_1#7 28468 27857
gmres::initialize_2#7 507957 511603
gmres::step_1#11 836117938 831367861
gmres::step_2#6 4863637 4290163
residual_norm_reduction::residual_norm_reduction#9 36552544 36510519
set_all_statuses::set_all_statuses#3 15171 15760
total time 866834322.0 873171915.0

@hartwiganzt
Copy link
Collaborator

I appreciate the very detailed analysis! I assume the "sparser" a matrix is (large, few nnz), the higher is the "negative" impact. But we are talking about 1-3% here, so I am happy with it!

@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Mar 17, 2020
@yhmtsai
Copy link
Member

yhmtsai commented Mar 18, 2020

LGTM. Also, total time seems not to change a lot.

@sonarcloud
Copy link

sonarcloud bot commented Mar 18, 2020

SonarCloud Quality Gate failed.

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell B 2 Code Smells

40.0% 40.0% Coverage
4.9% 4.9% Duplication

@fritzgoebel fritzgoebel merged commit 4b43ad2 into develop Mar 18, 2020
@fritzgoebel fritzgoebel deleted the gmres_fix branch March 18, 2020 15:37
@yhmtsai yhmtsai mentioned this pull request Apr 30, 2020
1 task
@tcojean tcojean mentioned this pull request Jun 23, 2020
tcojean added a commit that referenced this pull request Jul 7, 2020
The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.2.0. This release brings full HIP support to Ginkgo, new preconditioners
(ParILUT, ISAI), conversion between double and float for all LinOps, and many
more features and fixes.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and CygWin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or CygWin.


The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).


# Additions
Here are the main additions to the Ginkgo library. Other thematic additions are listed below.
+ Add full HIP support to Ginkgo [#344](#344), [#357](#357), [#384](#384), [#373](#373), [#391](#391), [#396](#396), [#395](#395), [#393](#393), [#404](#404), [#439](#439), [#443](#443), [#567](#567)
+ Add a new ISAI preconditioner [#489](#489), [#502](#502), [#512](#512), [#508](#508), [#520](#520)
+ Add support for ParILUT and ParICT factorization with ILU preconditioners [#400](#400)
+ Add a new BiCG solver [#438](#438)
+ Add a new permutation matrix format [#352](#352), [#469](#469)
+ Add CSR SpGEMM support [#386](#386), [#398](#398), [#418](#418), [#457](#457)
+ Add CSR SpGEAM support [#556](#556)
+ Make all solvers and preconditioners transposable [#535](#535)
+ Add CsrBuilder and CooBuilder for intrusive access to matrix arrays [#437](#437)
+ Add a standard-compliant allocator based on the Executors [#504](#504)
+ Support conversions for all LinOp between double and float [#521](#521)
+ Add a new boolean to the CUDA and HIP executors to control DeviceReset (default off) [#557](#557)
+ Add a relaxation factor to IR to represent Richardson Relaxation [#574](#574)
+ Add two new stopping criteria, for relative (to `norm(b)`) and absolute residual norm [#577](#577)

### Example additions
+ Templatize all examples to simplify changing the precision [#513](#513)
+ Add a new adaptive precision block-Jacobi example [#507](#507)
+ Add a new IR example [#522](#522)
+ Add a new Mixed Precision Iterative Refinement example [#525](#525)
+ Add a new example on iterative trisolves in ILU preconditioning [#526](#526), [#536](#536), [#550](#550)

### Compilation and library changes
+ Auto-detect compilation settings based on environment [#435](#435), [#537](#537)
+ Add SONAME to shared libraries [#524](#524)
+ Add clang-cuda support [#543](#543)

### Other additions
+ Add sorting, searching and merging kernels for GPUs [#403](#403), [#428](#428), [#417](#417), [#455](#455)
+ Add `gko::as` support for smart pointers [#493](#493)
+ Add setters and getters for criterion factories [#527](#527)
+ Add a new method to check whether a solver uses `x` as an initial guess [#531](#531)
+ Add contribution guidelines [#549](#549)

# Fixes
### Algorithms
+ Improve the classical CSR strategy's performance [#401](#401)
+ Improve the CSR automatical strategy [#407](#407), [#559](#559)
+ Memory, speed improvements to the ELL kernel [#411](#411)
+ Multiple improvements and fixes to ParILU [#419](#419), [#427](#427), [#429](#429), [#456](#456), [#544](#544)
+ Fix multiple issues with GMRES [#481](#481), [#523](#523), [#575](#575)
+ Optimize OpenMP matrix conversions [#505](#505)
+ Ensure the linearity of the ILU preconditioner [#506](#506)
+ Fix IR's use of the advanced apply [#522](#522)
+ Fix empty matrices conversions and add tests [#560](#560)

### Other core functionalities
+ Fix complex number support in our math header [#410](#410)
+ Fix CUDA compatibility of the main ginkgo header [#450](#450)
+ Fix isfinite issues [#465](#465)
+ Fix the Array::view memory leak and the array/view copy/move [#485](#485)
+ Fix typos preventing use of some interface functions [#496](#496)
+ Fix the `gko::dim` to abide to the C++ standard [#498](#498)
+ Simplify the executor copy interface [#516](#516)
+ Optimize intermediate storage for Composition [#540](#540)
+ Provide an initial guess for relevant Compositions [#561](#561)
+ Better management of nullptr as criterion [#562](#562)
+ Fix the norm calculations for complex support [#564](#564)

### CUDA and HIP specific
+ Use the return value of the atomic operations in our wrappers [#405](#405)
+ Improve the portability of warp lane masks [#422](#422)
+ Extract thread ID computation into a separate function [#464](#464)
+ Reorder kernel parameters for consistency [#474](#474)
+ Fix the use of `pragma unroll` in HIP [#492](#492)

### Other
+ Fix the Ginkgo CMake installation files [#414](#414), [#553](#553)
+ Fix the Windows compilation [#415](#415)
+ Always use demangled types in error messages [#434](#434), [#486](#486)
+ Add CUDA header dependency to appropriate tests [#452](#452)
+ Fix several sonarqube or compilation warnings [#453](#453), [#463](#463), [#532](#532), [#569](#569)
+ Add shuffle tests [#460](#460)
+ Fix MSVC C2398 error [#490](#490)
+ Fix missing interface tests in test install [#558](#558)

# Tools and ecosystem
### Benchmarks
+ Add better norm support in the benchmarks [#377](#377)
+ Add CUDA 10.1 generic SpMV support in benchmarks [#468](#468), [#473](#473)
+ Add sparse library ILU in benchmarks [#487](#487)
+ Add overhead benchmarking capacities [#501](#501)
+ Allow benchmarking from a matrix list file [#503](#503)
+ Fix benchmarking issue with JSON and non-finite numbers [#514](#514)
+ Fix benchmark logger crashers with OpenMP [#565](#565)

### CI related
+ Improvements to the CI setup with HIP compilation [#421](#421), [#466](#466)
+ Add MacOSX CI support [#470](#470), [#488](#488)
+ Add Windows CI support [#471](#471), [#488](#488), [#510](#510), [#566](#566)
+ Use sanitizers instead of valgrind [#476](#476)
+ Add automatic container generation and update facilities [#499](#499)
+ Fix the CI parallelism settings [#517](#517), [#538](#538), [#539](#539)
+ Make the codecov patch check informational [#519](#519)
+ Add support for LLVM sanitizers with improved thread sanitizer support [#578](#578)

### Test suite
+ Add an assertion for sparsity pattern equality [#416](#416)
+ Add core and reference multiprecision tests support [#448](#448)
+ Speed up GPU tests by avoiding device reset [#467](#467)
+ Change test matrix location string [#494](#494)

### Other
+ Add Ginkgo badges from our tools [#413](#413)
+ Update the `create_new_algorithm.sh` script [#420](#420)
+ Bump copyright and improve license management [#436](#436), [#433](#433)
+ Set clang-format minimum requirement [#441](#441), [#484](#484)
+ Update git-cmake-format [#446](#446), [#484](#484)
+ Disable the development tools by default [#442](#442)
+ Add a script for automatic header formatting [#447](#447)
+ Add GDB pretty printer for `gko::Array` [#509](#509)
+ Improve compilation speed [#533](#533)
+ Add editorconfig support [#546](#546)
+ Add a compile-time check for header self-sufficiency [#552](#552)


# Related PR: #583
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. is:affects-performance This is related to something which affects performance. is:enhancement An improvement of an existing feature. mod:core This is related to the core module. type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants