Allow benchmark to be compiled with different precisions #675

thoasm · 2020-12-07T13:29:45Z

This PR moves the precision with which all benchmark code is compiled into the file benchmark/utils/types.hpp.
The `etype` alias defines the precision the benchmark code is compiled with, so changing it to `using etype = float;` runs the benchmarks in single precision (see this pipeline for this case).

TODO:

Generate 2 binaries: one for float, one for double
Add remove_complex to a lot of function return types and test benchmark with complex values
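
A minimal sketch of the two ideas above, not the PR's actual code: a single `etype` alias fixes the value type for all benchmark code, and a `remove_complex`-style trait gives the matching real type for quantities such as norms. The trait below is a simplified stand-in written for this example (Ginkgo ships its own `gko::remove_complex`), and `compute_norm` is a hypothetical helper.

```cpp
#include <cmath>
#include <complex>
#include <type_traits>
#include <vector>

// Stand-in for a remove_complex trait (assumption, written for this sketch):
// maps std::complex<T> to T and leaves real types unchanged.
template <typename T>
struct remove_complex_impl {
    using type = T;
};

template <typename T>
struct remove_complex_impl<std::complex<T>> {
    using type = T;
};

template <typename T>
using remove_complex = typename remove_complex_impl<T>::type;

// Changing this single alias switches the precision of everything below.
using etype = std::complex<float>;

static_assert(std::is_same<remove_complex<etype>, float>::value,
              "the real type of complex<float> is float");

// Hypothetical helper: the 2-norm is real even when the entries are complex,
// so the return type is remove_complex<ValueType> rather than ValueType.
template <typename ValueType>
remove_complex<ValueType> compute_norm(const std::vector<ValueType>& v)
{
    remove_complex<ValueType> sum{};
    for (const auto& x : v) {
        sum += std::norm(x);  // |x|^2 for both real and complex entries
    }
    return std::sqrt(sum);
}
```

With `etype = std::complex<float>`, calling `compute_norm` on the vector `{(3, 0), (0, 4)}` returns the `float` value 5; switching the alias to `double` changes the return type to `double` without touching the helper.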

tcojean

I think LGTM though I did not review in depth. I have a few comments for now.

benchmark/utils/general.hpp

benchmark/utils/cuda_linops.hpp

codecov · 2020-12-07T22:53:50Z

Codecov Report

Merging #675 (b1fa08d) into develop (07f6478) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #675      +/-   ##
===========================================
- Coverage    93.01%   93.01%   -0.01%     
===========================================
  Files          337      337              
  Lines        24876    24877       +1     
===========================================
  Hits         23139    23139              
- Misses        1737     1738       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| include/ginkgo/core/base/matrix_data.hpp | `97.85% <0.00%> (-1.42%)` | ⬇️ |
| omp/reorder/rcm_kernels.cpp | `98.13% <0.00%> (+0.60%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7284619...b1fa08d.

Slaedr

LGTM! A few minor questions and comments.

benchmark/run_all_benchmarks.sh

benchmark/utils/general.hpp

benchmark/solver/solver.cpp

benchmark/run_all_benchmarks.sh

sonarcloud · 2020-12-15T18:37:14Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
14 Code Smells

0.0% Coverage
9.8% Duplication

thoasm · 2021-01-14T16:43:54Z

format!

thoasm · 2021-01-15T04:51:49Z

format!

upsj

LGTM! Did you test the execution with all precision types?

tcojean

LGTM.

benchmark/CMakeLists.txt

thoasm · 2021-01-21T13:48:19Z

@upsj Yes, I ran a very small benchmark with all preconditioners and solvers and did not encounter any runtime issues.
However, I realized that the run_all_benchmarks.sh script did not call the correct solver binary (I am sure it did when I performed all the tests, but the change got lost at some point). I just pushed the corrected benchmark script and tested all precisions again without any issues.

Every benchmark is built with multiple precisions (currently double and single precision), each generating its own binary. The "run_all_benchmarks.sh" script additionally got a variable that decides which precision to use (that is, which binary to execute).
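
One way to get a separate binary per precision from a single source file is to select `etype` with a preprocessor definition that the build system sets differently for each target. The sketch below illustrates that approach under stated assumptions: the macro names (`BENCH_USE_SINGLE` and so on), the `precision_name` helper, and the returned labels are all hypothetical, and the definitions and binary names actually used by the PR may differ.

```cpp
#include <complex>
#include <string>
#include <type_traits>

// Hypothetical macros: the build would compile this file once per precision,
// passing for example -DBENCH_USE_SINGLE for the single-precision binary.
#if defined(BENCH_USE_SINGLE)
using etype = float;
#elif defined(BENCH_USE_SINGLE_COMPLEX)
using etype = std::complex<float>;
#elif defined(BENCH_USE_DOUBLE_COMPLEX)
using etype = std::complex<double>;
#else
// Default case: fall back to double precision when nothing is specified.
using etype = double;
#endif

// Hypothetical helper returning a label for the compiled precision, e.g. for
// benchmark output or for a script that chooses which binary to execute.
inline std::string precision_name()
{
    if (std::is_same<etype, float>::value) {
        return "single";
    }
    if (std::is_same<etype, double>::value) {
        return "double";
    }
    if (std::is_same<etype, std::complex<float>>::value) {
        return "single_complex";
    }
    return "double_complex";
}
```

Compiled without any of the macros, `etype` is `double` and `precision_name()` returns `"double"`. Keeping the default in the `#else` branch means a plain build still produces a working double-precision benchmark.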

sonarcloud · 2021-01-22T16:09:50Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
3 Code Smells

0.0% Coverage
12.4% Duplication

Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem which enables Intel-GPU and CPU execution. The only Ginkgo features which have not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:

1. The new Accessor concept, which allows writing kernels featuring on-the-fly memory compression, among other features. The accessor can be used as header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default, this is done through a temporary copy which may have a performance impact but already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost. The accessor is also leveraged in a new CB-GMRES solver which allows for performance improvements by compressing the Krylov basis vectors. Many other features have been added to Ginkgo, such as reordering support, a new IDR solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels, also make use of a shared kernel implementation for all executors (except Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things. [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++. [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor). [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extends ISAI from triangular to general and spd matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors. [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations. [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new "block column-major" accessor has been added. [#707](#707)
+ Add an heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks. [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management. [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)

Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO, an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation. [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)

Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup. [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup. [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC. [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)

Related PR: #857

Release 1.4.0 to master

Related PR: #866

thoasm added the reg:benchmarking and 1:ST:ready-for-review labels Dec 7, 2020

thoasm requested review from upsj, pratikvn, Slaedr, yhmtsai, tcojean and fritzgoebel December 7, 2020 13:29

thoasm self-assigned this Dec 7, 2020

ginkgo-bot added the type:preconditioner and type:solver labels Dec 7, 2020

tcojean reviewed Dec 7, 2020

benchmark/utils/general.hpp (outdated)

benchmark/utils/cuda_linops.hpp

thoasm added the 1:ST:WIP label and removed the 1:ST:ready-for-review label Dec 9, 2020

thoasm force-pushed the change_benchmark_precision branch 3 times, most recently from 396788c to 64a009b, December 11, 2020 10:15

Slaedr approved these changes Dec 14, 2020

benchmark/run_all_benchmarks.sh (outdated)

benchmark/utils/general.hpp (outdated)

benchmark/solver/solver.cpp (outdated)

benchmark/run_all_benchmarks.sh

thoasm force-pushed the change_benchmark_precision branch from 64a009b to 61cecab, December 15, 2020 11:22

thoasm force-pushed the change_benchmark_precision branch from 61cecab to fe8f438, January 14, 2021 16:42

thoasm added the 1:ST:ready-for-review label and removed the 1:ST:WIP label Jan 14, 2021

thoasm force-pushed the change_benchmark_precision branch from 69eca3c to 0bba35d, January 14, 2021 18:20

upsj approved these changes Jan 20, 2021

upsj added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Jan 20, 2021

tcojean approved these changes Jan 20, 2021

benchmark/CMakeLists.txt (outdated)

Thomas Grützmacher and others added 10 commits January 22, 2021 11:19

Enable benchmark to run with different precisions

0dd4ae2

Note: benchmark is used with `float` in this commit to test it on CI

Change benchmark precision to double again

a313639

Make benchmark compatible with complex numbers

611e823

Add support for complex types in benchmark script

8677e7e

Benchmark binaries are now always generated for single, double, single complex and double complex.

Review update

0da2745

Update message for supported precisions. Additionally, improve coding for BENCHMARK_PRECISION default case

Co-authored-by: Aditya Kashi <[email protected]>

Benchmark: print res_goal in scientific notation

78df2ee

Format files

24b5334

Co-authored-by: Thomas Grützmacher <[email protected]>

Review update

9944b78

Co-authored-by: tcojean <[email protected]>

Update benchmark script to run correct precision

b1fa08d

thoasm force-pushed the change_benchmark_precision branch from 7db51f3 to b1fa08d, January 22, 2021 10:20

thoasm merged commit c5f0e48 into develop Jan 23, 2021

thoasm deleted the change_benchmark_precision branch January 23, 2021 00:05
