
Support Mixed precision SpMV and BLAS operations #677

Merged · 5 commits · Mar 31, 2021
Conversation

@upsj upsj commented Dec 17, 2020

This PR adds full mixed precision support to Ginkgo, at least in terms of LinOp compatibility.
By default, this happens through the new make_temporary_conversion helper, which works similarly to make_temporary_clone. It is wrapped in the precision_dispatch function, which applies the correct precision and complex-to-real conversions for SpMV and preconditioner applications. For solvers, a more explicit conversion is necessary.

The temporary conversion wrapper won't give ideal performance, since each apply operation requires additional conversions from and to the input/output vectors, as well as the associated allocations of temporary memory. For solvers, this might still pay off due to the long runtime of a single apply. Mixed precision IR is then almost equivalent to the following:

auto solver =
    gko::solver::Ir<double>::build()
        .with_solver(
            gko::solver::Gmres<float>::build()
                .with_criteria(
                    gko::stop::ResidualNormReduction<float>::build()
                        .with_reduction_factor(inner_reduction_factor)
                        .on(exec),
                    gko::stop::Iteration::build()
                        .with_max_iters(max_inner_iters)
                        .on(exec))
                .on(exec))
        .with_criteria(
            gko::stop::ResidualNormReduction<double>::build()
                .with_reduction_factor(outer_reduction_factor)
                .on(exec),
            gko::stop::Iteration::build()
                .with_max_iters(max_outer_iters)
                .on(exec))
        .on(exec)
        ->generate(give(A));

except that A is always stored, and operated on, in double precision.

TODO:

  • Comprehensive reference tests
  • No need to convert x to ValueType for LinOps where apply_uses_initial_guess() == false (dropped: this would be way too complex and probably overkill, since this path is not meant to provide good performance)
  • Modify MPIR example

@upsj upsj added is:experimental This is an experimental feature/PR/issue/module. mod:core This is related to the core module. labels Dec 17, 2020
@upsj upsj self-assigned this Dec 17, 2020
@ginkgo-bot ginkgo-bot added mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:openmp This is related to the OpenMP module. mod:reference This is related to the reference module. reg:benchmarking This is related to benchmarking. type:matrix-format This is related to the Matrix formats type:preconditioner This is related to the preconditioners type:solver This is related to the solvers labels Dec 17, 2020
@upsj upsj added 1:ST:ready-for-review This PR is ready for review and removed is:experimental This is an experimental feature/PR/issue/module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:openmp This is related to the OpenMP module. mod:reference This is related to the reference module. labels Mar 8, 2021
@upsj upsj mentioned this pull request Mar 8, 2021
1 task
@tcojean tcojean left a comment

There are a few mistakes I could find, seemingly due to copy/paste. Of course, tests are still missing. In addition, I think you should change the mixed precision IR example to reflect the new structure, or add an extra case in the same example.

Another question: can anything be done about ISAI? I think changing the TRS might be enough for ILU and IC, but ISAI might need some more changes, particularly since it stores the approximate inverse as a plain CSR matrix.

core/base/combination.cpp (outdated)
core/base/perturbation.cpp (outdated)
core/base/temporary_conversion.hpp (outdated, 3 threads)
Comment on lines 79 to 86
    } else {
-       GKO_NOT_IMPLEMENTED;
+       precision_dispatch_spmv<ValueType>(
+           [&](auto dense_b, auto dense_x) {
+               exec->run(
+                   diagonal::make_apply_to_dense(this, dense_b, dense_x));
+           },
+           b, x);
    }
Member

In terms of functionality, that should work, since all formats can convert to Dense. But do we want that to happen? What about other formats? We could keep a GKO_NOT_IMPLEMENTED as well.

Member

I guess the new form corresponds more to what we do in other cases, like for CSR, where we assume all other operands to be dense.

@yhmtsai yhmtsai (Member) Mar 9, 2021

I prefer staying with GKO_NOT_IMPLEMENTED. Converting to dense may need a lot of storage.

@upsj upsj (Member Author) Mar 9, 2021

True, I hadn't even considered the sparse -> dense conversions here. So I will have to test against all Dense types instead of ConvertibleTo.

Collaborator

I agree, diagonal to dense might not be something one would want to do.

Member Author

I actually misremembered my implementation: I never try to cast to ConvertibleTo in make_temporary_conversion, only to Dense directly, so this is not an issue.

Member

What I mean here is that when someone passes a Csr type or any other form of LinOp instead of a Dense, we would previously get a GKO_NOT_IMPLEMENTED exception thrown here. Now it is not clear from the code of make_temporary_conversion what would happen, or in what way it would fail.

Exceptions are part of the interface AFAIK?

Member Author

Before my last commit, it would have thrown NotSupported in make_temporary_conversion. With the new changes in place, it throws NotSupported in conversion_helper::convert. Now that I think of it, the previous version might actually be better, together with a make_temporary_conversion/precision_dispatch_nothrow variant that just returns nullptr or something like that in case of an error.

core/preconditioner/jacobi.cpp (outdated)
core/solver/bicg.cpp (outdated)
core/base/precision_dispatch.hpp (outdated)
include/ginkgo/core/solver/idr.hpp
@codecov

codecov bot commented Mar 8, 2021

Codecov Report

Merging #677 (534a02a) into develop (3ab51db) will increase coverage by 0.21%.
The diff coverage is 96.88%.


@@             Coverage Diff             @@
##           develop     #677      +/-   ##
===========================================
+ Coverage    92.56%   92.78%   +0.21%     
===========================================
  Files          389      392       +3     
  Lines        29220    30408    +1188     
===========================================
+ Hits         27047    28213    +1166     
- Misses        2173     2195      +22     
Impacted Files Coverage Δ
include/ginkgo/core/solver/bicg.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/bicgstab.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/cb_gmres.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/cg.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/cgs.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/fcg.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/gmres.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/idr.hpp 100.00% <ø> (ø)
include/ginkgo/core/solver/ir.hpp 100.00% <ø> (ø)
omp/solver/cb_gmres_kernels.cpp 78.48% <ø> (ø)
... and 64 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ab51db...534a02a. Read the comment docs.

@upsj upsj (Member Author) commented Mar 20, 2021

@tcojean I am a bit unsure about the MPIR example, since modifying it to use our mixed precision support would not be 100% equivalent: MPIR uses the matrix stored in float for the inner solves, but in double for the outer solves. So either we use the matrix in float, which gives us only float-level precision in the overall solve, or we store it in double, which uses "too precise" SpMVs in the inner solver.

@tcojean tcojean (Member) commented Mar 24, 2021

For the example: if it doesn't have exactly the same effect as the old one, I understand it makes sense to keep the old one. The next question then is whether we need a new example to advertise these new features and help people play with them in a simple fashion. I also like Mike's idea of an apply(Dense, Dense), as that would isolate the dispatching into one place and put the relevant (lengthy) implementation in another. Otherwise, LGTM (minus leftover compilation issues).

@thoasm thoasm mentioned this pull request Mar 26, 2021
4 tasks
@upsj upsj force-pushed the mixed_precision_spmv branch 2 times, most recently from b601292 to 13f9ce0 Compare March 28, 2021 22:05
@pratikvn pratikvn (Member) commented Mar 29, 2021

In terms of performance, can we make sure that the base case of no mixed precision still has the same performance as before (in current develop)? All the applies are now wrapped by precision_dispatch and lambdas, which is a major change. Can you maybe quickly run some benchmarks for some small matrices for all the applies? I know that it shouldn't affect the performance, but I think it is better to be sure.

@upsj upsj (Member Author) commented Mar 29, 2021

@pratikvn Good point, I ran a small benchmark (ani4.mtx with CG on reference) to get some performance numbers:

Before: 0.03631952699999999
After:  0.03605373300000001

So I don't see any overhead at all over 100 repetitions of the solve benchmarks.
Looking at the overall code, this is not too surprising to me: the only substantial changes happen in the code path that would previously fail (wrong Dense value type); everything else is almost equivalent to what we did previously with gko::as.

@tcojean tcojean left a comment

LGTM in general. A minor comment on variable naming.

One more important issue: I think I would really prefer if we used @yhmtsai's idea of apply(Dense*, Dense*), so that all of the dispatch code is isolated from the lengthy implementation. That would, I think, make the code much clearer and also hopefully help reduce the amount of code changes.

reference/test/matrix/coo_kernels.cpp (outdated, 2 threads)
upsj and others added 3 commits March 30, 2021 13:38
* add missing conjugation to CBGMRES
* work around complex accessor issues

Co-authored-by: Thomas Grützmacher <[email protected]>
@tcojean tcojean left a comment

LGTM.

core/solver/cb_gmres.cpp (outdated)
@fritzgoebel fritzgoebel (Collaborator) left a comment

LGTM!

@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Mar 30, 2021
@Slaedr Slaedr (Contributor) left a comment

Great work! That is quite elegant. After seeing the code for the solver apply etc., I realize how good this is.

I think I found some bugs in a couple of tests. I also have a few minor clarifications.

I wonder if it makes sense to write a test to ensure that const arguments are not copied back, for performance reasons. Maybe what you could do is, in one test, write a dummy operator templated on value_type with an apply(const LinOp *x, LinOp *y). Inside the apply, const_cast the x and change it. Outside, ensure that x was not modified. It might be nice to have such a test in case the conversion_helper etc. need to be modified in the future.

Also, it would be nice to add the new tests that you added to one of the objects, maybe the CSR matrix kernels, to the other backends as well. Just the CSR tests would be enough, I think; they would add almost nothing to the testing time, and we could be sure all this works on the other backends too.

include/ginkgo/core/base/temporary_conversion.hpp (outdated)
reference/test/base/combination.cpp (outdated, 2 threads)
reference/test/matrix/sellp_kernels.cpp (outdated, 2 threads)
reference/test/matrix/sparsity_csr_kernels.cpp (outdated, 2 threads)
@upsj upsj (Member Author) commented Mar 30, 2021

@Slaedr Thanks, those are all really good suggestions; I incorporated all of them.

@Slaedr Slaedr (Contributor) left a comment

LGTM! Just one set of small nits about the tolerance.

cuda/test/matrix/dense_kernels.cpp (outdated)
hip/test/matrix/dense_kernels.hip.cpp (outdated)
omp/test/matrix/dense_kernels.cpp (outdated)
* fix missing documentation
* fix mixed precision tests
* add tests for device temporary conversion
* add tests for make_temporary_conversion behavior

Co-authored-by: Aditya Kashi <[email protected]>
@upsj upsj (Member Author) commented Mar 31, 2021

@Slaedr Good catch! This was actually an accident: our implementations seem to be bitwise equivalent, so we could even use zero tolerance. We're not comparing against a "ground truth", but against two equally inexact implementations.

sonarcloud bot commented Mar 31, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 30 Code Smells

Coverage: 78.8%
Duplication: 0.1%

@upsj upsj merged commit 40de7dc into develop Mar 31, 2021
@upsj upsj deleted the mixed_precision_spmv branch March 31, 2021 17:52
@upsj upsj mentioned this pull request Apr 20, 2021
tcojean added a commit that referenced this pull request Aug 20, 2021
Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extend ISAI from triangular to general and SPD matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non-intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO,
an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)


Related PR: #857
tcojean added a commit that referenced this pull request Aug 23, 2021
Release 1.4.0 to master

Related PR: #866
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. reg:benchmarking This is related to benchmarking. type:matrix-format This is related to the Matrix formats type:preconditioner This is related to the preconditioners type:solver This is related to the solvers
8 participants