Improve Dense stride handling #774

Merged
merged 17 commits into develop from dense_strides
Jun 3, 2021

Conversation

upsj
Member

@upsj upsj commented May 25, 2021

This PR attempts to make the handling of Dense strides/padding in Ginkgo more consistent:

  • Make plain copy_from/convert_to keep original output stride
  • Make conversion copy_from/convert_to keep original output stride
  • Remove padding in solver create_with_config_of (this will be handled in "Add common interface for simple kernels", #733)
  • Add out-parameter versions with correct stride handling of
    • *permute
    • *transpose
  • Add tests for non-default strides + cross-executor output parameters (see the usage sketch below)
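
A minimal usage sketch of the new out-parameter transpose listed above (illustrative only, not taken from the PR's tests), assuming the existing Dense::create(exec, size, stride) overload and gko::initialize; the point is that the preallocated output keeps its own stride and no intermediate matrix is allocated:

#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    // 2x3 input with default stride
    auto mtx = gko::initialize<gko::matrix::Dense<double>>(
        {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}}, exec);
    // 3x2 output preallocated with a non-default stride of 4,
    // which the out-parameter version leaves untouched
    auto trans = gko::matrix::Dense<double>::create(exec, gko::dim<2>{3, 2}, 4);
    // writes the transpose into the existing storage of `trans`
    mtx->transpose(trans.get());
}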

@upsj upsj added the 1:ST:WIP This PR is a work in progress. Not ready for review. label May 25, 2021
@upsj upsj added this to the Ginkgo 1.4.0 milestone May 25, 2021
@upsj upsj self-assigned this May 25, 2021
@upsj upsj linked an issue May 25, 2021 that may be closed by this pull request
@ginkgo-bot ginkgo-bot added mod:all This touches all Ginkgo modules. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats labels May 25, 2021
@upsj upsj added this to In Progress in Ginkgo development May 25, 2021
@upsj upsj added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels May 27, 2021
@upsj upsj moved this from In Progress to Awaiting Review in Ginkgo development May 28, 2021
@codecov

codecov bot commented May 29, 2021

Codecov Report

Merging #774 (4677856) into develop (da19a97) will decrease coverage by 1.12%.
The diff coverage is 98.87%.


@@             Coverage Diff             @@
##           develop     #774      +/-   ##
===========================================
- Coverage    94.17%   93.04%   -1.13%     
===========================================
  Files          400      400              
  Lines        31080    31603     +523     
===========================================
+ Hits         29270    29406     +136     
- Misses        1810     2197     +387     
Impacted Files Coverage Δ
core/device_hooks/common_kernels.inc.cpp 0.00% <0.00%> (ø)
core/test/matrix/identity.cpp 100.00% <ø> (ø)
include/ginkgo/core/base/types.hpp 92.59% <ø> (ø)
include/ginkgo/core/matrix/dense.hpp 95.12% <62.50%> (-1.49%) ⬇️
include/ginkgo/core/base/array.hpp 94.06% <75.00%> (+4.50%) ⬆️
reference/test/matrix/dense_kernels.cpp 99.79% <99.49%> (-0.21%) ⬇️
core/matrix/dense.cpp 99.51% <100.00%> (+0.08%) ⬆️
core/test/base/array.cpp 100.00% <100.00%> (ø)
core/test/base/utils.cpp 95.71% <100.00%> (+0.25%) ⬆️
include/ginkgo/core/base/temporary_clone.hpp 100.00% <100.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da19a97...4677856.

@@ -40,8 +40,8 @@ For Ginkgo core library:
* C++14 compliant compiler, one of:
* _gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+_
* _clang 3.9+_
* _Intel compiler 2017+_
* _Apple LLVM 8.0+_ (__TODO__: verify)
Collaborator

Only partially relevant to this commit, but when updating versions one could mention clang-format < 11.

Member Author

Good point, we only mention a lower version for clang-format. Maybe we can manage to fix that with a slightly modified format, though.

Member

@pratikvn pratikvn left a comment

LGTM! Only some minor things. One thing is the new void functions operating on Dense objects. I think we discussed some time back that the interfaces returning LinOp are not satisfactory and that we should update them to return the concrete type instead. But since that is an interface-breaking change, we moved it to 2.0.

The void functions are a solution to that, but they add a significant amount of additional code, so maybe we can remove them for now and fix the interfaces for 2.0?

core/matrix/dense.cpp
cuda/test/matrix/dense_kernels.cpp
include/ginkgo/core/base/types.hpp
* @param output The output matrix. It must have the dimensions
* `gko::transpose(this->get_size())`
*/
void transpose(Dense *output) const;
Member

Do these functions have to be public? If we are only using them internally, maybe they can be protected?

reference/test/matrix/dense_kernels.cpp
Contributor

@Slaedr Slaedr left a comment

I have not seen most of the code yet, but a basic question first: why do we need these cross-executor output-parameter versions of transpose etc.? Do we often need to transpose from one executor to another? If not, it's just one or two more lines to do it wherever needed.

cuda/test/matrix/dense_kernels.cpp
@upsj
Member Author

upsj commented Jun 1, 2021

@Slaedr @pratikvn I guess I can answer your questions/comments in the same stroke: We currently have no way to call any of these functions (transpose, permute, ...) to write data into an output vector without any allocations, which I would like to change. Especially for repeated operations, the alternative dense->permute(...)->convert_to(...) is much less handy and incurs additional overhead. The whole cross-executor execution is only an additional make_temporary_clone, which I hope doesn't hurt, considering that it prevents any kind of segfault due to mismatching executors and eliminates the need for additional temporaries and convert_to when you want to move data between CPU and GPU in the same operation.
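
To make the two patterns concrete, here is a hedged sketch (not code from the PR): mtx and out are Dense matrices that may live on different executors, perm is a gko::Array<gko::int32> of row indices, and the out-parameter overload is assumed to mirror the transpose(Dense*) signature quoted earlier in this review:

#include <ginkgo/ginkgo.hpp>

void permute_into(const gko::matrix::Dense<double>* mtx,
                  const gko::Array<gko::int32>* perm,
                  gko::matrix::Dense<double>* out)
{
    // old pattern: row_permute allocates a fresh result, convert_to then
    // copies it into the preallocated output
    gko::as<gko::matrix::Dense<double>>(mtx->row_permute(perm).get())
        ->convert_to(out);
    // new pattern (this PR): write directly into `out`; an internal
    // temporary clone handles any executor mismatch
    mtx->row_permute(perm, out);
}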

Member

@yhmtsai yhmtsai left a comment

LGTM.

core/matrix/dense.cpp
upsj added 2 commits June 2, 2021 00:45
use a temporary clone of the array
instead of working around recursion in convert_to
This provides an alternative to make_temporary_clone
that doesn't initialize the content for output-parameters
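
A minimal sketch of the intent described in these commit messages (illustrative only, not the PR's actual kernel launcher): the input is cloned to the target executor together with its data, while the output clone only allocates there and copies the result back when it goes out of scope.

#include <ginkgo/ginkgo.hpp>

void run_transpose(std::shared_ptr<const gko::Executor> exec,
                   const gko::matrix::Dense<double>* input,
                   gko::matrix::Dense<double>* output)
{
    // copies `input` to `exec` if it lives on a different executor
    auto local_in = gko::make_temporary_clone(exec, input);
    // only allocates on `exec`; the previous contents of `output` are not
    // copied in, but the result is copied back to `output` on destruction
    auto local_out = gko::make_temporary_output_clone(exec, output);
    // ... run the actual kernel on local_in.get() and local_out.get() ...
}
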
@upsj
Member Author

upsj commented Jun 1, 2021

@Slaedr @pratikvn @yhmtsai The last two commits might warrant another pass :)

@pratikvn
Member

pratikvn commented Jun 2, 2021

I think the idea of make_temporary_output_clone is nice.

I guess one of the things that we prefer is robustness over performance. I think I remember discussing sometime before that we should automatically copy data between non-matching executors, rather than requiring the user to be careful about creating their objects on the correct executors. I guess this is something where we differ from other libraries such as Kokkos (I think).

While I definitely agree with this philosophy, I think we should find some way to let the user know that cross-executor copies are occurring. I guess currently one way is through the logger, by logging the data transfers, which is definitely very detailed, but also slightly more involved. Would it be possible to add some non-program-terminating asserts or warning logs of some kind to ease this?

@upsj
Member Author

upsj commented Jun 2, 2021

What would you think about adding a new logger event which specifically logs temporary clones? Then we could at a later point provide a performance_hint_logger that spits out these kinds of messages?

@pratikvn
Member

pratikvn commented Jun 2, 2021

What would you think about adding a new logger event which specifically logs temporary clones? Then we could at a later point provide a performance_hint_logger that spits out these kinds of messages?

Yes, that is a good idea. I was thinking of something like that as well. But I guess it might be better to do it in a separate PR.

core/matrix/dense.cpp
core/test/base/array.cpp
include/ginkgo/core/base/array.hpp
reference/test/matrix/dense_kernels.cpp
@yhmtsai
Member

yhmtsai commented Jun 2, 2021

If I understand it correctly, it creates the storage on the given executor but does not copy the data in, right?
How about using make_temporary_storage(_clone) instead of make_temporary_output_clone?

@upsj
Member Author

upsj commented Jun 2, 2021

@yhmtsai That is a good suggestion! Honestly, I still like output_clone slightly more, since to me temporary storage is something that is discarded afterwards, while an output clone somewhat implies that the data will be used (= copied back) afterwards.

This fixes test failures in core tests with Reference disabled
@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test labels Jun 2, 2021
Member

@thoasm thoasm left a comment

LGTM!
Really nice detailed tests you added, good job!

core/matrix/dense.cpp
include/ginkgo/core/base/types.hpp
Co-authored-by: Thomas Grützmacher <[email protected]>
@sonarcloud

sonarcloud bot commented Jun 3, 2021

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
2 Code Smells (rating A)

95.0% Coverage
1.7% Duplication

@upsj upsj merged commit b621017 into develop Jun 3, 2021
@upsj upsj deleted the dense_strides branch June 3, 2021 14:52
tcojean added a commit that referenced this pull request Aug 20, 2021
Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels, which avoid this cost, are also implemented.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extend ISAI from triangular to general and SPD matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non-intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO,
an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)


Related PR: #857
tcojean added a commit that referenced this pull request Aug 23, 2021
Release 1.4.0 to master

Related PR: #866
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:all This touches all Ginkgo modules. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats
Projects
Ginkgo development
Awaiting Merge
Development

Successfully merging this pull request may close these issues.

Consistent treatment of strides in Dense
7 participants