Add Compressed Basis GMRES (CB-GMRES) #693
Commits on Feb 19, 2021
First commit for the GmresMixed class:
* Only the core and reference executors are included.
* The test files don't compile yet, due to a problem related to the gtest macros (TEST_F -> TYPED_TEST).
* MGS, MGS with reorthogonalization, and CGS with reorthogonalization are considered.
* Norms are still created in the internal routines.
Commit cd1ffc9
Inclusion of the omp executor in the repository:
* Now, the norms are properly created in the main class.
* The test files are not fixed yet.
Commit d7ba04c
Inclusion of the cuda executor in the repository:
* For CGS, a loop of kernels is used instead of a kernel with a loop.
* The test files are not fixed yet.
Commit 969358f
The test files are finally included, but:
* The messages still have to be removed.
* For CGS, a loop of kernels is used instead of a kernel with a loop.
Commit 2b7122f
The first unoptimized version is done. Next tasks:
* For CGS, a loop of kernels is used instead of a kernel with a loop.
* Consider other base types for ValueTypeKrylovBases.
Commit 6dc6470
The default value for ValueTypeKrylovBasis has been changed to avoid some errors that were detected during testing in the repository. The previous value was float, whereas the original results were computed with default_precision, which caused the errors. Now, the default value is also default_precision.
Commit 4d4fab1
Definition of the CG2 variant of the finish_arnoldi routine for the omp and cuda executors, as a first step in the optimization process. The timing calls are also included.
Commit e0d39a6
Thomas Grützmacher committed Feb 19, 2021 (commit 27019da)
Definition of the CGS2 version of the finish_arnoldi method for the omp and cuda executors. For omp, the aim is to move the OpenMP parallelization to the outer loop. For cuda, the loop of kernels is changed to a kernel with a loop.
* The main routines (loop of dots and loop of axpys) are still too expensive.
Commit f2b5a3a
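Commit f2b5a3a describes the CGS2 variant of finish_arnoldi in terms of a "loop of dots" followed by a "loop of axpys". The following serial C++ sketch illustrates one Arnoldi orthogonalization step with classical Gram-Schmidt applied twice. It is not the omp or cuda kernel from this PR. `cgs2_step`, `dot`, and `Vec` are hypothetical names. As in commit 23efe7a, the final 2-norm is reused as the subdiagonal Hessenberg entry.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hypothetical helper (not Ginkgo's finish_arnoldi): orthogonalize w against
// all vectors in `basis` with classical Gram-Schmidt applied twice (CGS2).
// Returns the Hessenberg column h(0..k); w is normalized in place.
Vec cgs2_step(const std::vector<Vec>& basis, Vec& w) {
    const std::size_t k = basis.size();
    Vec h(k + 1, 0.0);
    for (int pass = 0; pass < 2; ++pass) {
        // "Loop of dots": every coefficient uses the same, not yet updated w.
        Vec coeff(k);
        for (std::size_t j = 0; j < k; ++j) {
            coeff[j] = dot(basis[j], w);
        }
        // "Loop of axpys": subtract all projections afterwards.
        for (std::size_t j = 0; j < k; ++j) {
            for (std::size_t i = 0; i < w.size(); ++i) {
                w[i] -= coeff[j] * basis[j][i];
            }
            h[j] += coeff[j];  // both passes contribute to the Hessenberg entry
        }
    }
    // The final 2-norm becomes the subdiagonal Hessenberg entry.
    h[k] = std::sqrt(dot(w, w));
    assert(h[k] > 0.0 && "breakdown: w lies in the span of the basis");
    for (double& x : w) {
        x /= h[k];
    }
    return h;
}
```

Computing all dot products before any update lets a GPU implementation fuse them into one multi-dot kernel. The second pass restores orthogonality that a single classical pass can lose in floating point.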
A good implementation of multidot_kernels_num_iters_1 is finally done. Timing instructions are also included. They are managed through some #define macros. The next step will be to improve the update kernels.
Commit bfd4196
Add Accessor support and extend reference test
Added an accessor header file (the name might have to change in the future) and used it in all mixed precision kernels (for now only for the reduced precision accesses). Also adds some minor fixes:
- removed unused code in the example in the hope that it compiles on Windows
- added HIP stubs to allow HIP compilation
Thomas Grützmacher committed Feb 19, 2021 (commit 4ad491e)
Made GmresMixed compile with complex types
Thomas Grützmacher committed Feb 19, 2021 (commit fbcc4f2)
The update routines have been improved. The computation time is now close to 75 s for 6221 iterations. Next steps should be:
* Add the computation of the inf-norm for the next_krylov_basis.
* Merge the update and norm computations.
Commit 911fd5c
Add specialization for integer types for Accessor
The specialization is currently set to only work with float storage type to test the pipeline, but it can easily be modified to work with all integer types. The Accessor was also moved from a shared header to a gmres_mixed exclusive header.
Thomas Grützmacher committed Feb 19, 2021 (commit db081ac)
Make the scale work with integer types
Also made some naming changes.
Thomas Grützmacher committed Feb 19, 2021 (commit 7db4796)
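Commits db081ac and 7db4796 store Krylov vectors in integer types together with a scale. The sketch below shows the basic round trip. It assumes a single scale per value, with the value stored as round(v / scale) and read back as stored * scale. The PR's actual Accessor classes may organize this differently, and `compress` and `decompress` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <type_traits>

// Hypothetical helpers (not the PR's Accessor API): a value v is stored as the
// integer round(v / scale) and read back as stored * scale.
template <typename Storage>
Storage compress(double value, double scale) {
    static_assert(std::is_integral<Storage>::value && sizeof(Storage) <= 4,
                  "sketch assumes integer storage of at most 32 bits");
    assert(scale > 0.0);
    return static_cast<Storage>(std::llround(value / scale));
}

template <typename Storage>
double decompress(Storage stored, double scale) {
    return static_cast<double>(stored) * scale;
}
```

If the scale is chosen as max|v| divided by the largest representable integer, every entry fits into the storage type. The rounding error per entry is then at most about scale / 2.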
Add helper to determine if we need a scale or not
Thomas Grützmacher committed Feb 19, 2021 (commit 9bbcf87)
Add a helper structure to manage the scale writing in common
Thomas Grützmacher committed Feb 19, 2021 (commit 97689ab)
Commit c0aa2ed
Definition of norm2 and norminf routines in CUDA. Only the first one is used, because the second one requires the norms to be handled correctly as non-complex values. The next step will be to use remove_complex for the norms, and then to use norminf.
Commit da8a56e
remove_complex has been added to the norm variables, and multinorm2_inf_kernel has been created. Also, the last 2-norm computation is used as the Hessenberg component, avoiding the use of update_hessenberg_2_kernel. Next steps:
* Solve the problems found when the set_scale method is used.
* Find out why the reference tests don't work properly.
Commit 23efe7a
Fixed cuda step2 to take a view into account
Previously, the full krylov_bases were always used, regardless of whether a view was used or not. Now, the dimensions of krylov_bases are derived from hessenberg to take a potential view into account.
Thomas Grützmacher committed Feb 19, 2021 (commit f3cdbd7)
Change const accessor to non-const in check_arnoldi_norms_new
This allows it to compile with the writing of the scale.
Thomas Grützmacher committed Feb 19, 2021 (commit d4865f2)
The set_scale method finally works!!
I have also tested the <double, long int> case and it also works.
Commit 42a779b
Change storage layout of krylov_bases
Previously, Krylov bases b1, b2, b3 were stored as [b1[0], b2[0], b3[0], b1[1], b2[1], b3[1], ...]. Now, they are stored in transposed fashion: [b1; b2; b3], which is equivalent to [b1[0], b1[1], ..., b2[0], b2[1], ..., b3[0], b3[1], ...]. Note that if multiple right-hand sides are present, the vectors are still stored in row-major order. To make that happen, the accessor is now 3D. Its indices (x, y, z) mean:
- x: which Krylov vector to address
- y: which row element to use
- z: which column element, i.e. which right-hand side to use
The accessor is stored row-major.
Thomas Grützmacher committed Feb 19, 2021 (commit 6b25d2e)
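The index arithmetic implied by the new krylov_bases layout in commit 6b25d2e can be sketched as follows. The sketch assumes dense row-major storage without padding, whereas the real accessor may use explicit strides. `KrylovLayout` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the row-major 3D indexing described above:
// (x, y, z) = (Krylov vector, row, right-hand side). Each Krylov vector is a
// contiguous num_rows x num_rhs block, itself stored row-major.
struct KrylovLayout {
    std::size_t num_rows;
    std::size_t num_rhs;

    std::size_t index(std::size_t krylov, std::size_t row, std::size_t rhs) const {
        assert(row < num_rows && rhs < num_rhs);
        return (krylov * num_rows + row) * num_rhs + rhs;
    }
};
```

With a single right-hand side, this reduces to krylov * num_rows + row, which is exactly the [b1; b2; b3] ordering. Consecutive rows of one Krylov vector are therefore adjacent in memory, which is what commit e3ca7cc relies on to restore coalesced accesses.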
Make memory access to krylov_bases coalesced again
This is only done for finish_arnoldi_CGS2 by changing the kernels to adapt to the new storage format of krylov_bases.
Thomas Grützmacher committed Feb 19, 2021 (commit e3ca7cc)
Transpose grid when launching singledot kernel
Transposing is supposed to avoid write conflicts during the atomic_add at the end. The kernel was changed accordingly, so the behavior has not changed. Also contains:
- Scaling is now only added for integer types (not for float anymore)
- Add benchmarks for GmresMixed integer
- Some cleanup of debug output in CUDA
Thomas Grützmacher committed Feb 19, 2021 (commit 911fccb)
Reversed the transpose of the grid dim
The transposing needs further investigation, but a single test on a V100 led to worse results than before, so I am undoing this change for now.
Thomas Grützmacher committed Feb 19, 2021 (commit 5dd8bf2)
Add half precision support to GmresMixed
Thomas Grützmacher committed Feb 19, 2021 (commit b214a5f)
Hopefully improve singledot performance
Thomas Grützmacher committed Feb 19, 2021 (commit f1f178e)
Infinity norm only computed when scale is present
Additionally, removed the debug output and an unused kernel parameter.
Thomas Grützmacher committed Feb 19, 2021 (commit fb64f5c)
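Commits 9bbcf87 and fb64f5c add a helper that decides whether a scale is needed, and they compute the infinity norm only in that case. A minimal C++17 sketch of that dispatch follows. It assumes that integer storage is the only case needing a scale (consistent with commit 911fccb) and that the scale is the infinity norm divided by the largest representable value. `needs_scale` and `krylov_scale` are hypothetical names, not the PR's helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical trait: only integer storage types need a scale.
template <typename Storage>
constexpr bool needs_scale = std::is_integral<Storage>::value;

// Returns the scale for one Krylov vector. The infinity norm is only
// computed when the storage type needs a scale; otherwise 1.0 is returned.
template <typename Storage>
double krylov_scale(const std::vector<double>& v) {
    if constexpr (needs_scale<Storage>) {
        double inf_norm = 0.0;
        for (double x : v) {
            inf_norm = std::max(inf_norm, std::abs(x));
        }
        assert(inf_norm > 0.0);
        return inf_norm / static_cast<double>(std::numeric_limits<Storage>::max());
    } else {
        static_cast<void>(v);  // floating-point storage: no norm, no scale
        return 1.0;
    }
}
```

Because the branch is resolved at compile time, floating-point storage types pay no cost for the norm computation.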
Add another RHS generation in the benchmark
- The benchmark script now has the option to change the initial guess
- Add option to generate the RHS as b = A * (s / |s|) with s(i) = sin(i)
Thomas Grützmacher committed Feb 19, 2021 (commit 155c552)
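The RHS option from commit 155c552 can be sketched with a minimal CSR matrix-vector product. This sketch assumes that |s| denotes the Euclidean norm and that A is square. `Csr` and `sin_rhs` are hypothetical names and do not correspond to Ginkgo's matrix classes or the benchmark code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal square CSR matrix (illustrative, not Ginkgo's Csr class).
struct Csr {
    std::size_t num_rows;
    std::vector<std::size_t> row_ptrs;
    std::vector<std::size_t> col_idxs;
    std::vector<double> values;
};

// b = A * (s / ||s||_2) with s(i) = sin(i).
std::vector<double> sin_rhs(const Csr& A) {
    std::vector<double> s(A.num_rows);
    double norm = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = std::sin(static_cast<double>(i));
        norm += s[i] * s[i];
    }
    norm = std::sqrt(norm);
    assert(norm > 0.0);  // needs at least two rows, since sin(0) == 0
    std::vector<double> b(A.num_rows, 0.0);
    for (std::size_t row = 0; row < A.num_rows; ++row) {
        for (std::size_t k = A.row_ptrs[row]; k < A.row_ptrs[row + 1]; ++k) {
            b[row] += A.values[k] * s[A.col_idxs[k]] / norm;
        }
    }
    return b;
}
```

Normalizing s fixes the exact solution to a unit-norm vector, so results are comparable across matrix sizes.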
Fix residual_norm calculation in GmresMixed
Thomas Grützmacher committed Feb 19, 2021 (commit d660c47)
Make sure GmresMixed does not exit early
If the stopping criterion is met, perform a reset of GMRES and check the residual again. If it still satisfies the criterion, exit; otherwise, keep calculating. Other changes:
- Renamed krylov_dim_mixed to krylov_dim in GmresMixed (to be consistent with Gmres)
Thomas Grützmacher committed Feb 19, 2021 (commit 8ede6d6)
Add benchmark parameter for GMRES krylov_dim
Adds the parameter to the benchmark script (and its documentation).
Thomas Grützmacher committed Feb 19, 2021 (commit af6fb48)
Add forced iterations when convergence is detected
Thomas Grützmacher committed Feb 19, 2021 (commit 49f1763)
Add debug output to forced iterations
Thomas Grützmacher committed Feb 19, 2021 (commit 188ff18)
Fix reference bug in GmresMixed
Thomas Grützmacher committed Feb 19, 2021 (commit 297004b)
DEBUG: Add write output for integral accessor
Thomas Grützmacher committed Feb 19, 2021 (commit b7765c3)
DEBUG: Move towards `at` with accessor
Thomas Grützmacher committed Feb 19, 2021 (commit f31e87d)
Reference executor now compiles, with the downside that an Accessor can no longer be created from const pointers.
Thomas Grützmacher committed Feb 19, 2021 (commit 6f82f47)
Adapt OpenMP support to the new Accessor
It compiles and works, but it does not yet use `at` or similar.
Thomas Grützmacher committed Feb 19, 2021 (commit c385a2b)
Remove unused GMRES_mixed code from Ref & OMP
Thomas Grützmacher committed Feb 19, 2021 (commit 9a0a0f0)
Adapt CUDA to the new accessor format (NOT `at`)
Thomas Grützmacher committed Feb 19, 2021 (commit 3d44a21)
Make HIP and CUDA work with new accessor (NOT at)
Thomas Grützmacher committed Feb 19, 2021 (commit e1654b4)
Thomas Grützmacher committed Feb 19, 2021 (commit 5d09f2a)
CUDA implementation is now using `at`
Thomas Grützmacher committed Feb 19, 2021 (commit 48e0899)
Also add instantiation macro for ConstAccessors
Thomas Grützmacher committed Feb 19, 2021 (commit b542646)
Fix accessor by adding additional __restrict__
Thomas Grützmacher committed Feb 19, 2021 (commit b0a2ac3)
GmresMixed storage prec is now a factory parameter
Currently, only core is adapted, and the reference test has been started (not all precision combinations are tested properly).
Thomas Grützmacher committed Feb 19, 2021 (commit 5d1f173)
Improve reference test and include the enum there
Thomas Grützmacher committed Feb 19, 2021 (commit 267bbf1)
Fix the reference test to pass
CUDA and OpenMP now also compile with the new accessor layout. Benchmarks are still TODO.
Thomas Grützmacher committed Feb 19, 2021 (commit 6b92a4f)
Thomas Grützmacher committed Feb 19, 2021 (commit 1d4a773)
Update the helper to throw when complex
Also add instantiation for single precision floating point
Thomas Grützmacher committed Feb 19, 2021 (commit 30381e2)
Make GmresMixed work properly with multiple RHS
Also adjust test precision to be more accurate.
Thomas Grützmacher committed Feb 19, 2021 (commit 6d30e36)
Fix benchmark to work with new GmresMixed layout
Make GmresMixed reference test work on CI.
Thomas Grützmacher committed Feb 19, 2021 (commit 7f86d9c)
Use new reduced_row_major Accessor in GmresMixed
Also use scaled_reduced_row_major. Both are newly added Accessors from the private headers.
Thomas Grützmacher committed Feb 19, 2021 (commit b2b9ebc)
Remove unnecessary code from CUDA GmresMixed
Thomas Grützmacher committed Feb 19, 2021 (commit db3046f)
Thomas Grützmacher committed Feb 19, 2021 (commit da87d06)
Thomas Grützmacher committed Feb 19, 2021 (commit 65a8a8b)
Thomas Grützmacher committed Feb 19, 2021 (commit a9646e8)
Thomas Grützmacher committed Feb 19, 2021 (commit c6020db)
Co-authored-by: Thomas Grützmacher
Commit 36a1a19
Add DPCPP stubs to allow compilation
Thomas Grützmacher committed Feb 19, 2021 (commit c4a7335)
Make cb-gmres benchmarks dependent on etype
TODO: Rename cb-gmres benchmark strings
Thomas Grützmacher committed Feb 19, 2021 (commit fa1fc39)
Fix implementation and reference test for CB-GMRES
Having forced iterations for small matrices can lead to NaN values during CB-GMRES. To prevent that, forced iterations now only happen after the 10th total iteration. Before that, a vector is immediately declared converged as soon as the stopping criterion says so. Also done:
- Removed useless debug output from CB-GMRES
- Changed the tolerances for the reference test so that it always passes
Thomas Grützmacher committed Feb 19, 2021 (commit c509263)
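Commits 8ede6d6 and c509263 together describe when a right-hand side whose stopping criterion fires is actually accepted as converged. The decision logic can be summarized in a small sketch. It does not model how the reset itself recomputes the residual, and `Step`, `next_step`, and `forced_iteration_start` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical sketch (names are illustrative): what the solver does after the
// stopping criterion has been evaluated for one right-hand side.
enum class Step { iterate, restart_and_recheck, converged };

// Forced iterations only start after this many total iterations.
constexpr int forced_iteration_start = 10;

Step next_step(bool criterion_met, bool after_forced_restart, int total_iters) {
    if (!criterion_met) {
        return Step::iterate;  // keep calculating
    }
    if (after_forced_restart || total_iters <= forced_iteration_start) {
        // Confirmed after a restart, or still within the first iterations,
        // where the criterion is trusted directly.
        return Step::converged;
    }
    // Reset GMRES, recompute the residual, and check the criterion again.
    return Step::restart_and_recheck;
}
```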
Update tolerance for one reference CB-GMRES test
This is necessary for the Intel compiler to pass the test `SolvesStencilSystem2`.
Thomas Grützmacher committed Feb 19, 2021 (commit 6bb09da)
- Remove unnecessary code
- Add a wrapper function to `atomic_helper` to make it easier to implement other atomic operations (atomic_add and atomic_max use this wrapper)
Thomas Grützmacher committed Feb 19, 2021 (commit 3655021)
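Commit 3655021 routes atomic_add and atomic_max through a shared wrapper in `atomic_helper`. On CUDA, such wrappers are commonly built as a compare-and-swap loop around `atomicCAS`. The host-side C++ sketch below shows the same pattern with `std::atomic::compare_exchange_weak`. It illustrates the technique and is not the PR's device code. `atomic_wrapper`, `atomic_add`, and `atomic_max` here are hypothetical host functions.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>

// Generic CAS loop: atomically replace the value with op(old, val) and
// return the previous value (like CUDA's atomicAdd / atomicMax).
template <typename T, typename Op>
T atomic_wrapper(std::atomic<T>& target, T val, Op op) {
    T old = target.load();
    while (!target.compare_exchange_weak(old, op(old, val))) {
        // On failure, `old` now holds the current value; retry with it.
    }
    return old;
}

template <typename T>
T atomic_add(std::atomic<T>& target, T val) {
    return atomic_wrapper(target, val, [](T a, T b) { return a + b; });
}

template <typename T>
T atomic_max(std::atomic<T>& target, T val) {
    return atomic_wrapper(target, val, [](T a, T b) { return std::max(a, b); });
}
```

With the loop in one place, a new atomic operation only needs a different binary operation.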
Remove unnecessary kernels in CB-GMRES and name them properly
Co-authored-by: Terry Cojean
Co-authored-by: Yuhsiang M. Tsai
Commit 5270478
- Add arnoldi_norm documentation and add it to the tests (it was not part of them previously, and needs a fix)
- Add documentation to benchmarking
- Rename namespace and helper for range helper
Co-authored-by: Terry Cojean
Co-authored-by: Yuhsiang M. Tsai
Commit 2f0a32e
Add Helper INSTANTIATE macro for CB-GMRES
Co-authored-by: Terry Cojean
Commit 58e1104
Remove CB-GMRES and GMRES example
Both examples are removed because their functionality is so similar to simple-solver that they do not add much value.
Thomas Grützmacher committed Feb 19, 2021 (commit c47a4ca)
- Add more documentation to CB-GMRES
- Add CB-GMRES to test-install
Co-authored-by: Terry Cojean
Co-authored-by: Yuhsiang M. Tsai
Commit 2b587ba
Remove unnecessary includes of iostream and time.h
Thomas Grützmacher committed Feb 19, 2021 (commit da8a6b4)
Remove circular dependency of compute_norm2 in (CB)-GMRES
Thomas Grützmacher committed Feb 19, 2021 (commit c4c1270)
Update solver generation in benchmark
Thomas Grützmacher committed Feb 19, 2021 (commit 00d33b5)
Update eta and arnoldi_norms in CB-GMRES
Thomas Grützmacher committed Feb 19, 2021 (commit b163893)
Remove CUDA 9.0 exception for constexpr parameter
`--expt-relaxed-constexpr` is now used for every CUDA version
Thomas Grützmacher committed Feb 19, 2021 (commit 342957a)
Co-authored-by: Pratik Nayak
Commit 370d208
The following modifications were made to CB-GMRES:
- Remove unused kernel parameters `num_reorth_steps` and `num_reorth_vectors`
- Remove unused `b_norm`
- Make unused kernel parameters unnamed
- Add some explicit casts to prevent warnings
Thomas Grützmacher committed Feb 19, 2021 (commit 1b2071d)
Review update; Improve run_all_benchmarks.sh
- Update documentation of GMRES to mention the usage of MGS
- Use reduced precision in CB-GMRES by default
Thomas Grützmacher committed Feb 19, 2021 (commit b4a6fc9)
Put storage_precision enum into cb_gmres namespace
Thomas Grützmacher committed Feb 19, 2021 (commit 599e261)
Thomas Grützmacher committed Feb 19, 2021 (commit 5126858)
Remove unnecessary included files for CB-GMRES
Thomas Grützmacher committed Feb 19, 2021 (commit 169040d)
Co-authored-by: Pratik Nayak
Commit 05374fd
- Extract GPU kernels for CB-GMRES and GMRES into a new file to avoid duplication.
- Adopt the updated GMRES functionality for these kernels for CPU and GPU
Co-authored-by: Pratik Nayak
Commit 389d038
Commit 6d6bbab
Commit 4d722f1