Tags: andoorve/cutlass
Update CMakeLists.txt (NVIDIA#473)
  * Update CMakeLists.txt: add 128-bit int support when using nvc++ to solve NVIDIA#310. @jeffhammond, would you please give it a try?
  * Update CMakeLists.txt: correct copy-paste error
Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (NVIDIA#375)
  GPUs under test:
  * NVIDIA A100
  * NVIDIA A2
  * NVIDIA TitanV
  * NVIDIA GeForce 2080 Ti
CUTLASS 2.7 (NVIDIA#318)
  * Mainloop fusion for GEMM: summation over A or B
  * Strided DGRAD (optimized iterators)
  * Half-precision GELU_taylor activation functions
    * Use these when accumulation and epilogue compute types are all cutlass::half_t
  * Tuning and bug fixes to fused GEMM + GEMM example
  * Support for smaller than 128b aligned Convolutions: see examples
  * Caching of results to accelerate Convolution unit tests
    * Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
  * Corrections and bug fixes reported by the CUTLASS community
    * Thank you for filing these issues!
  authored-by: Haicheng Wu, Manish Gupta, Dustyn Blasig, Andrew Kerr
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
  * cutlass 2.6 update
  * remove debug prints
  * cutlass 2.6.1 (minor update)
  * Updated CHANGELOG.
  * Minor edit to readme to indicate patch version.
  * Minor edit to readme.
  Co-authored-by: Haicheng Wu, Andrew Kerr
Merge pull request NVIDIA#308 from dongxiao92/patch-1: fix typo in doc
Merge pull request NVIDIA#135 from NVIDIA/cutlass_2.3_final: CUTLASS 2.3.0
Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (NVIDIA#100)
  - Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
  - Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
  - Added test_examples target to build and test all CUTLASS examples
  - Minor edits to documentation to point to GTC 2020 webinar