Tags: andoorve/cutlass

v2.9.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Update CMakeLists.txt (NVIDIA#473)

* Update CMakeLists.txt

Add 128-bit int support when using nvc++ to resolve NVIDIA#310

@jeffhammond, would you please give it a try?

* Update CMakeLists.txt

correct copy paste error

v2.8.0

Verified

Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (NVIDIA#375)

Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.

GPUs under test:

    NVIDIA A100
    NVIDIA A2
    NVIDIA TitanV
    NVIDIA GeForce 2080 Ti

v2.7.0

Verified

CUTLASS 2.7 (NVIDIA#318)

CUTLASS 2.7

* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
  * Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!
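
The half-precision GELU_taylor item above can be illustrated with a small numerical sketch. It assumes that "GELU_taylor" refers to the widely used tanh-based approximation of GELU; the release note does not spell out the formula, so treat that as an assumption. The function names and the `to_half` helper are hypothetical and are not CUTLASS APIs. Half precision is only simulated here by rounding through the IEEE 754 binary16 format.

```python
import math
import struct

# Constants of the common tanh-based GELU approximation (assumed, see lead-in).
SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
CUBIC_COEFF = 0.044715


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def gelu_exact(x: float) -> float:
    """Reference GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh_approx(x: float) -> float:
    """Tanh-based approximation of GELU, evaluated in double precision."""
    inner = SQRT_2_OVER_PI * x * (1.0 + CUBIC_COEFF * x * x)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_tanh_approx_half(x: float) -> float:
    """Same approximation with input and output rounded to half precision."""
    return to_half(gelu_tanh_approx(to_half(x)))


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, gelu_exact(x), gelu_tanh_approx(x), gelu_tanh_approx_half(x))
```

Comparing the three columns shows how closely the approximation tracks the erf-based definition and how much extra error the half-precision rounding adds on top of it.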

authored-by: Haicheng Wu [email protected], Manish Gupta [email protected], Dustyn Blasig [email protected], Andrew Kerr [email protected]
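
As a rough illustration of the "mainloop fusion for GEMM: summation over A or B" item in this release, the sketch below computes a matrix product and, in the same loop over K, accumulates the row sums of A. It assumes that "summation over A" means reducing the A operand along the K dimension while the GEMM mainloop is already reading it; the function name `gemm_with_a_row_sum` is hypothetical, and the pure-Python loops only mirror the idea, not the CUTLASS kernel structure.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def gemm_with_a_row_sum(a: Matrix, b: Matrix) -> Tuple[Matrix, List[float]]:
    """Return (A @ B, row sums of A), both produced by one pass over K."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")

    c = [[0.0] * n for _ in range(m)]
    row_sum = [0.0] * m

    for kk in range(k):  # the GEMM "mainloop" over the K dimension
        for i in range(m):
            a_ik = a[i][kk]
            # Fused reduction: reuse the A element already loaded for the GEMM.
            row_sum[i] += a_ik
            for j in range(n):
                c[i][j] += a_ik * b[kk][j]
    return c, row_sum


if __name__ == "__main__":
    product, sums = gemm_with_a_row_sum([[1.0, 2.0], [3.0, 4.0]],
                                        [[5.0, 6.0], [7.0, 8.0]])
    print(product, sums)
```

The point of fusing the reduction is that each element of A is read once and used for both results, instead of running a separate reduction pass over A.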

v2.6.1

Verified

CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning

* cutlass 2.6 update

* remove debug prints

* cutlass 2.6.1 (minor update)

* Updated CHANGELOG.

* Minor edit to readme to indicate patch version.

* Minor edit to readme.

Co-authored-by:  Haicheng Wu <[email protected]>, Andrew Kerr <[email protected]>

v2.6.0

Verified

Merge pull request NVIDIA#308 from dongxiao92/patch-1

fix typo in doc

v2.5.0

Verified

Create PUBLICATIONS.md (NVIDIA#189)

v2.4.0

cutlass 2.4 documentation only update

v2.3.0

Verified

Merge pull request NVIDIA#135 from NVIDIA/cutlass_2.3_final

CUTLASS 2.3.0

v2.2.0

Verified

Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (NVIDIA#100)

- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar

v2.1.0

Verified

update tools/library/CMakeLists to require python 3.6 according to NVIDIA#70 (NVIDIA#82)

NVIDIA#70 only updates the documentation. This commit applies the same Python version bump to the CMake configuration as well.