10 stars written in Cuda

Tile primitives for speedy kernels

Cuda 1,465 55 Updated Aug 31, 2024

Reference implementation of real-time autoregressive WaveNet inference

Cuda 735 126 Updated Jan 19, 2021

Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch

Cuda 695 155 Updated Jul 19, 2023

Fast CUDA matrix multiplication from scratch

Cuda 398 52 Updated Dec 28, 2023

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda 270 53 Updated Aug 12, 2024

OneSweep, implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda 115 5 Updated Aug 30, 2024

CUDA implementation of exclusive prefix sum via Blelloch's algorithm

Cuda 23 11 Updated Jul 19, 2017

CUDA implementation of depthwise conv3d

Cuda 21 4 Updated Jul 14, 2021

Implementation of various Parallel Computing algorithms using CUDA C++

Cuda 1 Updated May 29, 2021