KeAWang

Alex Wang KeAWang

CS PhD student at Stanford University; Cornell University BA, MS

52 followers · 2 following

Stanford, CA
https://keawang.github.io

Stars

10 stars written in Cuda

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 1,465 stars · 55 forks · Updated Aug 31, 2024

NVIDIA / nv-wavenet

Reference implementation of real-time autoregressive wavenet inference

Cuda · 735 stars · 126 forks · Updated Jan 19, 2021

CoffeeBeforeArch / cuda_programming

Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch

Cuda · 695 stars · 155 forks · Updated Jul 19, 2023

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 398 stars · 52 forks · Updated Dec 28, 2023

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 270 stars · 53 forks · Updated Aug 12, 2024

b0nes164 / GPUSorting

OneSweep, implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Cuda · 115 stars · 5 forks · Updated Aug 30, 2024

mark-poscablo / gpu-prefix-sum

CUDA implementation of exclusive prefix sum via Blelloch's algorithm

Cuda · 23 stars · 11 forks · Updated Jul 19, 2017

gungui98 / Pytorch-Depthwise-Conv3d

CUDA implementation of depthwise conv3d

Cuda · 21 stars · 4 forks · Updated Jul 14, 2021

PabloEnfedaque / CUDA_DWT_RegisterBased

Cuda · 5 stars · Updated May 29, 2017

DeltaCube23 / Parallel-Computing

Implementation of various Parallel Computing algorithms using CUDA C++

Cuda · 1 star · Updated May 29, 2021
