Starred repositories
NVlabs / cub
Forked from NVIDIA/cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
C++ Implementation of PyTorch Tutorials for Everyone
This is a code repository for PyTorch C++ (or LibTorch) tutorials.
C++ library based on TensorRT integration
A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.
A list of awesome compiler projects and papers for tensor computation and deep learning.
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
Achieve peak performance on x86 CPUs and NVIDIA GPUs
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
A high-throughput and memory-efficient inference and serving engine for LLMs
Implementation of popular deep learning networks with TensorRT network definition API
Demonstration of various hardware effects on CUDA GPUs.
Demonstration of various hardware effects.
Thin, unified, C++-flavored wrappers for the CUDA APIs
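AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks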
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
A Deep-Learning-Based Chinese Speech Recognition System
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Source code examples from the Parallel Forall Blog