Stars
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
This repository contains the experimental PyTorch native float8 training UX
Reference implementations of MLPerf™ training benchmarks
A Python framework for high performance GPU simulation and graphics
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
A GUI tool for generating subtitles from video and audio and exporting them as srt files. No third-party API application is required; audio-to-text conversion runs locally. A Transformer-based video subtitle generation framework.
The official GitHub page for the survey paper "A Survey of Large Language Models".
Transformer related optimization, including BERT, GPT
A high-throughput and memory-efficient inference and serving engine for LLMs
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
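AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks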
Stable Diffusion web UI
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Hackable and optimized Transformers building blocks, supporting a composable construction.
Development repository for the Triton language and compiler
Fast and memory-efficient exact attention
A High Performance Metadata System for Kubernetes
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
A library for efficient similarity search and clustering of dense vectors.