
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 460 34 Updated Jul 10, 2024

This repository contains the experimental PyTorch native float8 training UX

Python 195 18 Updated Jul 17, 2024

Reference implementations of MLPerf™ training benchmarks

Python 1,575 548 Updated Jul 17, 2024

A Python framework for high performance GPU simulation and graphics

Python 3,858 213 Updated Jul 17, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,505 816 Updated Jul 17, 2024

Vim configuration

Vim Script 4,889 1,809 Updated Nov 16, 2023

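Generates subtitles from video and audio and outputs srt files. No third-party API is required; audio-to-text conversion runs locally. A Transformer-based video subtitle generation framework. A GUI tool for generating subtitles from videos and generating srt files.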

Python 725 151 Updated Feb 1, 2024

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 9,623 743 Updated May 19, 2024

CUDA Library Samples

Cuda 1,423 296 Updated Jul 16, 2024

Transformer related optimization, including BERT, GPT

C++ 5,662 877 Updated Mar 27, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,083 3,276 Updated Jul 17, 2024

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

Python 426 127 Updated Jul 17, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 10,954 2,285 Updated Jul 17, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 5,808 1,700 Updated Jul 17, 2024

Python 23 11 Updated Apr 15, 2023

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,655 262 Updated Jul 17, 2024

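AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks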

Jupyter Notebook 9,614 1,384 Updated Jul 17, 2024

Stable Diffusion web UI

Python 136,187 25,960 Updated Jul 16, 2024

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

C++ 6,496 2,112 Updated Jul 17, 2024

oneAPI Math Kernel Library (oneMKL) Interfaces

C++ 593 150 Updated Jul 17, 2024

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it

C++ 373 70 Updated Jun 25, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 80,491 21,617 Updated Jul 17, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,046 568 Updated Jul 14, 2024

Development repository for the Triton language and compiler

C++ 12,007 1,427 Updated Jul 17, 2024

Fast and memory-efficient exact attention

Python 12,407 1,103 Updated Jul 15, 2024

A High Performance Metadata System for Kubernetes

Go 750 79 Updated May 13, 2024

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

C++ 2,294 187 Updated Feb 7, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 4,925 846 Updated Jul 17, 2024

A library for efficient similarity search and clustering of dense vectors.

C++ 29,545 3,491 Updated Jul 17, 2024