Stars
Fast Hadamard transform in CUDA, with a PyTorch interface
A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL
FlagGems is an operator library for large language models implemented in Triton Language.
FP8 flash attention implemented with the CUTLASS library on the Ada architecture
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
SGLang is a fast serving framework for large language models and vision language models.
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
MambaOut: Do We Really Need Mamba for Vision?
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Generate high-definition short videos with one click using AI large language models
Code examples and resources for DBRX, a large language model developed by Databricks
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
Fast Inference of MoE Models with CPU-GPU Orchestration
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Implementation of popular deep learning networks with TensorRT network definition API
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, s…
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models