wwbitejotunn
  • UESTC
  • Chengdu, Sichuan, China


Fast Hadamard transform in CUDA, with a PyTorch interface

C 97 14 Updated May 24, 2024

A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL

Python 14 4 Updated Sep 16, 2024

FlagGems is an operator library for large language models implemented in Triton Language.

Python 285 26 Updated Oct 10, 2024

FP8 flash attention implemented on the Ada architecture using the cutlass repository

Cuda 48 3 Updated Aug 12, 2024

Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python 263 21 Updated Sep 25, 2024

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 839 43 Updated Oct 6, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 413 19 Updated Sep 5, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 5,526 412 Updated Oct 9, 2024

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step

Python 403 40 Updated Sep 10, 2024

MambaOut: Do We Really Need Mamba for Vision?

Python 1,982 34 Updated Jun 6, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 366 29 Updated Oct 9, 2024

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Python 94 5 Updated May 15, 2024

SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Python 153 5 Updated Jul 5, 2024

Awesome list for LLM pruning.

141 8 Updated Oct 10, 2024

Generate high-definition short videos with one click using large AI models.

Python 16,416 2,606 Updated Jul 26, 2024

Code examples and resources for DBRX, a large language model developed by Databricks

Python 2,498 236 Updated May 1, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,313 1,009 Updated Oct 8, 2024
Python 15 1 Updated Feb 21, 2024

Fast Inference of MoE Models with CPU-GPU Orchestration

Python 167 16 Updated Sep 28, 2024

[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.

Python 1,601 105 Updated Aug 20, 2024

Library for specialized dense and sparse matrix operations, and deep learning primitives.

C 844 181 Updated Oct 9, 2024

Implementation of popular deep learning networks with TensorRT network definition API

C++ 6,926 1,765 Updated Oct 9, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,496 197 Updated Oct 10, 2024

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 814 128 Updated Jul 29, 2023

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Python 9,536 685 Updated Jul 25, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,914 409 Updated Sep 6, 2024

A family of open-sourced Mixture-of-Experts (MoE) Large Language Models

Python 1,371 70 Updated Mar 8, 2024

Visual Studio Code

TypeScript 163,317 28,902 Updated Oct 10, 2024

CUDA Kernel Benchmarking Library

Cuda 492 63 Updated Jun 5, 2024