jeejeelee
🎯 Focusing

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

930 stars · 19 forks · Updated Jul 10, 2024

FlagGems is an operator library for large language models, implemented in the Triton language.

Python · 184 stars · 10 forks · Updated Jul 30, 2024

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda · 244 stars · 59 forks · Updated Nov 7, 2023

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python · 243 stars · 24 forks · Updated Jul 30, 2024

4-bit quantization of LLaMA using GPTQ

Python · 2,955 stars · 455 forks · Updated Jul 13, 2024

SGLang is yet another fast serving framework for large language models and vision language models.

Python · 3,552 stars · 219 forks · Updated Jul 30, 2024

This project is the official implementation of "Basic Binary Convolution Unit for Binarized Image Restoration Network" (ICLR 2023).

Python · 117 stars · 3 forks · Updated Oct 13, 2023

🎉 CUDA/C++ notes / hand-writing CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda · 932 stars · 88 forks · Updated Jul 29, 2024

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Python · 290 stars · 31 forks · Updated Apr 17, 2024

A curated list for Efficient Large Language Models

Python · 1,003 stars · 74 forks · Updated Jul 29, 2024

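Chinese LLM capability evaluation leaderboard. It currently covers 106 large models across multiple dimensions of capability. These include commercial models such as chatgpt, gpt4o, Baidu ERNIE Bot (Wenxin Yiyan), Alibaba Tongyi Qianwen, iFlytek Spark, SenseTime senseChat, and minimax, and open-source models such as Baichuan, qwen2, glm4, yi, Shusheng internLM2, and llama3. Besides the capability score leaderboard, it also publishes the raw outputs of every model!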

1,997 stars · 98 forks · Updated Jul 27, 2024

Large Language Model (LLM) Systems Paper List

516 stars · 23 forks · Updated Jul 25, 2024

Transparent Image Layer Diffusion using Latent Transparency

1,935 stars · 23 forks · Updated Jun 16, 2024

Awesome LLM compression research papers and tools.

922 stars · 54 forks · Updated Jul 30, 2024

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,094 stars · 143 forks · Updated Jul 29, 2024

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda · 229 stars · 15 forks · Updated Jul 2, 2024

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python · 6,971 stars · 582 forks · Updated Jul 12, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda · 895 stars · 83 forks · Updated Jul 30, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python · 10,624 stars · 768 forks · Updated Jul 18, 2024

GLake: optimizing GPU memory management and IO transmission.

Python · 327 stars · 31 forks · Updated Jul 28, 2024

Official code for DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

Python · 3,174 stars · 268 forks · Updated Jul 3, 2024

67 stars · 11 forks · Updated Jun 26, 2023

A collection of benchmarks to measure basic GPU capabilities.

Jupyter Notebook · 182 stars · 29 forks · Updated Jun 21, 2024

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python · 1,071 stars · 60 forks · Updated Jul 16, 2024

Serving inside PyTorch with multiple threads

C++ · 137 stars · 12 forks · Updated Jul 29, 2024

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python · 5,829 stars · 614 forks · Updated Jul 24, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 7,729 stars · 841 forks · Updated Jul 30, 2024

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python · 331 stars · 29 forks · Updated Jul 26, 2024