Stars
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FlagGems is an operator library for large language models implemented in Triton Language.
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
4 bits quantization of LLaMA using GPTQ
SGLang is yet another fast serving framework for large language models and vision language models.
This project is the official implementation of "Basic Binary Convolution Unit for Binarized Image Restoration Network" (ICLR 2023)
🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
A curated list for Efficient Large Language Models
Chinese LLM capability leaderboard: currently covers 106 large models, including commercial models such as ChatGPT, GPT-4o, Baidu ERNIE Bot (Wenxin Yiyan), Alibaba Tongyi Qianwen, iFLYTEK Spark, SenseTime SenseChat, and MiniMax, as well as open-source models such as Baichuan, Qwen2, GLM-4, Yi, InternLM2, and Llama 3, evaluated across multiple capability dimensions. Provides not only capability-score rankings but also the raw outputs of every model!
Large Language Model (LLM) Systems Paper List
Transparent Image Layer Diffusion using Latent Transparency
Awesome LLM compression research papers and tools.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
FlashInfer: Kernel Library for LLM Serving
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
GLake: optimizing GPU memory management and IO transmission.
Official codes of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
Collection of benchmarks to measure basic GPU capabilities.
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.