Stars
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
TinyChatEngine: On-Device LLM Inference Library
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
[ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou
[ICLR 2022] The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training by Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Z…
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Implementation of the ICML 2023 paper "Specializing Smaller Language Models towards Multi-Step Reasoning".
OTOv1-v3 (NeurIPS, ICLR, TMLR): DNN training and compression via structured pruning and operator erasing, for CNNs, diffusion models, and LLMs.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
[EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches.
Code for "Lion: Adversarial Distillation of Proprietary Large Language Models (EMNLP 2023)"
Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
⚡ Build a chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Accessible large language models via k-bit quantization for PyTorch.
[TMLR 2024] Efficient Large Language Models: A Survey
Awesome LLM compression research papers and tools.