Stars
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Official implementation of Half-Quadratic Quantization (HQQ)
Run Mixtral-8x7B models in Colab or consumer desktops
A framework for few-shot evaluation of language models.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Official PyTorch implementation of the ICLR 2024 paper "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs".
Running large language models on a single GPU for throughput-oriented scenarios.
TensorFlow code and pre-trained models for BERT
An annotated implementation of the Transformer paper.
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama, supporting local-knowledge-based LLM question answering.
Official inference library for Mistral models
DNN quantization with outlier channel splitting
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
4-bit quantization of LLaMA using GPTQ.
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
A curated list for Efficient Large Language Models
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Deep learning for image processing, including classification, object detection, and more.
The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models".
Accessible large language models via k-bit quantization for PyTorch.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
This repository contains integer operators on GPUs for PyTorch.