Meteor168

📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥

694 28 Updated Aug 3, 2024

Mybatis general-purpose pagination plugin

Java 12,148 3,129 Updated Aug 7, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 940 43 Updated Jan 16, 2024

Official implementation of Half-Quadratic Quantization (HQQ)

Python 600 59 Updated Jul 30, 2024

Run Mixtral-8x7B models in Colab or consumer desktops

Python 2,277 224 Updated Apr 8, 2024

A framework for few-shot evaluation of language models.

Python 6,098 1,615 Updated Aug 9, 2024

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 4,213 441 Updated Aug 9, 2024

[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

Python 59 6 Updated May 24, 2024

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 10,026 638 Updated May 2, 2024

Jupyter Notebook 45 9 Updated Jul 28, 2024

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 673 86 Updated May 30, 2024

A simple and effective LLM pruning approach.

Python 589 69 Updated Aug 9, 2024

Official PyTorch implementation of our ICLR 2024 paper "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs"

Python 28 2 Updated Apr 9, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,100 533 Updated Jul 24, 2024

TensorFlow code and pre-trained models for BERT

Python 37,633 9,547 Updated Jul 23, 2024

An annotated implementation of the Transformer paper.

Jupyter Notebook 5,460 1,184 Updated Apr 7, 2024

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 30,515 5,346 Updated Aug 9, 2024

Official inference library for Mistral models

Jupyter Notebook 9,404 824 Updated Aug 8, 2024

Python 254 31 Updated Apr 2, 2024

DNN quantization with outlier channel splitting

Python 109 18 Updated Mar 21, 2020

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Python 1,925 238 Updated Jul 31, 2024

4-bit quantization of LLaMA using GPTQ

Python 2,964 457 Updated Jul 13, 2024

[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs

Python 2,505 312 Updated Aug 9, 2024

A curated list for Efficient Large Language Models

Python 1,029 74 Updated Aug 9, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,829 146 Updated Mar 27, 2024

Deep learning for image processing, including classification, object detection, etc.

Python 22,060 7,866 Updated Jul 25, 2024

The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models"

Python 45 4 Updated Oct 5, 2022

Accessible large language models via k-bit quantization for PyTorch.

Python 5,868 594 Updated Aug 7, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,227 164 Updated Jul 16, 2024

This repository contains integer operators on GPUs for PyTorch.

Python 163 48 Updated Sep 29, 2023