Python 28 Updated Feb 19, 2024
8 Updated Apr 7, 2024
Python 1 Updated Jul 5, 2024
Python 6 Updated Jun 4, 2024
Python 37 Updated May 13, 2024

Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Python 35 4 Updated Jun 26, 2024

[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

Python 59 5 Updated May 24, 2024

Awesome LLM compression research papers and tools.

922 54 Updated Jul 30, 2024

A collection of AWESOME things about mixture-of-experts

869 65 Updated Jul 20, 2024

10x faster matrix and vector operations

C++ 2,465 171 Updated Oct 12, 2022

PyTorch-UVM on super-large language models.

Python 13 4 Updated Dec 21, 2020

Library for faster pinned CPU <-> GPU transfer in PyTorch

Python 680 39 Updated Feb 21, 2020

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python 77 5 Updated Jun 9, 2024
Python 100 4 Updated Jul 22, 2024

Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Python 1,497 178 Updated Jan 15, 2024

Fast Inference of MoE Models with CPU-GPU Orchestration

Python 156 16 Updated May 22, 2024

Run Mixtral-8x7B models in Colab or consumer desktops

Python 2,278 223 Updated Apr 8, 2024

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 28,079 3,445 Updated Jul 30, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,061 134 Updated Jun 25, 2024

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

Python 55 4 Updated Feb 13, 2024

Instant voice cloning by MyShell.

Python 27,674 2,694 Updated Jul 23, 2024

The code of SGSLN

Python 67 5 Updated Mar 22, 2024
Python 109 11 Updated Jan 22, 2024

Evaluation Code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023 TMLR Submission)

Python 10 2 Updated Dec 5, 2023

OpenAI-style API for open large language models, using LLMs just as ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3 etc.…

Python 2,232 260 Updated Jul 24, 2024

Official PyTorch implementation of QA-LoRA

Python 105 11 Updated Mar 13, 2024

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,563 262 Updated Jun 2, 2024

LongQLoRA: Extend Context Length of LLMs Efficiently

Python 152 12 Updated Nov 12, 2023