Stars
Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Awesome LLM compression research papers and tools.
A collection of AWESOME things about mixture-of-experts
PyTorch-UVM on super-large language models.
Library for faster pinned CPU <-> GPU transfer in PyTorch
PyTorch library for cost-effective, fast and easy serving of MoE models.
Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Fast Inference of MoE Models with CPU-GPU Orchestration
Run Mixtral-8x7B models in Colab or consumer desktops
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
Evaluation Code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023 TMLR Submission)
OpenAI-style API for open large language models, letting you use LLMs just like ChatGPT. Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc.…
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
LongQLoRA: Extend Context Length of LLMs Efficiently