Stars
Latency and Memory Analysis of Transformer Models for Training and Inference
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
A course to get into Large Language Models (LLMs), with roadmaps and Colab notebooks.
AIGC-interview/CV-interview/LLMs-interview: a collection of interview questions and answers, along with new ideas, problems, resources, and projects from work and research.
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
A comprehensive paper list of Vision Transformer/Attention work, including papers, code, and related websites
This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit"
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
🎉 CUDA & C++ notes / hand-written CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
AISystem refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks
An easy-to-understand TensorOp Matmul tutorial
Continual Learning of Large Language Models: A Comprehensive Survey
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Triton Implementation of HyperAttention Algorithm
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"
mi-optimize is a versatile tool designed for the quantization and evaluation of large language models (LLMs). The library's seamless integration of various quantization methods and evaluation techniques…
🚀 A collection of components for development, training, tuning, and inference of foundation models, leveraging PyTorch native components.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
A list of papers on Vision Transformer quantization and hardware acceleration from recent AI conferences and journals.