rainyBJ
💭 Machine Learning
  • BUPT
  • Beijing Haidian

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 313 36 Updated May 28, 2024
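
For a quick sanity check on the numbers such a tool reports, the two dominant terms in decoder-only inference are weight memory and KV-cache memory. A back-of-the-envelope sketch with illustrative names and an illustrative config, not this repo's API:

```python
# Rough memory estimate for decoder-only transformer inference.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V each cache one vector per layer, head, and position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

def weight_bytes(n_params, bytes_per_elem=2):
    return int(n_params * bytes_per_elem)

# Example: a 7B-parameter, Llama-2-7B-like config at fp16, batch 1, 4k context.
kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=1)
print(f"weights ~{weight_bytes(7e9) / 2**30:.1f} GiB, KV cache ~{kv / 2**30:.1f} GiB")
```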

RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

10 1 Updated Jul 9, 2024
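
The "rotated" part refers to a general trick: multiplying the weights by an orthogonal matrix (and its transpose on the activations) preserves the layer's output while mixing channels, which smooths activation outliers before quantization. A toy demonstration of the equivalence only, not RoLoRA's actual pipeline:

```python
import torch

# For orthogonal R, replacing W with W @ R and x with R.T @ x leaves the
# layer output unchanged, (W R)(R^T x) = W x, while spreading outliers
# across channels so both weights and activations quantize more easily.
d = 8
rot, _ = torch.linalg.qr(torch.randn(d, d))   # a random orthogonal matrix
w, x = torch.randn(4, d), torch.randn(d)
assert torch.allclose((w @ rot) @ (rot.T @ x), w @ x, atol=1e-5)
```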

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python 62 2 Updated Jul 22, 2024
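
The generic building block behind any QAT method is a fake-quantization op trained with a straight-through estimator. A minimal sketch assuming uniform symmetric per-tensor quantization; EfficientQAT's actual block-wise training procedure is more involved:

```python
import torch

def fake_quant(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Uniform symmetric per-tensor fake quantization with a straight-through
    # estimator: the forward pass sees quantized weights, the backward pass
    # treats the rounding as identity so gradients still flow to w.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()
```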

LLM101n: Let's build a Storyteller

25,168 1,331 Updated Jul 21, 2024

A course to get into Large Language Models (LLMs), with roadmaps and Colab notebooks.

Jupyter Notebook 34,626 3,631 Updated Jul 16, 2024

AIGC-interview/CV-interview/LLMs-interview: a collection of interview questions and answers, plus new ideas, questions, resources, and projects from work and research.

1,263 123 Updated Jul 15, 2024

Large language models from China (中国大模型)

4,935 422 Updated Jun 7, 2024

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Python 472 23 Updated Jul 6, 2024
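
As the name encodes, DoRA splits a pretrained weight into magnitude and direction and applies the LoRA update to the direction only; per the paper:

```latex
% DoRA's weight update: decompose W_0 into magnitude and direction,
% then apply the low-rank LoRA update BA to the direction:
W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}
% where m is a learnable per-column magnitude vector initialized to
% \lVert W_0 \rVert_c, \lVert \cdot \rVert_c is the column-wise norm, and
% B \in \mathbb{R}^{d \times r}, A \in \mathbb{R}^{r \times k}, r \ll \min(d, k).
```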

A comprehensive paper list on Vision Transformer/Attention, including papers, code, and related websites

4,460 484 Updated Jul 11, 2024

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit"

Python 128 14 Updated Jul 23, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

906 17 Updated Jul 10, 2024

🎉 CUDA & C++ notes / hand-written CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 908 86 Updated Jul 24, 2024
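
As a reference for what one of the listed kernels computes, here is rmsnorm in plain PyTorch; this is a sketch of the math only, the repo implements it as a hand-written CUDA kernel:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm: scale by the root-mean-square over the last dimension,
    # then apply a learned per-channel gain; no mean subtraction, no bias.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight
```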

AISystem covers AI systems: AI chips, AI compilers, AI inference and training frameworks, and other full-stack low-level AI technologies.

Jupyter Notebook 9,688 1,395 Updated Jul 17, 2024

An easy-to-understand TensorOp Matmul tutorial

C++ 228 23 Updated Jun 15, 2024
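
The core idea every matmul tutorial builds on is tiling: decompose C = A·B into blocks that fit in fast memory and accumulate partial products block by block. A CPU-side NumPy sketch of the mechanics; the tutorial itself targets GPU tensor cores:

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    # Accumulate C[i, j] tile by tile over the shared dimension k, the
    # same loop structure a tensor-op GEMM maps onto shared memory.
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return c
```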

Continual Learning of Large Language Models: A Comprehensive Survey

170 12 Updated Jul 2, 2024

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python 329 28 Updated Jun 18, 2024
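
The eviction policy the title describes keeps a small set of "heavy hitters" (tokens with the largest accumulated attention) plus the most recent tokens. A sketch of that selection step, with hypothetical names and per-head bookkeeping omitted:

```python
import torch

def h2o_keep_indices(cum_attn: torch.Tensor, n_heavy: int, n_recent: int) -> torch.Tensor:
    # cum_attn: [seq_len] accumulated attention mass each cached token has
    # received so far. Keep the n_recent newest tokens unconditionally, plus
    # the n_heavy highest-scoring older tokens (the "heavy hitters").
    seq_len = cum_attn.shape[0]
    cut = max(seq_len - n_recent, 0)
    recent = torch.arange(cut, seq_len)
    heavy = cum_attn[:cut].topk(min(n_heavy, cut)).indices
    return torch.cat([heavy, recent]).sort().values
```

After each decoding step, K/V entries outside the returned index set are dropped, so the cache stays at a fixed budget.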

Triton Implementation of HyperAttention Algorithm

Python 45 1 Updated Dec 11, 2023

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Python 46 4 Updated Jul 15, 2024
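
The caching mechanics are simple even though the paper's contribution is learning which layers to cache: because inputs change little between adjacent diffusion steps, a layer's output from an earlier step can stand in for recomputing it. A toy sketch with hypothetical names, not the paper's API:

```python
def cached_forward(layers, x, step, reuse, cache):
    # When reuse[step][i] is True, skip transformer layer i and substitute
    # the output it produced at an earlier diffusion step.
    for i, layer in enumerate(layers):
        if reuse[step][i] and i in cache:
            x = cache[i]        # reuse the stale output for this layer
        else:
            x = layer(x)
            cache[i] = x        # refresh the cache for later steps
    return x
```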

[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"

Python 62 9 Updated Jun 6, 2024

Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"

Python 6 1 Updated Jun 26, 2024

mi-optimize is a versatile tool designed for the quantization and evaluation of large language models (LLMs). The library's seamless integration of various quantization methods and evaluation techn…

Python 10 2 Updated Jul 23, 2024

🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.

Python 124 34 Updated Jul 23, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 353 11 Updated Jul 19, 2024
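
W4A8KV4 is shorthand for 4-bit weights, 8-bit activations, and a 4-bit KV cache. Purely to illustrate the notation (QServe's actual kernels use progressive group quantization and fused CUDA code), per-channel symmetric W4 quantization looks like:

```python
import torch

def quantize_w4(w: torch.Tensor):
    # Per-output-channel symmetric 4-bit quantization: the "W4" part.
    # A8 (activations) and KV4 (KV cache) are quantized analogously at
    # runtime; int4 values occupy the range [-8, 7].
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7
    w_q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return w_q, scale  # dequantize with w_q.float() * scale
```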

Run generative AI models on the Sophgo BM1684X

Python 79 14 Updated Jul 23, 2024

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 32,204 5,487 Updated Jul 24, 2024
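
Ray's core API in a toy example (standard usage from the Ray docs, unrelated to this listing): decorate a function with @ray.remote, launch tasks with .remote(), gather with ray.get():

```python
import ray

ray.init()

@ray.remote
def square(x: int) -> int:
    return x * x

# Launch tasks in parallel across the Ray cluster, then gather results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```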

A quantization algorithm for LLMs

Cuda 87 5 Updated Jun 21, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,116 118 Updated Jun 26, 2024

List of papers related to Vision Transformers quantization and hardware acceleration in recent AI conferences and journals.

32 3 Updated Jun 2, 2024