Kaiqi Chen chiakicage

🦀

rusting

52 followers · 74 following

Lists (1)

🚀 My stack

1 repository

Stars

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 403 34 Updated Nov 5, 2024

windingwind / zotero-better-notes

Everything about note management. All in Zotero.

TypeScript 5,482 188 Updated Oct 26, 2024

zju-bmi-lab / SpikingGS

Python 45 2 Updated Oct 17, 2024

ruikangliu / FlatQuant

Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization

Python 55 4 Updated Oct 23, 2024

galeselee / DVFS_PaperList

Energy is a very noticeable topic. Dynamic Voltage and Frequency Scaling is a technique for managing CPU and GPU power consumption. Here is a paper list on DVFS and power consumption.

2 Updated Oct 16, 2024

galeselee / Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

165 6 Updated Nov 5, 2024

ml-energy / zeus

Deep Learning Energy Measurement and Optimization

Python 209 26 Updated Nov 5, 2024

pytorch / torchtitan

A native PyTorch Library for large model training

Python 2,566 199 Updated Nov 5, 2024

OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

Python 560 77 Updated Jul 22, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 10,479 2,348 Updated Nov 6, 2024

kwai / Megatron-Kwai

Forked from NVIDIA/Megatron-LM

[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism

Python 45 1 Updated Jul 31, 2024

astral-sh / uv

An extremely fast Python package and project manager, written in Rust.

Rust 25,416 739 Updated Nov 7, 2024

LMCache / LMCache

Ultra-Fast and Cheaper Long-Context LLM Inference

Python 191 22 Updated Nov 6, 2024

teslamotors / ttpoe

C 535 44 Updated Nov 4, 2024

LoongServe / LoongServe

Jupyter Notebook 41 5 Updated Sep 16, 2024

bytedance / flux

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 216 16 Updated Oct 30, 2024

HanGuo97 / flute

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 183 5 Updated Oct 6, 2024

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Cuda 627 24 Updated Sep 21, 2024

Ezio-csm / FasterTransformer

C++ 1 Updated Jul 7, 2024

pengsida / learning_research

My research experience

5,874 345 Updated Nov 3, 2024

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 658 53 Updated Nov 7, 2024

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 729 37 Updated Nov 6, 2024

microsoft / MInference

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 776 36 Updated Nov 2, 2024

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C 222 14 Updated Nov 6, 2024

iamNCJ / DiLightNet

Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Python 123 6 Updated Sep 9, 2024

ZiyaoLi / fast-kan

FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)

Jupyter Notebook 365 47 Updated Jun 20, 2024

AthanasiosDelis / faster-kan

Forked from ZiyaoLi/fast-kan

Benchmarking and Testing FastKAN

Python 62 8 Updated May 26, 2024