🦀
rusting


BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 403 34 Updated Nov 5, 2024

Everything about note management. All in Zotero.

TypeScript 5,482 188 Updated Oct 26, 2024
Python 45 2 Updated Oct 17, 2024

Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization

Python 55 4 Updated Oct 23, 2024

Energy is a very notable topic. Dynamic Voltage and Frequency Scaling (DVFS) is a technique for managing CPU and GPU power consumption. Here is a paper list on DVFS and power consumption.

2 Updated Oct 16, 2024

Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

165 6 Updated Nov 5, 2024

Deep Learning Energy Measurement and Optimization

Python 209 26 Updated Nov 5, 2024

A native PyTorch Library for large model training

Python 2,566 199 Updated Nov 5, 2024

Efficient Training (including pre-training and fine-tuning) for Big Models

Python 560 77 Updated Jul 22, 2024

Ongoing research training transformer models at scale

Python 10,479 2,348 Updated Nov 6, 2024

[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism

Python 45 1 Updated Jul 31, 2024

An extremely fast Python package and project manager, written in Rust.

Rust 25,416 739 Updated Nov 7, 2024

Ultra-Fast and Cheaper Long-Context LLM Inference

Python 191 22 Updated Nov 6, 2024
C 535 44 Updated Nov 4, 2024
Jupyter Notebook 41 5 Updated Sep 16, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 216 16 Updated Oct 30, 2024

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 183 5 Updated Oct 6, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 627 24 Updated Sep 21, 2024
C++ 1 Updated Jul 7, 2024

My research experience

5,874 345 Updated Nov 3, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 658 53 Updated Nov 7, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 729 37 Updated Nov 6, 2024

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, computes attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 776 36 Updated Nov 2, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 222 14 Updated Nov 6, 2024

Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Python 123 6 Updated Sep 9, 2024

FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)

Jupyter Notebook 365 47 Updated Jun 20, 2024

Benchmarking and Testing FastKAN

Python 62 8 Updated May 26, 2024

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 1,027 58 Updated Jul 14, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

1,109 23 Updated Jul 31, 2024
C++ 1 Updated May 2, 2024