Stars
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Everything about note management. All in Zotero.
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
Energy is a prominent topic. Dynamic Voltage and Frequency Scaling (DVFS) is a technique for managing CPU and GPU power consumption. Here is a paper list on DVFS and power consumption.
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…
Deep Learning Energy Measurement and Optimization
A native PyTorch Library for large model training
Efficient Training (including pre-training and fine-tuning) for Big Models
Ongoing research training transformer models at scale
kwai / Megatron-Kwai
Forked from NVIDIA/Megatron-LM
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism
An extremely fast Python package and project manager, written in Rust.
Ultra-Fast and Cheaper Long-Context LLM Inference
A fast communication-overlapping library for tensor parallelism on GPUs.
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
A throughput-oriented high-performance serving framework for LLMs
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, computes attention with approximate and dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…
Dynamic Memory Management for Serving LLMs without PagedAttention
Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)
AthanasiosDelis / faster-kan
Forked from ZiyaoLi/fast-kan
Benchmarking and Testing FastKAN
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.