- MIT, EECS
- Cambridge, MA
- 17:00 (UTC -04:00)
- jiamingtang.me
- @jmtang42
Highlights
- Pro
Stars
A sparse attention kernel supporting mixed sparse patterns
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
A benchmark for testing memorization abilities of LMs
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models.
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
pipreqs - Generate pip requirements.txt file based on imports of any project. Looking for maintainers to move this project forward.
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Accelerating the development of large multimodal models (LMMs) with lmms-eval
PyTorch native quantization and sparsity for training and inference
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Model components of the Llama Stack APIs
Utilities intended for use with Llama models.
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
Building blocks for foundation models.
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, Souvik Kundu, Zhangyang Wang
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
awesome synthetic (text) datasets
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation