
A sparse attention kernel supporting mixed sparse patterns

C++ 27 Updated Oct 11, 2024

TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Python 16 Updated Oct 9, 2024

A benchmark for testing memorization abilities of LMs

Python 4 1 Updated Oct 11, 2024

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Python 159 5 Updated Oct 11, 2024

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Python 62 2 Updated Oct 13, 2024

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

Python 67 3 Updated Oct 3, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,597 146 Updated Oct 4, 2024
JavaScript 2,431 857 Updated Jun 21, 2024

A multi-level tensor algebra superoptimizer

C++ 518 25 Updated Oct 14, 2024

pipreqs - Generate pip requirements.txt file based on imports of any project. Looking for maintainers to move this project forward.

Python 6,380 388 Updated Jul 6, 2024

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 248 9 Updated Aug 12, 2024

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,454 125 Updated Oct 14, 2024

PyTorch native quantization and sparsity for training and inference

Python 1,373 133 Updated Oct 14, 2024

[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

Python 19 2 Updated Sep 24, 2024

Model components of the Llama Stack APIs

Python 3,625 499 Updated Oct 14, 2024

Utilities intended for use with Llama models.

Python 4,427 777 Updated Oct 14, 2024

TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.

Python 1,664 139 Updated Oct 6, 2024

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"

Python 342 18 Updated Aug 19, 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python 62 14 Updated Jul 10, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 9,583 856 Updated Sep 1, 2024

Building blocks for foundation models.

372 13 Updated Jan 3, 2024

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

C++ 209 22 Updated Sep 30, 2024

[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, Souvik Kundu, Zhangyang Wang

Python 15 2 Updated Jun 2, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

4,368 236 Updated Oct 12, 2024

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Python 616 16 Updated Sep 18, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 701 35 Updated Oct 9, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 5,732 447 Updated Sep 19, 2024

Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models

Python 188 25 Updated Apr 23, 2024

awesome synthetic (text) datasets

Jupyter Notebook 229 11 Updated Oct 11, 2024

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Python 93 5 Updated Sep 14, 2024