Stars
Factorization Vision Transformer: Modeling Long Range Dependency with Local Window Cost
Speeds up long-context LLM inference with approximate, dynamic sparse attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Official PyTorch Implementation of "Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models"
This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
Official code for "Block Transformer: Global-to-Local Language Modeling for Fast Inference"
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
Implementation of Rotary Embeddings, from the RoFormer paper, in Pytorch (a minimal sketch of the rotation appears after this list)
simran-arora / cs229s-nanoGPT
Forked from karpathy/nanoGPT. The simplest, fastest repository for training/finetuning medium-sized GPTs.
GRASS: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (a rough sketch of the projection step appears after this list)
A simple minimal implementation of Reversible Vision Transformers
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization
EasyRobust: an Easy-to-use library for state-of-the-art Robust Computer Vision Research with PyTorch.
Bidirectional Autoregressive Talker from Generative Pre-trained Transformer
Tutorial for how to build BERT from scratch
Fast and memory-efficient exact attention
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
Vector (and Scalar) Quantization, in Pytorch (a minimal VQ sketch appears after this list)
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
Official PyTorch implementation of Traversal of Layers (TroL), which introduces a new layer-traversal propagation operation for strong vision-language performance. (Under review)
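A few of the techniques named in the list are compact enough to sketch. First, the rotary-embedding entry: the function below is a minimal, assumed illustration of RoFormer-style rotary position embeddings, not the repository's implementation. Pairing the first half of the channels with the second half is one common convention (interleaved pairs are another), and the function name is an assumption.

```python
import torch

def apply_rotary(x, base=10000.0):
    """Minimal sketch: rotate channel pairs of x by position-dependent angles.

    x: (batch, seq_len, dim) with dim even. Names and channel layout are
    illustrative assumptions, not the repo's API.
    """
    _, n, d = x.shape
    half = d // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)        # (half,)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied to each (x1, x2) channel pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 16, 64)
print(apply_rotary(q).shape)  # torch.Size([2, 16, 64]); positions are now encoded as rotations
```

Because the same rotation is applied to queries and keys before the dot product, relative position falls out of the attention scores, which is the property the RoFormer paper exploits.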
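The GaLore entry refers to projecting weight gradients into a low-rank subspace so that optimizer state lives in far fewer dimensions. The step below is only a rough sketch under assumed names (low_rank_project_step, rank, lr); the actual method keeps Adam moments in the projected space and refreshes the projection periodically.

```python
import torch

def low_rank_project_step(weight, grad, rank=8, lr=1e-3):
    """Rough sketch of one gradient-low-rank-projection update (names assumed)."""
    # Take the top-`rank` left singular vectors of the gradient as the subspace.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                 # (m, rank) projection matrix
    g_low = P.T @ grad              # (rank, n) compressed gradient; optimizer state
                                    # (e.g. Adam moments) would be kept at this size
    update = P @ g_low              # project back to the full weight shape
    return weight - lr * update

W, G = torch.randn(256, 128), torch.randn(256, 128)   # weight and a stand-in gradient
print(low_rank_project_step(W, G).shape)               # torch.Size([256, 128])
```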
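The vector-quantization entry also maps to a short sketch. The module below (class and argument names are assumptions, not the repo's API) does nearest-neighbour codebook lookup with the usual codebook and commitment losses and a straight-through gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizerSketch(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator (illustrative)."""
    def __init__(self, num_codes=512, dim=64, commit_weight=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.commit_weight = commit_weight

    def forward(self, z):                                   # z: (batch, n, dim)
        flat = z.reshape(-1, z.shape[-1])
        dists = torch.cdist(flat, self.codebook.weight)     # (batch*n, num_codes)
        idx = dists.argmin(dim=-1).view(z.shape[:-1])       # nearest code per vector
        z_q = self.codebook(idx)
        # Codebook loss pulls codes toward encodings; commitment loss does the reverse.
        loss = F.mse_loss(z_q, z.detach()) + self.commit_weight * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()                        # straight-through: copy gradients to z
        return z_q, idx, loss

vq = VectorQuantizerSketch()
z = torch.randn(2, 16, 64, requires_grad=True)
z_q, idx, loss = vq(z)
loss.backward()   # gradients flow to both the encoder output and the codebook
```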