Stars
Example fasthtml applications demonstrating a range of web programming techniques
The fastest way to create an HTML app
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
kwai / Megatron-Kwai
Forked from NVIDIA/Megatron-LM
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism
A collection of memory efficient attention operators implemented in the Triton language.
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
SGLang is a fast serving framework for large language models and vision language models.
FlagGems is an operator library for large language models implemented in Triton Language.
Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
A generative speech model for daily dialogue.
Causal depthwise conv1d in CUDA, with a PyTorch interface
Mora: More like Sora for Generalist Video Generation
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"