- Seoul, Korea
- https://bebens.ee
Stars
llama3.np is a pure NumPy implementation of the Llama 3 model.
Website for hosting the Open Foundation Models Cheat Sheet.
Speedrun implementations of deep learning papers throughout history
Zero Bubble Pipeline Parallelism
Material for cuda-mode lectures
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
Ring attention implementation with flash attention
Transformers with Arbitrarily Large Context
A high-throughput and memory-efficient inference and serving engine for LLMs
A tiling window manager for macOS based on binary space partitioning
Official release of InternLM2.5 7B base and chat models. 1M context support
Simple retrieval from LLMs at various context lengths to measure accuracy
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
Development repository for the Triton language and compiler
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Why Do We Need Weight Decay in Modern Deep Learning? [arXiv, Oct 2023]
Reformer, the efficient Transformer, in PyTorch
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.