Stars
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
Research and Materials on Hardware implementation of Transformer Model
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
SGLang is a fast serving framework for large language models and vision language models.
Measures the latency between CPU cores
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Universal LLM Deployment Engine with ML Compilation
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
MatMul Performance Benchmarks for a Single CPU Core, comparing hand-engineered and codegen kernels.
⚡ A Fast, Extensible Progress Bar for Python and CLI
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry lead…
Mirror of https://chromium.googlesource.com/chromiumos/third_party/adhd
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
A reference client implementation for the playback of MPEG DASH via JavaScript and compliant browsers.
A book teaching assembly language programming on the ARM 64 bit ISA. Along the way, good programming practices and insights into code development are offered which apply directly to higher level la…
OpenAL Soft is a software implementation of the OpenAL 3D audio API.
Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!)
Japanese morphological analysis engine written in pure Python