Stars
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
VPTQ, a flexible and extreme low-bit quantization algorithm
An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
UE5's Nanite implementation using WebGPU. Includes the meshlet LOD hierarchy, software rasterizer and billboard impostors. Culling on both a per-instance and a per-meshlet basis.
Text-to-Music Generation with Rectified Flow Transformers
Skip transpiler for creating SwiftUI apps for iOS and Android
Virtual whiteboard for sketching hand-drawn-like diagrams
Most modern mobile touch slider with hardware-accelerated transitions
Fast parallel LLM inference for MLX
Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"
This repo includes a curation of ChatGPT prompts for using ChatGPT better.
XR virtual workspace library for Linux
Small but complete dynamic Forth Interpreter/Compiler for and in WebAssembly
Experimental PinePhone distro to provide a heads-up display using the Nreal Air.
A project written in C++ to get hardware info on a Windows PC. Interfaces with the Windows Management Instrumentation (WMI) service to query hardware info of interest and provides a basic command l…
Read, search and get information about Linux disks and partitions.