Stars
Run PyTorch LLMs locally on servers, desktop and mobile
On-device AI across mobile, embedded and edge for PyTorch
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
Generative AI extensions for onnxruntime
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.
✨✨Latest Advances on Multimodal Large Language Models
Fast job queuing and RPC in python with asyncio and redis.
Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
A utility library to help integrate Python applications with Metropolis Microservices for Jetson
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Universal LLM Deployment Engine with ML Compilation
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Development repository for the Triton language and compiler
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
PygmalionAI's large-scale inference engine
Large Language Model Text Generation Inference
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Programming accelerated applications with CUDA C/C++, enough to be able to begin work accelerating your own CPU-only applications for performance gains, and for moving into novel computational territory