Run PyTorch LLMs locally on servers, desktop and mobile

Python 44 2 Updated Jul 27, 2024

On-device AI across mobile, embedded and edge for PyTorch

C++ 1,473 244 Updated Jul 28, 2024

Multimodal Models in Real World

Jupyter Notebook 343 17 Updated Jul 12, 2024

Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]

Python 46 4 Updated Jul 11, 2024

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

Python 720 39 Updated Jul 11, 2024

Generative AI extensions for onnxruntime

C++ 348 80 Updated Jul 27, 2024

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,434 339 Updated Jul 28, 2024

LLM101n: Let's build a Storyteller

25,819 1,373 Updated Jul 21, 2024

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

Python 140 18 Updated Jul 27, 2024

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,392 491 Updated Jul 13, 2024

Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.

57 5 Updated Mar 23, 2024
Jupyter Notebook 185 59 Updated May 15, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,883 722 Updated Jul 25, 2024

Fast job queuing and RPC in python with asyncio and redis.

Python 2,041 170 Updated Jul 28, 2024

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Python 154 3 Updated Jun 20, 2024

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

Python 354 31 Updated May 29, 2024

A utility library to help integrate Python applications with Metropolis Microservices for Jetson

Python 4 1 Updated Jun 13, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 11,084 2,312 Updated Jul 28, 2024

Universal LLM Deployment Engine with ML Compilation

Python 17,938 1,425 Updated Jul 28, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,696 836 Updated Jul 28, 2024

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Python 13,219 869 Updated Jul 28, 2024

Tile primitives for speedy kernels

Cuda 1,414 51 Updated Jul 27, 2024

Development repository for the Triton language and compiler

C++ 12,097 1,444 Updated Jul 28, 2024

A Python Interpreter written in Rust

Rust 18,104 1,215 Updated Jul 28, 2024

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 127 13 Updated May 29, 2024

Kolmogorov Arnold Networks

Jupyter Notebook 13,921 1,256 Updated Jul 28, 2024

PygmalionAI's large-scale inference engine

Python 817 91 Updated Jul 28, 2024

Large Language Model Text Generation Inference

Python 8,487 969 Updated Jul 26, 2024

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 27,877 3,423 Updated Jul 26, 2024

Programming accelerated applications with CUDA C/C++, enough to be able to begin work accelerating your own CPU-only applications for performance gains, and for moving into novel computational terr…

HTML 90 32 Updated May 13, 2018