Stars
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
A lightweight library for PyTorch training tools and utilities
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Long Context Transfer from Language to Vision
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
MINT-1T: A one trillion token multimodal interleaved dataset.
RecordRTC is a WebRTC JavaScript library for audio/video and screen activity recording. It supports Chrome, Firefox, Opera, Android, and Microsoft Edge on Linux, Mac, and Windows.
Android ViewServer and ADB client
A Gradio web UI for Large Language Models.
A PyTorch template for beginners based on pytorch_lightning
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Implementation of a Transformer, but completely in Triton
🛁 Clean Code concepts adapted for Python
Implement minimal-boilerplate CLIs derived from type hints, parsing from the command line, config files, and environment variables
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curatio…
LLM Reasoning and Generation Benchmark. Systematically evaluate LLMs in complex scenarios.
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities