Starred repositories
Distributed vector search for AI-native applications
An efficient video loader for deep learning with smart shuffling that's super easy to digest
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
A curated list of foundation models for vision and language tasks
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
DOM to Semantic-Markdown for use with LLMs
Explore the Limits of Omni-modal Pretraining at Scale
SEED-Story: Multimodal Long Story Generation with Large Language Model
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …
Streamlit file browser
Open-Sora: Democratizing Efficient Video Production for All
A Node for ComfyUI that does what you ask it to do
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Official Code for Stable Cascade
A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the performance of the open-source model Qwen-VL-7B-Chat.
Generative Models by Stability AI
👾 Open source implementation of the ChatGPT Code Interpreter
Images to inference with no labeling (use foundation models to train supervised models).
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
[ICLR'24] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
Understanding model mistakes with human annotations
Recent LLM-based CV and related works. Welcome to comment/contribute!