Stars
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Assistant tools for attention visualization in deep learning
Firefox user.js for speed, privacy, and security. Your favorite browser, but better.
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
YOLOv10: Real-Time End-to-End Object Detection
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
A curated list of trustworthy Generative AI papers. Updated daily.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Paper list and datasets for industrial image anomaly/defect detection (continuously updated). A searchable index of papers and datasets on industrial anomaly/defect detection.
[ECCV2024] Official implementation of "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
Open-Sora: Democratizing Efficient Video Production for All
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
[CVPR2024] Official PyTorch implementation of SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation.
[CVPR 2024] Official RT-DETR (RTDETR, Paddle and PyTorch): Real-Time DEtection TRansformer. DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Generative Representational Instruction Tuning
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
A paper list on multimodal and large language models, used only to record the papers I read on arXiv each day for personal reference.
A collection of visual instruction tuning datasets.
Recent LLM-based CV and related works. Comments and contributions are welcome!
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).