Stars
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
LAVIS - A One-stop Library for Language-Vision Intelligence
Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention"
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull request…
[COLM 2024] A Survey on Deep Learning for Theorem Proving
Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis
Code implementation of our NeurIPS 2023 paper: Vocabulary-free Image Classification
A curated list of papers, datasets and resources pertaining to open vocabulary object detection.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Official PyTorch implementation of LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation, accepted at CVPR 2024
[NIPS2023] This is an official implementation of the paper "DAC-DETR: Divide the Attention Layers and Conquer".
Official implementation of the paper: "A Causal Inspired Early-Branching Structure for Domain Generalization".
Accelerating the development of large multimodal models (LMMs) with lmms-eval
[CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".
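👩🏿💻👨🏾💻👩🏼💻👨🏽💻👩🏻💻 A list of projects by independent developers in China -- sharing what everyone is working on
Arknights assistant: handle all daily tasks with one click! | A one-click tool for the daily tasks of Arknights, supporting all clients.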
Image augmentation for machine learning experiments.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Instruct-tune LLaMA on consumer hardware
Official implementation for "Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training" https://arxiv.org/abs/2211.11138
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…
[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
A list of papers that study Novel Class Discovery
[CVPR 2022 Oral] Official implementation of DN-DETR