[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 631 30 Updated Jun 2, 2024

hananshafi / llmblueprint

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Jupyter Notebook 59 2 Updated May 18, 2024

magic-research / bubogpt

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Python 485 34 Updated Jul 21, 2023

rese1f / MovieChat

[CVPR 2024] 🎬💭 Chat with over 10K frames of video!

Python 442 36 Updated Jun 16, 2024

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://…

62 7 Updated Oct 19, 2023

asif-hanif / vafa

[MICCAI 2023] Official code repository for the paper "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation".

Python 45 Updated Nov 14, 2023

muzairkhattak / PromptSRC

[ICCV'23 Main Track, WECIA'23 Oral] Official repository for the paper "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".

Python 195 8 Updated Sep 28, 2023

mbzuai-oryx / ClimateGPT

[EMNLP'23] ClimateGPT: a specialized LLM for conversations about climate change and sustainability in both English and Arabic.

Python 71 9 Updated Jan 30, 2024

mbzuai-oryx / XrayGPT

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.

Python 436 51 Updated Aug 5, 2023

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,034 92 Updated Jun 16, 2024

Vision-CAIR / MiniGPT-4

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,105 2,897 Updated Apr 22, 2024

amazon-science / prompt-pretraining

Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"

Python 248 8 Updated May 3, 2024

Amshaker / SwiftFormer

[ICCV'23] Official repository for the paper "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications"

Python 209 23 Updated Jan 12, 2024

demidovd98 / sm-vit

Official repository for the paper "Salient Mask-Guided Vision Transformer for Fine-Grained Classification" (VISIGRAPP '23)

Python 17 Updated Mar 6, 2023

facebookresearch / ConvNeXt-V2

Code release for ConvNeXt V2 model

Python 1,394 111 Updated Mar 4, 2024