Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Exploring Visual Prompts for Adapting Large-Scale Models
[TPAMI] Searching prompt modules for parameter-efficient transfer learning.
[ICCV 2023] Binary Adapters, [AAAI 2023] FacT, [Tech report] Convpass
[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
Official implementation for CVPR'23 paper "BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning"
👀 Visual Instruction Inversion: Image Editing via Visual Prompting (NeurIPS 2023)
[CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
[arXiv] "Uncovering the Hidden Cost of Model Compression" by Diganta Misra, Agam Goyal, Bharat Runwal, and Pin-Yu Chen
[ICML 2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
A simple GUI for experimenting with visual prompting
These notes and resources are compiled from the crash course Prompt Engineering for Vision Models offered by DeepLearning.AI.