forked from ECNU-ICALK/BPT-VLM

[IJCAI 2023] Black-box Prompt Tuning for Vision-Language Model as a Service

BruthYU/BPT-VLM

 
 


Black-box Prompt Tuning for Vision-Language Model as a Service

This repo contains the source code of our research project aiming to perform prompt tuning on vision-language models like CLIP in a derivative-free manner.

Updates

  • 2024/05/16: Repo has been transferred to ECNU-ICALK/BPT-VLM (Organization Account) 🔔
  • 2022/11/21: Support parallel evaluation of all individuals in the same generation. 🎨
  • 2022/10/20: Release the deep variant of prompt tuning for BPT-VLM. 🎊
  • 2022/09/28: Release the first version of BPT-VLM. ⭐

Introduction

In the Model-as-a-Service (MaaS) scenario, large-scale pretrained models (PTMs) are usually released as inference APIs, and users query those PTMs with manually crafted prompts. Continuous prompt tuning is difficult in this setting because inference APIs do not expose gradients, and it is especially tricky for vision-language models (VLMs), where cross-modal interaction has to be taken into account. BPT-VLM optimizes continuous visual and linguistic prompts for VLMs in a derivative-free manner.
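
To make "derivative-free" concrete, here is a minimal sketch of black-box optimization of a continuous prompt vector with a simple (mu, lambda) evolution strategy. It is not the CMA-ES variant used in this repo. `query_loss` is a hypothetical stand-in for a loss computed from the outputs of a served model, and all names and hyperparameters are illustrative assumptions.

```python
import random

def query_loss(prompt):
    # Hypothetical stand-in for a black-box API call: in practice this would
    # send the prompt to the served model and return a loss from its outputs.
    target = [0.5, -1.0, 2.0, 0.0]
    return sum((p - t) ** 2 for p, t in zip(prompt, target))

def evolve_prompt(loss_fn, dim, generations=200, popsize=20, elite=5,
                  sigma=0.5, decay=0.98, seed=0):
    """Simple (mu, lambda) evolution strategy; uses loss values only, no gradients."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    best, best_loss = list(mean), loss_fn(mean)
    for _ in range(generations):
        # Sample one generation of candidate prompts around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(popsize)]
        # Each candidate costs one black-box query.
        scored = sorted(((loss_fn(ind), ind) for ind in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_loss:
            best_loss, best = scored[0][0], list(scored[0][1])
        # Recombine: move the mean to the average of the elite candidates.
        parents = [ind for _, ind in scored[:elite]]
        mean = [sum(col) / elite for col in zip(*parents)]
        sigma *= decay  # shrink the step size over time
    return best, best_loss

best_prompt, best_loss = evolve_prompt(query_loss, dim=4)
print(best_loss)
```

Because the candidates within one generation are independent of each other, their loss queries can also be issued in parallel.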

Experiments

| Dataset / Algorithm | MM-ES-Shallow | MA-ES-Shallow | CMA-ES-Shallow | CMA-ES-Deep |
|---|---|---|---|---|
| ImageNet | -- | -- | 65.08 | 64.84 |
| SUN397 | -- | -- | 68.01 | 69.83 |
| caltech101 | 93.67 | 93.59 | 94.16 | 93.39 |
| OxfordPets | 90.49 | 90.57 | 90.43 | 90.62 |
| StanfordCars | 62.49 | 65.03 | 64.72 | 67.84 |
| Food101 | 81.62 | 80.89 | 81.31 | 81.38 |
| DTD | 48.40 | 59.63 | 60.52 | 64.13 |
| EuroSAT | 86.25 | 86.93 | 86.11 | 89.37 |
| UCF-101 | 70.76 | 76.34 | 74.62 | 76.66 |
| Average | -- | -- | 76.11 | 77.56 |
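
The Average row can be checked directly from the per-dataset numbers. The short sketch below recomputes it for the two algorithms that have results on all nine datasets. The values are copied from the table; the variable names are illustrative.

```python
# Per-dataset results copied from the table, in row order
# (ImageNet, SUN397, caltech101, ..., UCF-101).
results = {
    "CMA-ES-Shallow": [65.08, 68.01, 94.16, 90.43, 64.72, 81.31, 60.52, 86.11, 74.62],
    "CMA-ES-Deep": [64.84, 69.83, 93.39, 90.62, 67.84, 81.38, 64.13, 89.37, 76.66],
}

# Unweighted mean over the nine datasets, rounded to two decimals.
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in results.items()}
print(averages)  # {'CMA-ES-Shallow': 76.11, 'CMA-ES-Deep': 77.56}
```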

Prepare Environments

This code is built on two open-source libraries, PyCMA and PyPop7, so you need to install these two packages first.

pip install pycma pypop7

After that, run the following commands to install the other dependencies required by the project.

pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install pyyaml
pip install ftfy
pip install regex
pip install transformers
pip install overrides
pip install spacy
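
After installing, a quick way to confirm that nothing is missing is to check that each module can be found. This is a small sketch, not part of the repo. Note that some import names differ from the pip package names (for example, `pyyaml` is imported as `yaml` and PyCMA as `cma`).

```python
import importlib.util

# Import names, which can differ from pip package names
# (pyyaml -> yaml, PyCMA -> cma).
REQUIRED_MODULES = ["cma", "pypop7", "torch", "yaml", "ftfy", "regex",
                    "transformers", "overrides", "spacy"]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```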

Prepare Datasets

Follow DATASET.md to install the datasets.

Quick Start

You can use our pretrained prompt tokens to perform classification on downstream datasets. Generally, a checkpoint directory is structured like this:

$RESULT/
├── caltech101
│   ├── caltech101_deep_cma_ViT-B-32.pth
│   └── caltech101_shallow_cma_ViT-B-32.pth
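
The file names in this tree suggest the pattern `<checkpoint_name>_<opt>_<backbone>.pth`. The helper below builds such a path. The pattern is inferred from the example only, and the function name and the default backbone string are assumptions, not code from the repo.

```python
from pathlib import Path

def checkpoint_path(result_dir, checkpoint_name, opt, backbone="ViT-B-32"):
    # Assumed layout, inferred from the example tree:
    #   <result_dir>/<checkpoint_name>/<checkpoint_name>_<opt>_<backbone>.pth
    return Path(result_dir) / checkpoint_name / f"{checkpoint_name}_{opt}_{backbone}.pth"

path = checkpoint_path("result", "caltech101", "shallow_cma")
print(path.as_posix())  # result/caltech101/caltech101_shallow_cma_ViT-B-32.pth
```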

With the dataset correctly installed, run the following command to launch a demo:

python demo.py --checkpoint_dir [$RESULT] --task_name caltech101 --opt shallow_cma --checkpoint_name caltech101
  • Make sure __dataset__ and __output__ in demo.py point to the dataset and checkpoint directories, respectively.
  • The opt argument must be one of the algorithm names in [shallow_cma, shallow_mmes, shallow_lmmaes, deep_cma].
  • The checkpoint tuned on checkpoint_name is evaluated on the task_name dataset, so the two can differ.
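
The argument constraints above can be expressed with `argparse`. This is a hypothetical sketch, not the actual parser in demo.py: only the flag names and the list of algorithms come from this README, while the function name, help texts, and the choice to make every flag required are assumptions.

```python
import argparse

# Algorithm names accepted by --opt, as listed in this README.
ALGORITHMS = ["shallow_cma", "shallow_mmes", "shallow_lmmaes", "deep_cma"]

def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical demo argument parser")
    parser.add_argument("--checkpoint_dir", required=True,
                        help="root directory holding per-dataset checkpoints")
    parser.add_argument("--task_name", required=True,
                        help="dataset to evaluate on")
    parser.add_argument("--opt", required=True, choices=ALGORITHMS,
                        help="algorithm used to tune the prompts")
    parser.add_argument("--checkpoint_name", required=True,
                        help="dataset the checkpoint was tuned on")
    return parser

# Parse the same arguments as the demo command, passed as a list for illustration.
args = build_parser().parse_args([
    "--checkpoint_dir", "result", "--task_name", "caltech101",
    "--opt", "shallow_cma", "--checkpoint_name", "caltech101",
])
print(args.opt, args.task_name)  # shallow_cma caltech101
```

With `choices`, an unsupported algorithm name is rejected before any checkpoint is loaded.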

Black-box Prompt Tuning

To reproduce the results of black-box prompt tuning, make sure you correctly relate __dataset__ and __output__ in BBT_VL_Shallow.py (or in