BLIP Image Captioning GradCAM? #155

gwyong · 2023-05-22T05:26:11Z

Hi, I am using BlipForConditionalGeneration from transformers for image captioning.
I want to visualize, word by word, which image regions led to each word of the generated caption, similar to GradCAM.

I found code in ALBEF (https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb), but it uses an image-text matching model, not an image captioning model.

Can you give me any hints or simple code for this?

LiJunnan1992 · 2023-05-23T02:37:10Z

Hi, you can look at our code in LAVIS, which provides a GradCAM computation function for the BLIP image-text matching model:
https://github.com/salesforce/LAVIS/blob/a9939492f8f992d03088e7575bc711089b06544a/lavis/models/blip_models/blip_image_text_matching.py#L151

gwyong · 2023-05-23T02:57:00Z

Does that mean only the image-text matching model can compute GradCAM?
My model is an image captioning model (see https://huggingface.co/docs/transformers/model_doc/blip#transformers.BlipForConditionalGeneration).

If only the image-text matching model is supported, do I need to build a separate image-text matching model for GradCAM?

LiJunnan1992 · 2023-05-23T03:20:42Z

You can adapt the GradCAM code to work with an image captioning model.
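A minimal sketch of the combination step such an adaptation could reuse, assuming you already have the cross-attention map and its gradient from one decoder layer. It mirrors the pattern in the LAVIS function linked above (drop the image [CLS] column, clamp gradients to be non-negative, multiply by attention, average over heads, reshape to the patch grid), but for one text token at a time. The function name token_gradcam, the [num_heads, text_len, 1 + grid*grid] layout, and the 12-head, 24x24 grid (576 patches) are assumptions for illustration. For captioning, the scalar you backpropagate would be the score of the generated word (for example its log-probability under a teacher-forced pass over the caption) instead of the ITM score, and which row counts as "the token" for that word is also an assumption you would need to check.

```python
import numpy as np


def token_gradcam(attention_map, attn_gradients, token_index, grid_size):
    """Hypothetical helper: GradCAM map for one text token from cross-attention.

    attention_map, attn_gradients: arrays of shape
        [num_heads, text_len, 1 + grid_size * grid_size]
    Index 0 of the last axis is assumed to be the image [CLS] embedding.
    Returns a [grid_size, grid_size] map.
    """
    # Attention from this token to the image patches (skip [CLS]).
    attn = attention_map[:, token_index, 1:]
    # Keep only positive gradients, as in GradCAM.
    grad = np.clip(attn_gradients[:, token_index, 1:], 0, None)
    # Weight attention by gradient and average over heads.
    cam = (attn * grad).mean(axis=0)
    return cam.reshape(grid_size, grid_size)


# Usage with random stand-in data (not real model outputs).
heads, text_len, grid = 12, 6, 24
rng = np.random.default_rng(0)
attn = rng.random((heads, text_len, 1 + grid * grid))
grads = rng.standard_normal((heads, text_len, 1 + grid * grid))

# One map per token, i.e. "word by word".
cams = [token_gradcam(attn, grads, t, grid) for t in range(text_len)]
print(cams[2].shape)  # (24, 24)
```

Each map can then be upsampled to the input resolution and overlaid on the image, as the ALBEF notebook does for the matching model.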

gwyong · 2023-05-23T04:42:16Z

Thank you, I will try it.

Michi-3000 · 2023-05-23T16:12:15Z

Hi, I am also working on visualization beyond the image-text matching model, and I've run into difficulties when accessing 'attn_gradients' and 'attention_map'. Have you had any success with this? If so, could you share the code or provide some guidance? Thank you very much!
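One way to see where 'attention_map' and 'attn_gradients' come from is a toy cross-attention layer that records them itself. In the LAVIS BLIP code, the cross-attention module stores its softmax output when save_attention is enabled and registers a backward hook on it, which is what get_attention_map() and get_attn_gradients() return after loss.backward(). The sketch below reproduces only that pattern in plain PyTorch; ToyCrossAttention, its sizes, and the stand-in score are hypothetical and not part of BLIP, LAVIS, or transformers. Whether the transformers BLIP text decoder exposes the same attributes depends on the installed version, so this toy does not rely on it.

```python
import torch
import torch.nn as nn


class ToyCrossAttention(nn.Module):
    """Hypothetical multi-head cross-attention that saves its attention map
    and the gradient flowing into it (save_attention / register_hook pattern)."""

    def __init__(self, dim=16, num_heads=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.save_attention = False
        self.attention_map = None
        self.attn_gradients = None

    def _save_attn_gradients(self, grad):
        self.attn_gradients = grad

    def forward(self, text, image):
        b, t, d = text.shape
        n = image.shape[1]
        q = self.q(text).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k(image).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v(image).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-1, -2) / self.head_dim ** 0.5
        probs = scores.softmax(dim=-1)  # [batch, heads, text_len, num_patches]
        if self.save_attention:
            self.attention_map = probs
            # Called during backward with d(score)/d(probs).
            probs.register_hook(self._save_attn_gradients)
        return (probs @ v).transpose(1, 2).reshape(b, t, d)


torch.manual_seed(0)
layer = ToyCrossAttention()
layer.save_attention = True
text = torch.randn(1, 5, 16)    # stand-in decoder hidden states
image = torch.randn(1, 10, 16)  # stand-in image patch embeddings

out = layer(text, image)
# Stand-in for the score of one generated word; in a captioning model this
# would come from the decoder logits, not from this toy output.
score = out[0, 3].sum()
score.backward()

# GradCAM for text position 3: attention * positive gradient, averaged over heads.
cam = (layer.attention_map[0, :, 3].detach()
       * layer.attn_gradients[0, :, 3].clamp(min=0)).mean(dim=0)
print(cam.shape)  # torch.Size([10])
```

The key detail is that the hook must be registered on the attention tensor during the forward pass that you later call backward() on; reading the attributes before backward() returns no gradient.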

gwyong · 2023-05-23T19:25:21Z

Sure, if I solve it, I will let you know.

dip9811111 · 2023-10-03T09:09:00Z

> Sure, if I solve it, I will let you know.

Did you manage to solve this?
