BLIP Image Captioning GradCAM? #155

Open
gwyong opened this issue May 22, 2023 · 7 comments

@gwyong

gwyong commented May 22, 2023

Hi, I am using BlipForConditionalGeneration from transformers for image captioning.
I want to visualize, word by word, which image regions led to each word of the generated caption, in the style of Grad-CAM.

I found some code in ALBEF (https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb), but it uses an image-text matching model, not an image captioning model.

Can you give me any hints or a simple code example for this?

@LiJunnan1992
Contributor

Hi, you can look at our code in LAVIS, which provides a Grad-CAM computation function for the BLIP image-text matching model:
https://github.com/salesforce/LAVIS/blob/a9939492f8f992d03088e7575bc711089b06544a/lavis/models/blip_models/blip_image_text_matching.py#L151

@gwyong
Author

gwyong commented May 23, 2023

Does that mean only an image-text matching model can produce Grad-CAM maps?
My model is an image captioning model (see https://huggingface.co/docs/transformers/model_doc/blip#transformers.BlipForConditionalGeneration).

If only image-text matching models are supported, do I need to build a separate image-text matching model for Grad-CAM?

@LiJunnan1992
Contributor

You can adapt the gradcam code to work with an image captioning model.
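
As a rough illustration of what such an adaptation might involve (a sketch under stated assumptions, not the LAVIS implementation): once the cross-attention weights of one text-decoder layer and their gradients have been captured, the Grad-CAM step itself is model-agnostic. For captioning, one plausible choice is to backpropagate the score of each generated word instead of the matching score. The function name `token_gradcam`, the `(heads, text_tokens, image_tokens)` layout, and the leading [CLS] image token are assumptions made for this example.

```python
import numpy as np


def token_gradcam(attn, grad, grid_hw, has_cls=True):
    """Combine cross-attention weights and their gradients into heatmaps.

    attn, grad: arrays of shape (heads, text_tokens, image_tokens), assumed
        to come from a single cross-attention layer (hypothetical inputs).
    grid_hw: (height, width) of the image patch grid.
    has_cls: whether the first image token is a [CLS] token to be dropped.
    Returns an array of shape (text_tokens, height, width).
    """
    attn = np.asarray(attn, dtype=np.float64)
    grad = np.asarray(grad, dtype=np.float64)
    if attn.shape != grad.shape:
        raise ValueError("attn and grad must have the same shape")
    if has_cls:
        # Keep only the patch tokens; [CLS] has no spatial location.
        attn, grad = attn[..., 1:], grad[..., 1:]
    height, width = grid_hw
    if attn.shape[-1] != height * width:
        raise ValueError("image token count does not match the patch grid")
    # Grad-CAM weighting: attention scaled by its positive gradient.
    cam = attn * np.clip(grad, 0.0, None)
    # Average over heads, then lay the patches back out on the grid.
    return cam.mean(axis=0).reshape(-1, height, width)


# Usage with random stand-in data: 12 heads, 5 caption tokens, 24x24 patches.
rng = np.random.default_rng(0)
demo_attn = rng.random((12, 5, 1 + 24 * 24))
demo_grad = rng.standard_normal((12, 5, 1 + 24 * 24))
demo_maps = token_gradcam(demo_attn, demo_grad, grid_hw=(24, 24))
print(demo_maps.shape)
```

Each slice `demo_maps[i]` would then be upsampled to the image size and overlaid on the image for the i-th caption token. Capturing `attn` and `grad` from the captioning model is the model-specific part and is not shown here.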

@gwyong
Author

gwyong commented May 23, 2023

Thank you, I will try it.

@Michi-3000

Hi, I am also working on visualization beyond the image-text matching model, and I have run into difficulties when accessing 'attn_gradients' and 'attention_map'. Have you had any success with this? If so, could you share the code or provide some guidance? Thank you very much!

@gwyong
Author

gwyong commented May 23, 2023

Sure, if I solve it, I will let you know.

@dip9811111

Sure if I solve it, I will let you know.

Did you manage to solve this?
