
Why is there no model checkpoint that performs ITC+ITM+LM loss on COCO/Flickr? #132

Closed
linzhiqiu opened this issue Feb 28, 2023 · 4 comments

Comments

@linzhiqiu

I am curious why BLIP does not apply all 3 losses when finetuning on COCO/Flickr. I would have thought that using all 3 losses would produce a model that can simultaneously perform both retrieval and captioning (on COCO/Flickr). Let me know if this is a misunderstanding!
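For context, the multi-task setup being asked about is just a weighted sum of the three objectives (image-text contrastive, image-text matching, and language modeling), which BLIP uses in pre-training. A minimal sketch, assuming equal weights of 1.0 (the actual weighting and loss values below are illustrative, not from the BLIP repo):

```python
def total_loss(loss_itc, loss_itm, loss_lm,
               w_itc=1.0, w_itm=1.0, w_lm=1.0):
    """Combine the three BLIP pre-training objectives into one
    scalar for backprop. In the real codebase each term would be
    a per-batch loss from the corresponding model head; here they
    are plain floats for illustration."""
    return w_itc * loss_itc + w_itm * loss_itm + w_lm * loss_lm

# Made-up per-batch loss values:
print(total_loss(0.42, 0.31, 2.05))  # 2.78
```

Fine-tuning with this combined objective (rather than LM loss alone) is what the question proposes for COCO/Flickr.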

@Pppapaya

> I am curious why BLIP does not apply all 3 losses when finetuning on COCO/Flickr. I would have thought that using all 3 losses would produce a model that can simultaneously perform both retrieval and captioning (on COCO/Flickr). Let me know if this is a misunderstanding!

Hi, I'm curious about this too. Have you found the reason yet?

@linzhiqiu
Author

I think BLIP-2 already answers this question: BLIP-2 was trained with all 3 losses and achieved improved performance.

@gwyong

gwyong commented May 26, 2023

What does that mean? I couldn't understand. Does it mean BLIP was trained with the three losses separately, or jointly? I think it only used the LM loss for fine-tuning on image captioning.

@BrianG13

BrianG13 commented Aug 7, 2023

@linzhiqiu any progress on this?
