Larger model #6
Comments
Hi, we have released a larger model that uses ViT-L as the vision encoder (the text encoder is still bert-base). Currently we do not have plans to train models larger than that. Thanks!
In case you change your mind, we at LAION can provide compute and have 6B as-yet-unreleased image-text pairs, 2.3B of them English. We are currently busy preparing the training of CLIP versions, but we could just scale up the ViT & LM with the existing code and cooperate on pulling off the training. Btw, here is a colab with pretty impressive captioning results I got with BLIP by generating many candidate captions and filtering them with CLIP ViT L & ResNet 50x64: https://colab.research.google.com/drive/1fKxiDMa-9uu1A6XiYjxTbYxSagvbZ8Fb?usp=sharing
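For readers who have not seen the candidate-filtering idea mentioned above, here is a minimal sketch of the reranking step only. It is not the code from the colab: the function names `cosine` and `rerank_captions` are hypothetical, and the toy vectors stand in for embeddings that would, in practice, come from a CLIP image encoder and text encoder. The assumption is simply that each candidate caption is scored by cosine similarity to the image embedding and the best-scoring ones are kept.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_captions(image_emb, candidates, top_k=1):
    """Keep the top_k captions whose embeddings are closest to the image.

    candidates is a list of (caption, text_embedding) pairs. In a real
    pipeline the captions would be sampled from a captioner such as BLIP
    and the embeddings produced by CLIP; both are assumed here.
    """
    scored = [(cosine(image_emb, emb), caption) for caption, emb in candidates]
    # Sort by similarity only, highest first (stable for ties).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:top_k]]


# Toy, hand-written embeddings purely for illustration.
image_emb = [1.0, 0.0, 0.0]
candidates = [
    ("a dog on a beach", [0.9, 0.1, 0.0]),
    ("a cat on a sofa", [0.0, 1.0, 0.0]),
    ("a dog running", [0.7, 0.7, 0.0]),
]
best = rerank_captions(image_emb, candidates, top_k=2)
```

Combining two scorers, as the comment describes with ViT L and ResNet 50x64, could be done by averaging their similarity scores before sorting, though how the colab combines them is not stated here.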
Hi @christophschuhmann, it would be great if we could cooperate to train larger BLIP models with our code and your data & compute. I am very interested in continuing this discussion. Thanks for the colab, the captions do look nice!
Awesome! :) We mostly use Discord for correspondence. My handle is spirit-from-germany#1488. Here is an invite link to the server we work on: For the image captioning and VQA stuff, we use the channel #image-captioning. Let's chat there :) Btw, here are some VQA results we recently got with a frozen CLIP ViT L 14, a frozen GPT J, and a trained mapping transformer in between:
Awesome work, thanks for releasing!
Are there any plans to release larger models, such as BLIP-large or BLIP-xxlarge?