How to use BLIP for near-duplicate image and text pair detection? #72
Comments
Hi, note that the multimodal feature has not been optimized for cosine similarity. The unimodal features can be used to compute cosine similarity because they are trained with the image-text contrastive loss.
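As a rough illustration of that suggestion, here is a minimal sketch that compares two (image, text) pairs by computing cosine similarity separately on each unimodal embedding. It assumes the embeddings have already been extracted from BLIP's unimodal image and text encoders and are available as plain lists of floats. The function names, the toy vectors, and the 0.95 thresholds are hypothetical and would need tuning on real data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def is_near_duplicate(pair_a, pair_b, image_threshold=0.95, text_threshold=0.95):
    """Flag two pairs as near-duplicates when BOTH modalities are similar.

    Each pair is (image_embedding, text_embedding), assumed to be
    precomputed unimodal features. Thresholds are hypothetical defaults.
    """
    image_sim = cosine(pair_a[0], pair_b[0])
    text_sim = cosine(pair_a[1], pair_b[1])
    return image_sim >= image_threshold and text_sim >= text_threshold

# Toy 3-dimensional vectors standing in for real embeddings.
a = ([1.0, 0.0, 0.2], [0.3, 0.9, 0.1])
b = ([0.98, 0.05, 0.21], [0.31, 0.88, 0.12])  # close to a in both modalities
c = ([0.0, 1.0, 0.0], [0.3, 0.9, 0.1])        # same text as a, different image

print(is_near_duplicate(a, b))
print(is_near_duplicate(a, c))
```

Requiring both similarities to pass keeps a pair with identical text but a different image from being flagged. Averaging the two scores is another option if a single ranking value is needed.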
Given pairs of (image, text), I need to detect near-duplicates using both features.
Pairs
I am thinking of computing an embedding from both the image and text features:
Any feedback about this approach?
Also, is there a length limit for `text` that I should be aware of?