How to use BLIP for near-duplicate image and text pair detection? #72

Open
smith-co opened this issue Jun 29, 2022 · 2 comments
smith-co commented Jun 29, 2022

Given pairs of (image, text), I have to detect near duplicates using both the image and text features.

Pairs

(image1, text1)
(image2, text2)
...
(imageN, textN)

I am thinking of computing an embedding using both image and text features:

```python
import torch.nn.functional as F

multimodal_feature_1 = model(image1, text1, mode='multimodal')[0, 0]
multimodal_feature_2 = model(image2, text2, mode='multimodal')[0, 0]
matching_score = F.cosine_similarity(multimodal_feature_1,
                                     multimodal_feature_2, dim=0)
```

Any feedback about this approach?

Also, I would like to know: is there a length limit on the text that I should be aware of?

@LiJunnan1992 (Contributor) commented
Hi, note that the multimodal feature has not been optimized for cosine similarity. The unimodal features can be used to compute cosine similarity, because they are trained with the image-text contrastive loss.
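
Following up on this reply, here is a minimal sketch of the unimodal route. It assumes the `blip_feature_extractor` interface shown in this repo's demo.ipynb (`mode='image'` / `mode='text'`, with `[0, 0]` selecting the CLS token); the checkpoint path, preprocessing, and the normalize-and-concatenate scheme for combining the two features are illustrative assumptions, not something prescribed in the thread:

```python
import torch
import torch.nn.functional as F
from models.blip import blip_feature_extractor  # from this repo

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Placeholder checkpoint path; see demo.ipynb for a pretrained URL and
# the matching image preprocessing transform.
model = blip_feature_extractor(pretrained='path/to/model_base.pth',
                               image_size=224, vit='base')
model.eval().to(device)

def pair_embedding(image, text):
    """Embed one (image, text) pair using the unimodal encoders."""
    with torch.no_grad():
        image_feat = model(image, text, mode='image')[0, 0]  # CLS token
        text_feat = model(image, text, mode='text')[0, 0]    # CLS token
    # One simple way (an assumption, not from the thread) to use both
    # modalities: L2-normalize each feature and concatenate them into a
    # single pair vector, so each modality contributes equally.
    return torch.cat([F.normalize(image_feat, dim=-1),
                      F.normalize(text_feat, dim=-1)])

# emb1 = pair_embedding(image1, text1)
# emb2 = pair_embedding(image2, text2)
# score = F.cosine_similarity(emb1, emb2, dim=0)
```

Pairs whose score exceeds a threshold would be flagged as near duplicates; the threshold itself has to be tuned on your own data.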
