How to use BLIP for near-duplicate image and text pair detection? #72

Open
smith-co opened this issue Jun 29, 2022 · 2 comments
smith-co commented Jun 29, 2022

Given pairs of (image, text), I have to detect near duplicates using both the image and text features.

Pairs

(image1, text1)
(image2, text2)
...
(imageN, textN)

I am thinking of computing an embedding using both image and text features:

```python
import torch.nn.functional as F

multimodal_feature_1 = model(image1, text1, mode='multimodal')[0, 0]
multimodal_feature_2 = model(image2, text2, mode='multimodal')[0, 0]
matching_score = F.cosine_similarity(multimodal_feature_1,
                                     multimodal_feature_2, dim=0)
```

Any feedback about this approach?

Also, I would like to know: is there a length limit on the text that I should be aware of?

@LiJunnan1992 (Contributor) commented
Hi, note that the multimodal feature has not been optimized for cosine similarity. The unimodal features can be used to compute cosine similarity, because they are trained with the image-text contrastive loss.
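
Following up on this reply, here is a minimal sketch of the unimodal route. It assumes the `blip_feature_extractor` interface shown in this repo's demo.ipynb (`mode='image'` / `mode='text'`, with `[0, 0]` selecting the CLS token); the checkpoint path, preprocessing, and the normalize-and-concatenate scheme for combining the two features are illustrative assumptions, not something prescribed in the thread:

```python
import torch
import torch.nn.functional as F
from models.blip import blip_feature_extractor  # from this repo

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Placeholder checkpoint path; see demo.ipynb for a pretrained URL and
# the matching image preprocessing transform.
model = blip_feature_extractor(pretrained='path/to/model_base.pth',
                               image_size=224, vit='base')
model.eval().to(device)

def pair_embedding(image, text):
    """Embed one (image, text) pair using the unimodal encoders."""
    with torch.no_grad():
        image_feat = model(image, text, mode='image')[0, 0]  # CLS token
        text_feat = model(image, text, mode='text')[0, 0]    # CLS token
    # One simple way (an assumption, not from the thread) to use both
    # modalities: L2-normalize each feature and concatenate them into a
    # single pair vector, so each modality contributes equally.
    return torch.cat([F.normalize(image_feat, dim=-1),
                      F.normalize(text_feat, dim=-1)])

# emb1 = pair_embedding(image1, text1)
# emb2 = pair_embedding(image2, text2)
# score = F.cosine_similarity(emb1, emb2, dim=0)
```

Pairs whose score exceeds a threshold would be flagged as near duplicates; the threshold itself has to be tuned on your own data.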
