How to use BLIP for duplicate or near-duplicate images? #68

Open

smith-co opened this issue Jun 29, 2022 · 6 comments

Comments

@smith-co

Given a pair of images, my use case is to detect whether they are duplicates or not.

(imageX, imageY) = verdict/score
verdict = duplicate/not duplicate/near duplicate

How can I use BLIP for this use case?

@LiJunnan1992
Contributor

You can compute the cosine similarity of their image embeddings.
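For example, something along these lines should work (the local file paths and the load_image helper are placeholders; the preprocessing copies what demo.ipynb does):

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from models.blip import blip_feature_extractor

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 224

# Same resize/normalization as in demo.ipynb
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

def load_image(path):
    # placeholder helper: load a local image file and preprocess it for BLIP
    return transform(Image.open(path).convert('RGB')).unsqueeze(0).to(device)

model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth'
model = blip_feature_extractor(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

with torch.no_grad():
    # mode='image' returns the vision-encoder features; the caption string is not used
    emb_x = model(load_image('imageX.jpg'), '', mode='image')[0, 0]
    emb_y = model(load_image('imageY.jpg'), '', mode='image')[0, 0]

score = F.cosine_similarity(emb_x, emb_y, dim=0)
print(score.item())  # close to 1.0 for (near-)duplicate images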

@woctezuma

For reference, there are basic tools to find duplicates: https://github.com/idealo/imagededup
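As a rough sketch of how that library is typically used (the image directory is a placeholder; check its README for the exact API and thresholds):

from imagededup.methods import PHash

phasher = PHash()
# Hash every image in a folder, then report which files look like duplicates of each other
encodings = phasher.encode_images(image_dir='path/to/images')
duplicates = phasher.find_duplicates(encoding_map=encodings)
print(duplicates)  # {filename: [list of likely duplicate filenames], ...}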

@smith-co
Author

smith-co commented Jun 29, 2022

You can compute the cosine similarity of their image embeddings

@LiJunnan1992 do you have an example of how to extract image embeddings?

I don't see an example of that here: https://github.com/salesforce/BLIP/blob/main/demo.ipynb

@LiJunnan1992
Contributor

Please refer to this code in the demo: image_feature = model(image, caption, mode='image')[0,0]

@smith-co
Author

@LiJunnan1992 as I mentioned, in my case I am given two images and have to detect whether they are duplicates or not.

For this I have to get the embeddings of the two images and then compute their cosine similarity.

But the code sample in the demo also has a caption involved:

image_feature = model(image, caption, mode='image')[0,0]

As I mentioned, I only want to get an embedding given just an image. Is that possible with this model?

@nashid

nashid commented Jun 29, 2022

@smith-co I tested with the following:

import torch
from torch import nn
from models.blip import blip_feature_extractor

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load_demo_image is the helper defined in demo.ipynb
image_size = 224
image = load_demo_image(image_size=image_size, device=device)

model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth'

model = blip_feature_extractor(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

caption = 'a woman sitting on the beach with a dog'

multimodal_feature = model(image, caption, mode='multimodal')[0,0]
image_feature = model(image, '', mode='image')[0,0]
text_feature = model(image, caption, mode='text')[0,0]

# Image embeddings extracted with and without passing a caption
image_with_caption = model(image, caption, mode='image')[0,0]
image_without_caption = model(image, '', mode='image')[0,0]

cos = nn.CosineSimilarity(dim=0)
score = cos(image_with_caption, image_without_caption)
print(score)

Output:

tensor(1., grad_fn=<DivBackward0>)

As you can see, the cosine similarity comes out as 1. So when you query the model with mode='image', i.e. model(image, caption, mode='image')[0,0], it only returns the image embedding; the caption is ignored. At least that's what I observe in the above code snippet.

But sure, @LiJunnan1992 could provide more authoritative feedback in case I am missing something.
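To turn the cosine score into the verdict from the original question, you could threshold it; the cut-off values below are just illustrative assumptions and would need tuning on your own image pairs:

def verdict(score, dup_threshold=0.95, near_threshold=0.85):
    # thresholds are assumptions; calibrate them on labelled duplicate / non-duplicate pairs
    if score >= dup_threshold:
        return 'duplicate'
    if score >= near_threshold:
        return 'near duplicate'
    return 'not duplicate'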
