How to use BLIP for duplicate or near-duplicate images? #68

Open

smith-co opened this issue Jun 29, 2022 · 6 comments

Comments

@smith-co

Given a pair of images, my use case is to detect whether they are duplicates or not.

(imageX, imageY) = verdict/score
verdict = duplicate/not duplicate/near duplicate

How can I use BLIP for this use case?

@LiJunnan1992
Contributor

You can compute the cosine similarity of their image embeddings.
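For example, something along these lines should work (the local file paths and the load_image helper are placeholders; the preprocessing copies what demo.ipynb does):

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from models.blip import blip_feature_extractor

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 224

# Same resize/normalization as in demo.ipynb
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

def load_image(path):
    # placeholder helper: load a local image file and preprocess it for BLIP
    return transform(Image.open(path).convert('RGB')).unsqueeze(0).to(device)

model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth'
model = blip_feature_extractor(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

with torch.no_grad():
    # mode='image' returns the vision-encoder features; the caption string is not used
    emb_x = model(load_image('imageX.jpg'), '', mode='image')[0, 0]
    emb_y = model(load_image('imageY.jpg'), '', mode='image')[0, 0]

score = F.cosine_similarity(emb_x, emb_y, dim=0)
print(score.item())  # close to 1.0 for (near-)duplicate images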

@woctezuma

For reference, there are basic tools to find duplicates: https://github.com/idealo/imagededup
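As a rough sketch of how that library is typically used (the image directory is a placeholder; check its README for the exact API and thresholds):

from imagededup.methods import PHash

phasher = PHash()
# Hash every image in a folder, then report which files look like duplicates of each other
encodings = phasher.encode_images(image_dir='path/to/images')
duplicates = phasher.find_duplicates(encoding_map=encodings)
print(duplicates)  # {filename: [list of likely duplicate filenames], ...}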

@smith-co
Author

smith-co commented Jun 29, 2022

You can compute the cosine similarity of their image embeddings

@LiJunnan1992 do you have an example of how to extract image embeddings?

I don't see an example of that here: https://github.com/salesforce/BLIP/blob/main/demo.ipynb

@LiJunnan1992
Contributor

Please refer to this code in the demo: image_feature = model(image, caption, mode='image')[0,0]

@smith-co
Author

@LiJunnan1992 as I mentioned, in my case I am given two images and have to detect whether they are duplicates or not.

For this I have to get the embeddings of the two images and then compute their cosine similarity.

But the code sample in the demo also has a caption involved:

image_feature = model(image, caption, mode='image')[0,0]

As I mentioned, I only want to get an embedding given just an image. Is that possible with this model?

@nashid

nashid commented Jun 29, 2022

@smith-co I tested with the following:

import torch
from torch import nn
from models.blip import blip_feature_extractor

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load_demo_image is the helper defined in demo.ipynb
image_size = 224
image = load_demo_image(image_size=image_size, device=device)

model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base.pth'

model = blip_feature_extractor(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

caption = 'a woman sitting on the beach with a dog'

multimodal_feature = model(image, caption, mode='multimodal')[0,0]
image_feature = model(image, '', mode='image')[0,0]
text_feature = model(image, caption, mode='text')[0,0]

# Image embeddings extracted with and without passing a caption
image_with_caption = model(image, caption, mode='image')[0,0]
image_without_caption = model(image, '', mode='image')[0,0]

cos = nn.CosineSimilarity(dim=0)
score = cos(image_with_caption, image_without_caption)
print(score)

Output:

tensor(1., grad_fn=<DivBackward0>)

As you can see, the cosine similarity comes out as 1. So when you query the model with mode='image', i.e. model(image, caption, mode='image')[0,0], it only returns the image embedding; the caption is ignored. At least that's what I observe in the above code snippet.

But sure, @LiJunnan1992 could provide more authoritative feedback in case I am missing something.
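To turn the cosine score into the verdict from the original question, you could threshold it; the cut-off values below are just illustrative assumptions and would need tuning on your own image pairs:

def verdict(score, dup_threshold=0.95, near_threshold=0.85):
    # thresholds are assumptions; calibrate them on labelled duplicate / non-duplicate pairs
    if score >= dup_threshold:
        return 'duplicate'
    if score >= near_threshold:
        return 'near duplicate'
    return 'not duplicate'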
