
Fine tune BLIP image retrieval for custom dataset without annotations #55

poipiii opened this issue May 10, 2022 · 4 comments

poipiii commented May 10, 2022

Hi, I would like to ask how I should approach fine-tuning BLIP for image retrieval. My dataset contains caption and image pairs with no bounding-box annotations. Is it possible to train BLIP without annotations, or should I create a bounding box of width/height equal to the image width/height for each image?

@LiJunnan1992
Contributor

Hi, BLIP does not require bounding box input. You can try to use the entire image as input.


poipiii commented May 10, 2022

Can you describe how that would work, and how I should define the dataset for BLIP image-retrieval fine-tuning?

@LiJunnan1992

You can define the dataset following the same format as COCO.


poipiii commented May 11, 2022

Oh, I get it: so I define my dataset in a JSON file using the same format as the coco_karpathy dataset, like this:

{
  "caption": "example caption for image",
  "image": "001.png",
  "image_id": "001"
}
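For reference, a minimal sketch of generating such an annotation file from a custom dataset might look like the following. The `pairs` list, output filename, and the rule for deriving `image_id` from the filename are all assumptions for illustration, not part of the BLIP codebase; the coco_karpathy loaders expect a JSON list of per-pair entries, so the sketch writes one dict per caption and image pair.

```python
import json

# Hypothetical caption/image pairs from a custom dataset (no bounding boxes needed).
pairs = [
    ("001.png", "example caption for image"),
    ("002.png", "another caption"),
]

# Build a coco_karpathy-style list of annotation dicts.
# Here image_id is derived from the filename; any unique string would do.
annotations = [
    {"caption": caption, "image": image, "image_id": image.rsplit(".", 1)[0]}
    for image, caption in pairs
]

# Write the annotation file that the dataset config would point at
# (the filename here is an assumption).
with open("custom_retrieval_train.json", "w") as f:
    json.dump(annotations, f, indent=2)
```

Each image is then used in full at training time, consistent with the reply above that no bounding-box input is required.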
