
Weird caption for a picture of flower #99

Open
phelogges opened this issue Sep 22, 2022 · 5 comments

Comments

@phelogges

phelogges commented Sep 22, 2022

I got a weird caption for a picture of a flower and don't know why :(
Any advice would be appreciated.

Model: model_base_capfilt_large.pth
sha256: 8f5187458d4d47bb87876faf3038d5947eff17475edf52cf47b62e84da0b235f

Some core code:

import time

import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms import InterpolationMode

from models.blip import blip_decoder

device = torch.device("cpu")

image_size = 224
image_path = "xxx"  # say we read the image by path
model = blip_decoder("checkpoints/model_base_capfilt_large.pth", image_size=image_size, vit="base")
model.eval()
model = model.to(device)

raw_image = Image.open(image_path).convert("RGB")
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))
])
image = transform(raw_image).unsqueeze(0).to(device)

with torch.no_grad():
    # beam search
    t0 = time.time()
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)[0]
    cost = time.time() - t0
print(caption)

Output: dai dai dai dai dai dai dai dai dai dai dai dai dai dai dai dai

Here's the input image:

[image attachment]

@phelogges
Author

Update
Same code and model; I tested two other pictures of flowers, similar to the picture above.

Pic1: a bunch of dai dai dai dai dai dai dai dai dai dai dai dai dai

Pic2: a bunch of yellow flowers

Judging from Pic1's caption, the model seems to think "dai" is a kind of flower, and these flowers are indeed called daisies :)

@LiJunnan1992
Contributor

Thanks for posting this interesting behavior from the model; this is new to me :)

@phelogges
Author

So, any advice for improvement?
Maybe tuning the parameters in caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)[0] would help?
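One parameter worth a look: BLIP's generate() also accepts a repetition_penalty argument (it appears to default to 1.0, i.e. disabled), which penalizes re-emitting tokens that have already appeared. A toy sketch of the mechanism, with a made-up three-token vocabulary and logits purely for illustration:

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty):
    """Scale down logits of tokens already generated (CTRL-style rule:
    positive logits are divided by the penalty, negative ones multiplied)."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary: token 0 = "dai", token 1 = "flowers", token 2 = "yellow"
logits = [3.0, 2.5, 1.0]
history = [0, 0, 0]  # "dai" was already emitted three times

before = softmax(logits)
after = softmax(apply_repetition_penalty(logits, history, penalty=1.5))
print(f"P(dai) before: {before[0]:.2f}, after: {after[0]:.2f}")
```

With the penalty applied, the repeated token's probability drops, so the decoder is pushed toward other continuations instead of looping on "dai".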

@LiJunnan1992
Contributor

You may want to try the image captioning model finetuned on COCO.

@saffie91

Nucleus sampling doesn't show this behavior either. The way I see it, beam search tries to fill the minimum length but gets stuck repeating the same token when the picture is simple and there is not much else to say.
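For reference, nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability reaches p, renormalizes, and samples from that set, so the decoder doesn't deterministically lock onto one continuation the way beam search can. In BLIP this would correspond to something like model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5). A minimal sketch of the filtering step with a made-up next-token distribution:

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability >= top_p,
    then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Toy next-token distribution: token 0 = "dai", 1 = "flowers", 2 = "yellow", 3 = "a"
probs = [0.5, 0.3, 0.15, 0.05]
filtered = nucleus_filter(probs, top_p=0.9)
print(sorted(filtered))  # token 3 falls outside the nucleus
```

The low-probability tail is cut off, and sampling over the renormalized remainder still leaves several plausible continuations in play rather than one dominant beam.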
