[Usage] How can I implement few-shot learning on LLaVA #1202

Open
htluandc2 opened this issue Feb 29, 2024 · 5 comments

Comments

@htluandc2

Describe the issue

Hi there,

I have some images and some custom explanations for them, and I want to implement few-shot learning to generate summaries of my images.

This is my current implementation:

templates = [
    {
        "url": "",
        "explain": """""",
    },
    {
        "url": "",
        "explain": """""",
    },
    {
        "url": "",
        "explain": """"""
    },
    {
        "url": ",
        "explain": """"""
    },
    {
        "url": "",
        "explain": """"""
    },
]

My code to build the prompt:

from PIL import Image
import cv2
import numpy as np
import requests

"""Make image summary"""
img_prompt = "User: <image>\n"+"\nASSISTANT:"

prompt = (
    "You are an assistant tasked with summarizing images for retrieval. "
    "These summaries will be embedded and used to retrieve the raw image. "
    "Give a concise summary of the image that is well optimized for retrieval."
)
print(prompt)

images = []

for i, temp in enumerate(templates):
    image_i = Image.open(requests.get(temp['url'], stream=True).raw)
    explain_i = temp["explain"]
    example_i = f"\nUser: <image{i}>" + "\nASSISTANT:" + explain_i + "\n"
    prompt += example_i
    images.append(image_i)

prompt += f"\nUser: <image{len(templates)}>"+"\nASSISTANT:"
print(prompt)
print('-'*100)
print("Examples:", len(images))

Inference:

target = Image.open("figures/figure-2-5.jpg")


out = model_multi_modals(
    images=images+[target],
    prompt=prompt,
    generate_kwargs={"max_new_tokens": 2048})

And the error I get:

ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
@leeyyi

leeyyi commented Mar 1, 2024

In-context learning or fine-tuning.

@Nomiluks

Nomiluks commented Mar 7, 2024

That's an excellent question. Similar to how we can enhance OpenAI GPT models through a few-shot approach, it would be fantastic if we could apply the same method to these pre-trained models. @haotian-liu

@fisher75

Has this been solved? I use SGLang for batch inference, and I also need this feature for ICL, multi-turn discussions, and few-shot prompting.

@Debolena7

Debolena7 commented Jun 10, 2024

I think the error is because of the image token. In the prompt, the image token should be given as

<image>

and not with an image id or index (e.g. <image0>). I got a similar error in my multi-prompt setup.
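
A minimal sketch of what I mean, assuming the same pipeline call as in the original post (model_multi_modals) and that the processor expects one plain <image> placeholder per input image; the prompt text is taken from above and the rest is illustrative:

# Sketch (assumption): repeat a plain "<image>" token once per image instead of
# indexed tokens like "<image0>", so the processor can match image tokens to
# the images that are passed in.
prompt = (
    "You are an assistant tasked with summarizing images for retrieval. "
    "These summaries will be embedded and used to retrieve the raw image. "
    "Give a concise summary of the image that is well optimized for retrieval."
)

for temp in templates:
    # one "<image>" placeholder for each few-shot example
    prompt += "\nUser: <image>\nASSISTANT: " + temp["explain"] + "\n"

# the query image also gets a plain "<image>" placeholder
prompt += "\nUser: <image>\nASSISTANT:"

Whether the model can actually make use of more than one image per prompt is a separate question, see below.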

BTW, the model is not capable of handling multiple images and prompts simultaneously, as is evident from the following conversations by the author and others:

https://discuss.huggingface.co/t/llava-multi-image-input-support-for-inference/68458

#197

#57

https://huggingface.co/YouLiXiya/tinyllava-v1.0-1.1b-hf/discussions/1#:~:text=The%20training%20is%20based%20on%20a%20single%20image.%20Multiple%20images%20are%20not%20supported

@ys-zong

ys-zong commented Jul 14, 2024

Hi guys, you can use our codebase for ICL: https://github.com/ys-zong/VL-ICL
