[Usage] How can I implement few-shot learning on LLaVA #1202

Open
htluandc2 opened this issue Feb 29, 2024 · 5 comments

Comments

@htluandc2

Describe the issue

Hi there,

I have some images and some custom explanations for them, and I want to implement few-shot learning to generate summaries of my images.

This is my current implementation:

templates = [
    {
        "url": "",
        "explain": """""",
    },
    {
        "url": "",
        "explain": """""",
    },
    {
        "url": "",
        "explain": """"""
    },
    {
        "url": ",
        "explain": """"""
    },
    {
        "url": "",
        "explain": """"""
    },
]

My code to build the prompt:

from PIL import Image
import cv2
import numpy as np
import requests

"""Make image summary"""
img_prompt = "User: <image>\n"+"\nASSISTANT:"

prompt = (
    "You are an assistant tasked with summarizing images for retrieval. "
    "These summaries will be embedded and used to retrieve the raw image. "
    "Give a concise summary of the image that is well optimized for retrieval."
)
print(prompt)

images = []

for i, temp in enumerate(templates):
    image_i = Image.open(requests.get(temp['url'], stream=True).raw)
    explain_i = temp["explain"]
    example_i = f"\nUser: <image{i}>" + "\nASSISTANT:" + explain_i + "\n"
    prompt += example_i
    images.append(image_i)

prompt += f"\nUser: <image{len(templates)}>"+"\nASSISTANT:"
print(prompt)
print('-'*100)
print("Examples:", len(images))

Inference:

target = Image.open("figures/figure-2-5.jpg")


out = model_multi_modals(
    images=images+[target],
    prompt=prompt,
    generate_kwargs={"max_new_tokens": 2048})

And the error I get:

ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
@leeyyi

leeyyi commented Mar 1, 2024

In-context learning or fine-tuning.

@Nomiluks

Nomiluks commented Mar 7, 2024

That's an excellent question. Similar to how we can enhance OpenAI GPT models through a few-shot approach, it would be fantastic if we could apply the same method to these pre-trained models. @haotian-liu

@fisher75

Has this been solved? I use SGLang for batch inference, and I also need this feature for ICL, multi-turn discussions, and few-shot prompting.

@Debolena7

Debolena7 commented Jun 10, 2024

I think the error is because of the image token. In the prompt, the image token should be given as

<image>

and not with an image id or index (e.g. <image0>). I got a similar error in my multi-prompt setup.
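
A minimal sketch of what I mean, assuming the same pipeline call as in the original post (model_multi_modals) and that the processor expects one plain <image> placeholder per input image; the prompt text is taken from above and the rest is illustrative:

# Sketch (assumption): repeat a plain "<image>" token once per image instead of
# indexed tokens like "<image0>", so the processor can match image tokens to
# the images that are passed in.
prompt = (
    "You are an assistant tasked with summarizing images for retrieval. "
    "These summaries will be embedded and used to retrieve the raw image. "
    "Give a concise summary of the image that is well optimized for retrieval."
)

for temp in templates:
    # one "<image>" placeholder for each few-shot example
    prompt += "\nUser: <image>\nASSISTANT: " + temp["explain"] + "\n"

# the query image also gets a plain "<image>" placeholder
prompt += "\nUser: <image>\nASSISTANT:"

Whether the model can actually make use of more than one image per prompt is a separate question, see below.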

BTW, the model is not capable of handling multiple images and prompts simultaneously, as is evident from the following conversations by the author and others:

https://discuss.huggingface.co/t/llava-multi-image-input-support-for-inference/68458

#197

#57

https://huggingface.co/YouLiXiya/tinyllava-v1.0-1.1b-hf/discussions/1#:~:text=The%20training%20is%20based%20on%20a%20single%20image.%20Multiple%20images%20are%20not%20supported

@ys-zong

ys-zong commented Jul 14, 2024

Hi guys, you can use our codebase for ICL: https://github.com/ys-zong/VL-ICL
