[Refactor] Refactor the Generate Interface #140

kennymckormick · 2024-04-03T06:23:55Z

After this refactor, all VLMs, whether API-based or open-source, accept the same input format:

A list of dictionaries: a multi-modal message is represented as a list of dictionaries, each with two keys, type and value:

type: One of the two currently supported values, "image" or "text".
value: When type == 'text', value is the text message (a single string); when type == 'image', value is either the local path of an image file or an image URL.

Example:

IMAGE_PTH = 'assets/apple.jpg'
IMAGE_URL = 'https://raw.githubusercontent.com/open-compass/VLMEvalKit/main/assets/apple.jpg'
msg1 = [
    dict(type='image', value=IMAGE_PTH),
    dict(type='text', value='What is in this image?')
]
msg2 = [
    dict(type='image', value=IMAGE_URL),
    dict(type='image', value=IMAGE_URL),
    dict(type='text', value='How many apples are there in these images?')
]
response = model.generate(msg1)
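
A message in the list[dict] format can be checked before it is passed to a model. The snippet below is only a minimal sketch of such a check: the helper name `validate_message` and the `SUPPORTED_TYPES` constant are hypothetical and are not part of VLMEvalKit.

```python
# Hypothetical helper, not part of VLMEvalKit: checks the list[dict] format described above.
SUPPORTED_TYPES = ('image', 'text')


def validate_message(message):
    """Raise if `message` is not a list of {'type', 'value'} dicts; return True otherwise."""
    if not isinstance(message, list):
        raise TypeError('A message must be a list of dictionaries.')
    for idx, item in enumerate(message):
        # Each element must be a dict with exactly the two keys 'type' and 'value'.
        if not isinstance(item, dict) or set(item) != {'type', 'value'}:
            raise ValueError(f'Item {idx} must be a dict with keys "type" and "value".')
        if item['type'] not in SUPPORTED_TYPES:
            raise ValueError(f'Item {idx} has unsupported type {item["type"]!r}.')
        # Both text messages and image paths/URLs are plain strings.
        if not isinstance(item['value'], str):
            raise ValueError(f'Item {idx} must have a string value.')
    return True


validate_message([
    dict(type='image', value='assets/apple.jpg'),
    dict(type='text', value='What is in this image?'),
])
```

Running a check like this early means that a malformed message fails with a clear error rather than somewhere inside a model-specific code path.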

For convenience, a list of strings is also accepted as input. In that case, each string is checked to see whether it is an image path or an image URL, and the list is automatically converted to the list[dict] format.

Example:

IMAGE_PTH = 'assets/apple.jpg'
IMAGE_URL = 'https://raw.githubusercontent.com/open-compass/VLMEvalKit/main/assets/apple.jpg'
msg1 = [IMAGE_PTH, 'What is in this image?']
msg2 = [IMAGE_URL, IMAGE_URL, 'How many apples are there in these images?']
response = model.generate(msg1)
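
The automatic conversion from a list of strings to the list[dict] format could look roughly like the sketch below. The function names `looks_like_image` and `to_message`, the extension list, and the detection heuristic (an http(s) URL or an existing local file, each with an image extension) are assumptions made for illustration; the actual check in VLMEvalKit may differ.

```python
import os.path as osp
from urllib.parse import urlparse

# Assumed set of image extensions; chosen for illustration only.
IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.bmp', '.gif', '.webp')


def looks_like_image(s):
    """Heuristic guess: an http(s) URL or an existing local file with an image extension."""
    if s.startswith(('http://', 'https://')):
        return urlparse(s).path.lower().endswith(IMAGE_EXTS)
    return osp.isfile(s) and s.lower().endswith(IMAGE_EXTS)


def to_message(inputs):
    """Convert a list of strings (or already-formed dicts) to the list[dict] format."""
    message = []
    for item in inputs:
        if isinstance(item, dict):
            # Already in the target format: keep as-is.
            message.append(item)
        elif isinstance(item, str):
            kind = 'image' if looks_like_image(item) else 'text'
            message.append(dict(type=kind, value=item))
        else:
            raise TypeError(f'Unsupported input element: {item!r}')
    return message


IMAGE_URL = 'https://raw.githubusercontent.com/open-compass/VLMEvalKit/main/assets/apple.jpg'
print(to_message([IMAGE_URL, 'What is in this image?']))
```

With this heuristic, a local path that does not exist on disk is treated as text, so a mistyped image path would reach the model as a text prompt rather than raising an error.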

Although the input format accepts multiple images in one multi-modal message, some VLMs do not support this (for example, some accept only a single image). In that case, only the first image is processed by the VLM and a warning is emitted.
Another advantage of the new format is forward compatibility: new types can be added later, for example to support audio or signals from custom sensors.
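
The single-image fallback mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the code used by any particular model wrapper: the helper name `keep_first_image` is hypothetical, and the standard `warnings` module stands in for whatever logging mechanism the toolkit actually uses.

```python
import warnings


def keep_first_image(message):
    """Return a copy of `message` that keeps only the first image item (hypothetical helper)."""
    kept, seen_image = [], False
    for item in message:
        if item['type'] == 'image':
            if seen_image:
                continue  # Drop every image after the first one.
            seen_image = True
        kept.append(item)
    n_images = sum(item['type'] == 'image' for item in message)
    if n_images > 1:
        warnings.warn(
            f'This model accepts a single image; using the first of {n_images} images.')
    return kept


msg = [
    dict(type='image', value='a.jpg'),
    dict(type='image', value='b.jpg'),
    dict(type='text', value='How many apples are there in these images?'),
]
print(keep_first_image(msg))
```

Text items and their order are preserved, so only the extra images are discarded.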

kennymckormick added 30 commits April 2, 2024 21:12

refactor API done

bd20c15

Implement a base class for opensource VLMs

654a82f

Merge branch 'main' of github.com:open-compass/VLMEvalKit into interface

81462dd

reorg to resolve circular import

979daa0

fix

9ad0480

Merge branch 'main' of github.com:open-compass/VLMEvalKit into interface

29141aa

update get_token_len

6c91e04

Fix GeminiV

fb77f55

fix url issues

ccc482f

update

75603cb

update the logic of API kwargs and add docstring for base API class

90f85a2

set the API var name to message, modify the inference func

fec87aa

update the baseclass of opensource VLM

6038d35

disable prefetch

8accc77

update

b975e4a

update

66fe149

update CogVLM

2624249

refactor OpenSource Models

9583ba3

fix Gemini

d5f9e73

update

5d19005

update

264f67e

Write a script to check the availability of VLMs

d1352e4

update check_VLM

8bd7cda

update

b945da9

fix bug in llava

3efca5a

Half sharecaptioner

2b06c79

update xcomposer

e907af8

update check_VLM

5a2f943

update xcomposer

e9090ff

add sharecaptioner back

ef7ea4b

kennymckormick added 17 commits April 7, 2024 20:22

fix

a84b631

fix name

829d04c

fix

037f576

fix

b038279

update

2ffe7d8

update

830f97b

update

c15247c

fix cogvlm

5c307da

update cogvlm

03a1071

update mplug_owl2

d4d7b4b

update

b2ed5fc

update

f3975cb

fix open_flamingo

199658b

update transcore_m

1456b9e

update

71bc40d

update

b1f6e25

Remove EMU2

c248927

kennymckormick changed the title ~~[WIP] Refactor the Generate Interface~~ [Refactor] Refactor the Generate Interface Apr 8, 2024

kennymckormick added 3 commits April 8, 2024 15:05

update

2239e72

Merge branch 'main' of github.com:open-compass/VLMEvalKit into interface

36979a6

update README

97f6566

kennymckormick merged commit ea0ff61 into main Apr 9, 2024
2 checks passed

This was referenced Apr 15, 2024

IndexError: index 1 is out of bounds for dimension 0 with size 1 #150

Closed

How are models that use in-context examples handled? #90

Open

kennymckormick deleted the interface branch April 16, 2024 07:32
