[Discussion/Feedback] VLM + Multimodal benchmarking #1155

Open · haileyschoelkopf opened this issue Dec 18, 2023 · 3 comments
Labels: opinions wanted (For discussing open questions.)

@haileyschoelkopf (Collaborator)

This is an issue to discuss the feasibility and desirability of including and supporting multimodal benchmarks in lm-eval. With the rise of multimodal (V)LMs and of benchmarks like MMMU designed to test them, it's worth discussing whether lm-eval is easily extensible to these tests, and whether we want them in scope for the library at all.

haileyschoelkopf added the opinions wanted label on Dec 18, 2023
LSinev mentioned this issue on Mar 24, 2024
@ashvinnihalani

There is already a fork of lm-evaluation-harness built by the LLaVA team, called lmms-eval, that is focused on VLM evaluation and can serve as a PoC. For what it's worth, I already have a private fork of lm-eval that adds LLaVA support and works with MMMU and the LLaVA 1.5 7B model through the original codebase. I think the main considerations for an implementation are the following:

  • How tightly do we want to couple VLMs and MM benchmarks? Should VLMs only be able to run MM benchmarks, or should we enable both LMs and VLMs to run MM benchmarks? Do we need to ensure a VLM can also run text-only benchmarks?
  • Are VLMs first-class citizens? Do we need to focus on optimizations for them? For example, vLLM introduced VLM support just a couple of days ago. Do we need to pull in the latest benchmarks?
  • Significant effort needs to be invested in optimizing VLM benchmarks, as image data, and video data in particular, is very large; at what point does that become a problem?

If the project owners are aligned on these questions, I can clean up and submit a PR for my LLaVA implementation.
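As a rough sketch of the shape such a backend could take against lm-eval's v0.4 model API (`register_model`, the `LM` base class, `generate_until`): the LLaVA specifics below (the checkpoint name, reading an `image` field off the task doc, leaving loglikelihood unimplemented) are illustrative assumptions, not the exact code in my fork.

```python
# Hypothetical sketch only: the lm-eval registry/base-class API is real,
# but the LLaVA wiring below is an assumption about how a backend could work.
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("llava")
class LLaVaLM(LM):
    def __init__(self, pretrained="llava-hf/llava-1.5-7b-hf", device="cuda", **kwargs):
        super().__init__()
        from transformers import AutoProcessor, LlavaForConditionalGeneration

        self.processor = AutoProcessor.from_pretrained(pretrained)
        self.model = LlavaForConditionalGeneration.from_pretrained(pretrained).to(device)
        self.device = device

    def generate_until(self, requests):
        # Each Instance carries (context, gen_kwargs) in .args; the image
        # would have to come from the task doc, which is exactly the
        # coupling question raised in the list above.
        results = []
        for req in requests:
            context, gen_kwargs = req.args
            image = req.doc.get("image")  # assumes the task exposes an image field
            inputs = self.processor(
                text=context, images=image, return_tensors="pt"
            ).to(self.device)
            out = self.model.generate(
                **inputs, max_new_tokens=gen_kwargs.get("max_gen_toks", 256)
            )
            results.append(self.processor.decode(out[0], skip_special_tokens=True))
        return results

    def loglikelihood(self, requests):
        # Multiple-choice scoring (e.g. MMMU) would need image-conditioned
        # loglikelihoods; omitted in this sketch.
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        raise NotImplementedError
```

Hypothetically this would then run as `lm_eval --model llava --tasks mmmu` (assuming an `mmmu` task config exists), which is where the coupling questions above surface in practice: the image has to travel from the task doc to the model somehow.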

@ashvinnihalani

Wanted to follow up on this.

@ashvinnihalani

Just wanted to give a heads-up that I've started the PR.
