
You mean phi3 surpassed mistral7B? #1

Closed
MonolithFoundation opened this issue Jun 14, 2024 · 5 comments
Labels
question (Further information is requested), results

Comments

@MonolithFoundation

I find this really unexpected. How could a Phi-3 model surpass Mistral-7B, given that VideoChat2 uses a giant vision encoder?
Which part could really be the one that makes it work?

@mmaaz60
Member

mmaaz60 commented Jun 14, 2024

Hi @MonolithFoundation,

I appreciate your interest in our work. As per the VideoChat2 paper, they report an average of 60.4 on MVBench with the Mistral-7B LLM. In our case, VideoGPT+ obtains an average score of 58.7 on MVBench with the Phi-3-mini-3.8B LLM.

We have released all the model checkpoints and the training and evaluation code to reproduce our reported results. I hope this helps.

Please let me know if you have any questions. Thank You.

@zimenglan-sysu-512

Can eight V100 GPUs train the model?

@lucasjinreal

@mmaaz60 From the first figure, VideoGPT+ surpasses VideoChat2 by a clear margin, but VideoChat2 with Mistral actually gets the better result as of now.

These days, video MLLMs don't really seem to care about which LLM size they are using...

@mmaaz60
Member

mmaaz60 commented Jun 16, 2024

Can eight V100 GPUs train the model?

Hi @zimenglan-sysu-512

I appreciate your interest in our work. As we use the 3.8B Phi-3-Mini model as the LLM, the model can be trained easily on 8 V100 GPUs with 32 GB of memory per GPU. However, we have to turn off flash attention, as it is not supported on V100 GPUs.

I hope it will help. Good Luck! And please let me know if you face any issues.
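(As a side note for anyone adapting this: the fallback above can be automated. The sketch below is a hypothetical illustration, not the repository's actual training script; the helper name `pick_attn_implementation` is made up. FlashAttention-2 requires Ampere-or-newer GPUs, i.e. CUDA compute capability 8.0+, while the V100 is 7.0.)

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a (major, minor) CUDA compute
    capability tuple, e.g. the tuple returned by
    torch.cuda.get_device_capability().

    FlashAttention-2 needs compute capability 8.0+ (Ampere or newer);
    a V100 reports (7, 0), so it falls back to the standard "eager"
    implementation.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "eager"

# V100 -> falls back to eager; A100 -> flash_attention_2
print(pick_attn_implementation((7, 0)))  # eager
print(pick_attn_implementation((8, 0)))  # flash_attention_2
```

In a Hugging Face Transformers setup, the returned string could then be passed as the `attn_implementation` argument to `AutoModelForCausalLM.from_pretrained`, though the exact wiring depends on the training code you are using.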

@mmaaz60
Member

mmaaz60 commented Jun 16, 2024

@mmaaz60 From the first figure, VideoGPT+ surpasses VideoChat2 by a clear margin, but VideoChat2 with Mistral actually gets the better result as of now.

These days, video MLLMs don't really seem to care about which LLM size they are using...

Hi @lucasjinreal

Thank you for your interest in our work. VideoGPT+ uses the Phi-3-mini LLM with only 3.8B parameters, which is relatively weak compared to Mistral-7B.

On the other hand, if we compare the Vicuna-7B-based variants of both VideoGPT+ and VideoChat2, VideoChat2 obtains an average of 51.1 on MVBench, while our Vicuna-7B-based variant obtains an average score of 53.1.

Further, there are gains on the VCGBench and VCGBench-Diverse evaluations as well.

We acknowledge that VideoChat2 is a strong video conversation model; however, our VideoGPT+ obtains better results on multiple benchmarks, as discussed in our technical report, and all the code to reproduce our reported results is released on GitHub.


@mmaaz60 mmaaz60 added the question and results labels on Jun 16, 2024
@mmaaz60 mmaaz60 closed this as completed Jun 17, 2024