
Reproduced results #82

Open

Fantasy1120 opened this issue Jun 14, 2024 · 3 comments

Comments

@Fantasy1120
I tried to reproduce the results under the base recipe. I roughly match the paper's numbers on VQAv2, GQA, ScienceQA, and POPE, but there is an almost 1% gap on TextVQA, MMMU, and MM-Vet, and the gap on MME seems larger. Is this gap acceptable, and what could be causing it?

[Screenshot of reproduced benchmark results]

Fantasy1120 changed the title from "Reproduced results and eval problems" to "Reproduced results" on Jun 14, 2024
@YingHuTsing (Collaborator)

Hi, I think this gap is acceptable. A different number of GPUs leads to a different gradient_accumulation_steps, and different GPU types introduce randomness. By the way, the phi-2-siglip-base results we listed were trained on 8 A100-40G GPUs.
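For context, the effective (global) batch size is the product of the per-device batch size, the number of GPUs, and gradient_accumulation_steps, so matching a reference run on different hardware means adjusting the accumulation steps. A minimal sketch of that arithmetic (the function name and the example numbers are illustrative, not taken from this repo's scripts):

```python
# Keep the effective (global) batch size fixed when the GPU count changes:
# global_batch = per_device_batch * num_gpus * grad_accum_steps

def grad_accum_steps(global_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to hit a target global batch size."""
    assert global_batch % (per_device_batch * num_gpus) == 0, \
        "global batch must be divisible by per_device_batch * num_gpus"
    return global_batch // (per_device_batch * num_gpus)

# Example: a run on 8 GPUs with per-device batch 16 and global batch 256
# needs accumulation 2; reproducing it on 4 GPUs needs accumulation 4.
print(grad_accum_steps(256, 16, 8))  # -> 2
print(grad_accum_steps(256, 16, 4))  # -> 4
```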

@Fantasy1120 (Author)

> Hi, I think this gap is acceptable. A different number of GPUs leads to a different gradient_accumulation_steps, and different GPU types introduce randomness. By the way, the phi-2-siglip-base results we listed were trained on 8 A100-40G GPUs.

Thanks for your reply. I see that the training script uses fp16 by default, but the A100 also supports bf16. Did you use bf16 in your training?

@YingHuTsing (Collaborator)

No, we haven't tried bf16 thoroughly, but we encourage the open-source community to give it a try. We can update the performance table accordingly, and we welcome community members as contributors to this repository.
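For anyone trying this, here is a minimal sketch of switching from fp16 to bf16, assuming a Hugging Face Trainer-based training script (this repo's scripts may pass these flags differently, e.g., via shell arguments or a DeepSpeed config):

```python
from transformers import TrainingArguments

# bf16 keeps the same exponent range as fp32, so it is generally more
# numerically stable than fp16 for training; it requires Ampere-class
# GPUs (e.g., A100) or newer.
args = TrainingArguments(
    output_dir="./checkpoints",          # illustrative path
    per_device_train_batch_size=16,      # illustrative value
    gradient_accumulation_steps=2,       # illustrative value
    fp16=False,  # the released scripts default to fp16, per the discussion above
    bf16=True,   # the hypothetical change being discussed here
)
```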
