
Plan to release finetuned models? #11

Closed
yt2639 opened this issue May 19, 2023 · 8 comments
yt2639 commented May 19, 2023

Hi authors,

Amazing paper, and thanks for providing this nice code base. I have a question about the finetuned models, specifically for the video-text retrieval task. Do you have plans to release those models? I understand that we can use the pretrained VALOR checkpoints provided in the main-page README (shown below)

Download Checkpoints

  • pretrained_weights (BERT,CLIP,VideoSwin). Put pretrained_weights dir under main path. (VALOR/pretrained_weights)
  • VALOR-base. Put VALOR-base under the output dir. (VALOR/output/VALOR-base)
  • VALOR-large. Put VALOR-large under the output dir. (VALOR/output/VALOR-large)

to finetune the pretrained models for downstream tasks. But the implementation details in the paper call for 8 A100 GPUs, which I don't have, so I probably cannot reproduce the results reported in the paper. Therefore, I am wondering if you plan to release the finetuned models for the video-text retrieval task.
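For reference, the layout from the README above amounts to a couple of directories under the repo root; a sketch (the actual checkpoint downloads come from the links on the project page and are omitted here):

```shell
# Sketch of the expected checkpoint layout (paths taken from the README above;
# the downloaded weights would be placed into these directories)
mkdir -p VALOR/pretrained_weights          # BERT / CLIP / VideoSwin weights
mkdir -p VALOR/output/VALOR-base           # pretrained VALOR-base checkpoint
mkdir -p VALOR/output/VALOR-large          # pretrained VALOR-large checkpoint
```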

Thanks!
Shane

@kenhuang1964

Hey, @yt2639 did you find an alternative model?

@yt2639
Author

yt2639 commented Jun 14, 2023

Hey, @yt2639 did you find an alternative model?

No, I downloaded the pretrained weights and finetuned them myself. I seem to get similar results on 8 A5000 GPUs for the MSRVTT dataset. Still, if the authors can release the finetuned models, that would be great and very much appreciated.
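For anyone in a similar situation: when GPU memory rather than GPU count is the constraint, gradient accumulation over micro-batches reproduces the large-batch gradient exactly for a mean-reduced loss. A tiny framework-free check on a hypothetical 1-D least-squares model (not VALOR's training loop):

```python
# Toy check that accumulating micro-batch gradients matches the full-batch
# gradient. Hypothetical 1-D least-squares model, not VALOR's training code.
def grad(w, xs, ys):
    # gradient of the mean loss 0.5 * (w*x - y)^2 over the batch
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w = 0.0
full_batch = grad(w, xs, ys)
# two micro-batches of size 2, gradients averaged before the optimizer step
accumulated = (grad(w, xs[:2], ys[:2]) + grad(w, xs[2:], ys[2:])) / 2
print(full_batch, accumulated)  # identical for mean-reduced losses
```

The same idea lets a smaller GPU emulate the paper's effective batch size at the cost of more forward/backward passes per step.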

@thechargedneutron

@yt2639 Hi, what's the performance after finetuning? I am getting significantly lower scores after finetuning on 8 32GB V100 GPUs. I also hit some AssertionErrors, as mentioned in #15, and had to comment out all the assert checks in the metric files (BLEU, ROUGE, METEOR, etc.). Did you also have to do this?

Here is the performance when I finetune

07/01/2023 02:10:55 - INFO - __main__ -   ====-evaluation--cap%tva%tv--msrvtt_cap_tva=====step 10089--==========

07/01/2023 02:10:55 - INFO - __main__ -   {'Bleu_1': 78.76, 'Bleu_2': 67.74, 'Bleu_3': 55.93, 'Bleu_4': 44.78, 'METEOR': 28.8, 'ROUGE_L': 62.59, 'CIDEr': 55.79}
07/01/2023 02:10:55 - INFO - __main__ -   ======evaluation--cap%tva%tv--msrvtt_cap_tva====history best step: 4035==

07/01/2023 02:10:55 - INFO - __main__ -   {'Bleu_1': 79.48, 'Bleu_2': 67.83, 'Bleu_3': 55.77, 'Bleu_4': 44.78, 'METEOR': 29.13, 'ROUGE_L': 62.86, 'CIDEr': 56.34}
07/01/2023 02:10:55 - INFO - __main__ -   ====-evaluation--cap%tva%tv--msrvtt_cap_tv=====step 10089--==========

07/01/2023 02:10:55 - INFO - __main__ -   {'Bleu_1': 78.14, 'Bleu_2': 66.95, 'Bleu_3': 55.26, 'Bleu_4': 44.18, 'METEOR': 28.56, 'ROUGE_L': 62.32, 'CIDEr': 55.97}
07/01/2023 02:10:55 - INFO - __main__ -   ======evaluation--cap%tva%tv--msrvtt_cap_tv====history best step: 10089==

07/01/2023 02:10:55 - INFO - __main__ -   {'Bleu_1': 78.14, 'Bleu_2': 66.95, 'Bleu_3': 55.26, 'Bleu_4': 44.18, 'METEOR': 28.56, 'ROUGE_L': 62.32, 'CIDEr': 55.97}
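For context on what the asserts guard: BLEU-1 in these logs is essentially clipped unigram precision with a brevity penalty. A minimal self-contained sketch of that computation (single reference, hypothetical helper name; not the metric toolkit the repo actually calls):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    # Clipped unigram precision times brevity penalty, single reference.
    # A sketch for intuition only, not the repo's metric implementation.
    cand, ref = candidate.split(), reference.split()
    clipped = sum((Counter(cand) & Counter(ref)).values())  # clip by ref counts
    precision = clipped / len(cand)
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(bleu1("a man is cooking", "a man is cooking"))  # 1.0
```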

@yt2639
Author

yt2639 commented Jul 2, 2023

> I am getting significantly lower scores after finetuning on 8 32GB V100 GPUs. I also faced some AssertionErrors as mentioned in #15 [...] Did you also have to do this?

Hi @thechargedneutron, I didn't get the AssertionErrors. I only finetuned the video-text retrieval task on the MSRVTT dataset, and this is the log I got:

20:17:18 - INFO - __main__ -   ====-evaluation--ret%tva%tv--msrvtt_ret_t_v=====step 9789--==========

20:17:18 - INFO - __main__ -   {'video_recall': '50.6/77.6/85.9', 'video_ravg': 71.4, 'video_medianR': 1.0, 'video_meanR': 12.203125}
20:17:18 - INFO - __main__ -   ======evaluation--ret%tva%tv--msrvtt_ret_t_v====history best step: 4894==

20:17:18 - INFO - __main__ -   {'video_recall': '53.0/77.7/86.1', 'video_ravg': 72.3, 'video_medianR': 1.0, 'video_meanR': 11.34375}
20:17:18 - INFO - __main__ -   ====-evaluation--ret%tva%tv--msrvtt_ret_t_va=====step 9789--==========

20:17:18 - INFO - __main__ -   {'video_recall': '54.5/80.8/88.0', 'video_ravg': 74.4, 'video_medianR': 1.0, 'video_meanR': 11.1171875}
20:17:18 - INFO - __main__ -   ======evaluation--ret%tva%tv--msrvtt_ret_t_va====history best step: 9789==

20:17:18 - INFO - __main__ -   {'video_recall': '54.5/80.8/88.0', 'video_ravg': 74.4, 'video_medianR': 1.0, 'video_meanR': 11.1171875}
20:19:19 - INFO - __main__ -   {'loss_ret%tva%tv--msrvtt_ret/contra_loss': 0.2164306640625, 'loss_ret%tva%tv--msrvtt_ret/total_loss': 0.2164306640625}

So I am not sure whether they reported the t_va or the t_v number in Table 3 of the paper. If it was t_v, I only got 50.6 (or 53.0 at the best step), which is lower than the 54.4 reported in Table 3. But my t_va number, 54.5, is close, so I guess they reported t_va in Table 3?
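For reference, the recall and rank numbers in these logs (video_recall R@1/5/10, medianR, meanR) can be computed from a text-to-video similarity matrix like this; a generic sketch, not the repo's evaluation code:

```python
# sim[i][j] = similarity of text query i to video j; the ground-truth video
# for query i is video i. Generic retrieval metrics, for intuition only.
def retrieval_metrics(sim):
    ranks = []
    for i, row in enumerate(sim):
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ranks.append(order.index(i) + 1)  # 1-based rank of the correct video
    n = len(ranks)
    srt = sorted(ranks)
    median = srt[n // 2] if n % 2 else (srt[n // 2 - 1] + srt[n // 2]) / 2
    recall = lambda k: round(100.0 * sum(r <= k for r in ranks) / n, 1)
    return {"R@1": recall(1), "R@5": recall(5), "R@10": recall(10),
            "medianR": float(median), "meanR": sum(ranks) / n}
```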

One slightly odd thing: I can actually fit train_batch_size = 64 on my 8 x 24GB A5000 GPUs. I'm not sure if this is normal, since the authors reported using A100 GPUs, so at first I assumed a batch size of 64 would not fit on A5000s.
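As an aside, the contra_loss in the last log line is a contrastive loss over text-video pairs; a generic symmetric InfoNCE-style sketch (the temperature tau is a hypothetical value, and this is not VALOR's exact implementation):

```python
import math

# Symmetric contrastive loss over a text-video similarity matrix sim, where
# positives sit on the diagonal. Generic sketch, not the repo's loss code;
# tau is a hypothetical temperature.
def contrastive_loss(sim, tau=0.07):
    n = len(sim)
    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / tau for s in row]
            m = max(logits)  # stabilize log-sum-exp
            log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_norm - logits[i]  # -log softmax at the positive
        return total / n
    cols = [[sim[j][i] for j in range(n)] for i in range(n)]  # video-to-text
    return 0.5 * (cross_entropy(sim) + cross_entropy(cols))
```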

@thechargedneutron

Thanks for your comments. You did not get AssertionErrors because those come from the captioning metrics, and you ran retrieval. +1 to the request to release finetuned models for the captioning tasks.

@TXH-mercury
Owner

> So I am not sure if they reported the t_va number or t_v in Table 3 in the paper. [...]

T-VA metric is reported.

@TXH-mercury
Owner

@thechargedneutron @yt2639 @kenhuangsy Hey guys, the finetuned checkpoints of VALOR-base/large on the MSRVTT caption/retrieval datasets have been released now. Thanks for your attention.

@Haawron

Haawron commented Aug 19, 2023

Could you please share your plans for releasing other versions of the fine-tuned models?
I am eagerly anticipating the one trained on ActivityNet-QA.

@yt2639 yt2639 closed this as completed Oct 24, 2023