Plan to release finetuned models? #11
Comments
Hey @yt2639, did you find an alternative model?
No, I downloaded the pretrained weights and finetuned the model myself. It seems to get similar results on 8 A5000 GPUs for the MSRVTT dataset. Still, if the authors can release the finetuned models, that would be great and very much appreciated.
@yt2639 Hi, what's the performance after finetuning? I am getting significantly lower scores after finetuning on 8 32GB V100 GPUs. I also faced some AssertionErrors as mentioned in #15 and I had to comment out all the assert checks in all the metrics files (BLEU, ROUGE, METEOR etc.). Did you also have to do this? Here is the performance when I finetune
Hi @thechargedneutron, I didn't get the AssertionErrors. I only finetuned the video-text retrieval task on the MSRVTT dataset and this is the log I get:
So I am not sure if they reported … A slightly odd thing is that I can actually put in …
Thanks for your comments. You did not get the AssertionErrors because those come from the captioning metrics and you ran retrieval. +1 to the request to release finetuned models for the captioning tasks.
The T-VA metric is reported.
@thechargedneutron @yt2639 @kenhuangsy Hey guys, the finetuned checkpoints of VALOR-base/large on the MSRVTT caption/retrieval datasets have now been released. Thanks for your attention.
Could you please share the plan to release other versions of the finetuned models?
Hi authors,
Amazing paper, and thanks for providing this nice code base. I have a question regarding the finetuned models, specifically for the video-text retrieval task. Do you have plans to release those models? I do understand that we can use the pretrained VALOR as provided in the main page README (shown below)
Download Checkpoints
to finetune the pretrained models for downstream tasks. But in the paper, the implementation details suggest using 8 A100 GPUs, which I don't have, so I probably cannot reproduce the good results reported in the paper. Therefore, I am wondering if you plan to release the finetuned models for the video-text retrieval task?
Thanks!
Shane