code on text-video qa #65
Comments
Hi, you can refer to the code here for the dataloading of text-video QA: https://github.com/salesforce/ALPRO. Thanks!
Thanks for your reply, but that code is for classification. I want to learn how answers are generated for QA in BLIP.
We use the VQA model to generate answers: Line 85 in 48211a1
To handle videos, we simply concatenate the frame features and pass them to the text decoder.
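The frame-concatenation idea above can be sketched as follows. This is a minimal illustration, not the authors' code: the names (`DummyEncoder`, `encode_video`) and shapes are assumptions; in BLIP the encoder would be a ViT producing per-frame patch features, and the concatenated sequence would be fed as cross-attention input to the text decoder.

```python
import torch
import torch.nn as nn

class DummyEncoder(nn.Module):
    """Stand-in for a ViT image encoder: (N, 3, H, W) -> (N, num_patches, hidden)."""
    def __init__(self, num_patches=4, hidden=8):
        super().__init__()
        self.num_patches, self.hidden = num_patches, hidden
        self.proj = nn.Linear(3, num_patches * hidden)

    def forward(self, images):
        pooled = images.mean(dim=(2, 3))            # (N, 3) global average pool
        return self.proj(pooled).view(-1, self.num_patches, self.hidden)

def encode_video(frames, image_encoder):
    """Encode each frame independently, then concatenate along the sequence dim."""
    b, t = frames.shape[:2]
    feats = image_encoder(frames.flatten(0, 1))     # (b*t, num_patches, hidden)
    # Concatenating frames turns T short patch sequences into one long one,
    # so the text decoder can cross-attend to all frames at once.
    return feats.view(b, t * feats.shape[1], feats.shape[2])

frames = torch.randn(2, 5, 3, 16, 16)               # batch=2, T=5 sampled frames
video_feats = encode_video(frames, DummyEncoder())
print(video_feats.shape)                            # torch.Size([2, 20, 8])
```

The resulting `(batch, T * num_patches, hidden)` tensor plays the same role as a single image's feature sequence, so the downstream answer decoder needs no architectural change.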
@cdqncn Hi, have you reproduced the authors' results on zero-shot video QA? I tried to, but failed.
@LiJunnan1992 By concat, do you mean that after getting the image (frame) encodings, all the encodings are concatenated and the raw concatenated embeddings are passed to the decoder? Thanks for such an awesome repo, by the way.
Dear authors,
I was wondering if you could release the code for text-video QA (e.g., the dataloader and how you process the videos).
Thanks!