
code on text-video qa #65

Open
cdqncn opened this issue Jun 14, 2022 · 5 comments

@cdqncn

cdqncn commented Jun 14, 2022

Dear authors,

I was wondering if you could release the code for text-video QA (e.g., the dataloader and how you process the videos).

Thanks!

@LiJunnan1992
Contributor

Hi, you can refer to the code here for text-video QA dataloading: https://github.com/salesforce/ALPRO. Thanks!
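
For reference, a minimal sketch of what such a video-QA dataloader might look like — this is not ALPRO's actual code; the annotation format, the use of `decord` for frame reading, and the uniform sampling of `num_frames` frames are all assumptions:

```python
import numpy as np
import torch
from decord import VideoReader
from PIL import Image
from torch.utils.data import Dataset

class VideoQADataset(Dataset):
    """Illustrative text-video QA dataset: uniformly samples frames per video."""

    def __init__(self, annotations, transform, num_frames=8):
        # annotations: assumed list of dicts with 'video_path', 'question', 'answer'
        self.annotations = annotations
        self.transform = transform      # e.g. a torchvision resize/normalize pipeline
        self.num_frames = num_frames

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        vr = VideoReader(ann['video_path'])
        # sample num_frames indices spread uniformly over the whole clip
        indices = np.linspace(0, len(vr) - 1, self.num_frames).astype(int)
        frames = vr.get_batch(indices).asnumpy()    # (T, H, W, C) uint8
        frames = torch.stack([self.transform(Image.fromarray(f)) for f in frames])
        return frames, ann['question'], ann['answer']   # frames: (T, C, H', W')
```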

@cdqncn
Author

cdqncn commented Jun 14, 2022

Thanks for your reply. That code treats QA as classification; I'd like to learn how to do generative QA with BLIP.

@LiJunnan1992
Contributor

We use the VQA model to generate answers:

```python
question_output = self.text_encoder(question.input_ids,
```

To handle videos, we simply concatenate frame features and pass them to the text decoder.
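
For concreteness, here is a minimal sketch of that concatenation step. The variable names are illustrative: `visual_encoder` stands in for BLIP's ViT, `question` for the tokenized question, and the full `text_encoder` call is reconstructed to match the truncated snippet above rather than copied from the repo:

```python
import torch

# frames: (T, C, H, W) tensor holding T sampled frames of one video
frame_embeds = visual_encoder(frames)                   # (T, N, D): per-frame patch features
# concatenate the T frames along the sequence (patch) dimension
video_embeds = frame_embeds.flatten(0, 1).unsqueeze(0)  # (1, T*N, D)
video_atts = torch.ones(video_embeds.shape[:-1], dtype=torch.long)

# the concatenated features take the place of image_embeds in the VQA model,
# so the question encoder / answer decoder cross-attends over all frames at once
question_output = text_encoder(question.input_ids,
                               attention_mask=question.attention_mask,
                               encoder_hidden_states=video_embeds,
                               encoder_attention_mask=video_atts,
                               return_dict=True)
```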

@BlueCat7

BlueCat7 commented Aug 3, 2022

@cdqncn Hi, have you reproduced the authors' results on zero-shot video QA? I tried to, but failed.

@dipta007

dipta007 commented Dec 9, 2023

> We use the VQA model to generate answers:
>
> `question_output = self.text_encoder(question.input_ids,`
>
> To handle videos, we simply concatenate frame features and pass them to the text decoder.

@LiJunnan1992 By concat, do you mean that after getting the image (frame) encodings, all of them are concatenated and the raw concatenated embeddings are passed to the decoder? Thanks for such an awesome repo, by the way.
