code on text-video qa #65
Comments
Hi, you can refer to the code here for the dataloading of text-video QA: https://github.com/salesforce/ALPRO. Thanks!
Thanks for your reply, but that code is for classification. I want to learn how answers are generated for QA in BLIP.
We use the VQA model to generate answers: Line 85 in 48211a1
To handle videos, we simply concatenate the frame features and pass them to the text decoder.
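The frame-concatenation idea above can be sketched as follows. This is a minimal illustration, not the authors' code: the names (`DummyEncoder`, `encode_video`) and shapes are assumptions; in BLIP the encoder would be a ViT producing per-frame patch features, and the concatenated sequence would be fed as cross-attention input to the text decoder.

```python
import torch
import torch.nn as nn

class DummyEncoder(nn.Module):
    """Stand-in for a ViT image encoder: (N, 3, H, W) -> (N, num_patches, hidden)."""
    def __init__(self, num_patches=4, hidden=8):
        super().__init__()
        self.num_patches, self.hidden = num_patches, hidden
        self.proj = nn.Linear(3, num_patches * hidden)

    def forward(self, images):
        pooled = images.mean(dim=(2, 3))            # (N, 3) global average pool
        return self.proj(pooled).view(-1, self.num_patches, self.hidden)

def encode_video(frames, image_encoder):
    """Encode each frame independently, then concatenate along the sequence dim."""
    b, t = frames.shape[:2]
    feats = image_encoder(frames.flatten(0, 1))     # (b*t, num_patches, hidden)
    # Concatenating frames turns T short patch sequences into one long one,
    # so the text decoder can cross-attend to all frames at once.
    return feats.view(b, t * feats.shape[1], feats.shape[2])

frames = torch.randn(2, 5, 3, 16, 16)               # batch=2, T=5 sampled frames
video_feats = encode_video(frames, DummyEncoder())
print(video_feats.shape)                            # torch.Size([2, 20, 8])
```

The resulting `(batch, T * num_patches, hidden)` tensor plays the same role as a single image's feature sequence, so the downstream answer decoder needs no architectural change.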
@cdqncn Hi, have you reproduced the authors' results on zero-shot video QA? I tried to, but failed.
@LiJunnan1992 By concat, do you mean that after getting the image (frame) encodings, all the encodings are concatenated and the raw concatenated embeddings are passed to the decoder? Thanks for such an awesome repo, by the way.
Dear authors,
I was wondering if you could release the code for text-video QA (e.g., the dataloader and how you process the videos).
Thanks!