This repository contains the official implementation of MoCA, a model for the Textbook Question Answering task.
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
(Pattern Recognition) [paper]
The external corpus and some important checkpoints can be downloaded from here.
- Multi-stage pretraining for the text part
- Dense layer of text-guided visual attention for the diagram part
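To give a rough idea of what text-guided visual attention followed by a dense layer could look like, here is a minimal, dependency-free sketch. It is not the code from this repository: the function names (`text_guided_visual_attention`, `dense`), the scaled dot-product scoring, and the toy feature vectors are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_visual_attention(text_vec, region_feats):
    """Pool diagram region features, weighted by similarity to a text vector.

    Hypothetical helper: text_vec is a question/text embedding and
    region_feats is a list of region embeddings of the same dimension.
    """
    dim = len(text_vec)
    # Scaled dot-product score between the text vector and each region.
    scores = [
        sum(t * r for t, r in zip(text_vec, region)) / math.sqrt(dim)
        for region in region_feats
    ]
    weights = softmax(scores)
    # Weighted sum of region features, one output value per dimension.
    attended = [
        sum(w * region[i] for w, region in zip(weights, region_feats))
        for i in range(dim)
    ]
    return attended, weights


def dense(vec, weight, bias):
    """Plain dense layer y = W x + b (no activation), W given as rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


# Toy example: the text vector matches the first region more closely.
text = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = text_guided_visual_attention(text, regions)
projected = dense(attended, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In this toy setup the first region receives the larger attention weight because it aligns with the text vector. A real implementation would operate on batched tensors with learned projection weights.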
Experiments were conducted on a single Tesla V100 GPU.
If you find this work helpful, please cite the paper.
@article{xu2021moca,
title = {MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering},
author = {Xu, Fangzhi and Lin, Qika and Liu, Jun and Zhang, Lingling and Zhao, Tianzhe and Chai, Qi and Pan, Yudai},
journal = {arXiv preprint arXiv:2112.02839},
year = {2021}
}