Our code is based on the XL-Sum codebase built on Hugging Face Transformers. We used the following environment:
python==3.7.9
pytorch==1.7.1
torchvision==0.8.2
torchaudio==0.7.2
cudatoolkit=10.2
The visual feature extraction code is mainly taken from image_feature_extraction [1,2].
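Purely as an illustration of region-level feature extraction, the sketch below uses torchvision's off-the-shelf Faster R-CNN rather than the Visual Genome detector of [1,2]; the model choice, region count, and returned quantities are assumptions, not the repository's actual extractor.

```python
# Illustrative only: [1,2] use a Faster R-CNN trained on Visual Genome that exports
# ROI-pooled 2048-d features (typically 36 regions per image); this sketch just
# shows the general region-detection step with torchvision.
import torch
import torchvision
from PIL import Image
from torchvision import transforms

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
detector.eval()

def extract_regions(image_path, max_regions=36):
    """Return the top-scoring bounding boxes and scores for one image."""
    image = transforms.ToTensor()(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = detector([image])[0]
    keep = output["scores"].argsort(descending=True)[:max_regions]
    return output["boxes"][keep], output["scores"][keep]
```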
The code for incorporating image features is mainly borrowed from VG-GPLMs.
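For intuition, here is a minimal sketch of that kind of text-vision fusion (cross-modal attention followed by a gate, in the spirit of VG-GPLMs); the class name, dimensions, and tensor layout are illustrative assumptions, not the repository's actual modules.

```python
import torch
import torch.nn as nn

class GatedVisualFusion(nn.Module):
    """Sketch: inject region features into text encoder states via cross-attention + gate."""

    def __init__(self, text_dim=768, visual_dim=2048, num_heads=8):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, text_dim)   # project regions into text space
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads)
        self.gate = nn.Linear(2 * text_dim, text_dim)

    def forward(self, text_states, visual_feats):
        # text_states: (src_len, batch, text_dim); visual_feats: (n_regions, batch, visual_dim)
        vis = self.visual_proj(visual_feats)
        attended, _ = self.cross_attn(query=text_states, key=vis, value=vis)
        gate = torch.sigmoid(self.gate(torch.cat([text_states, attended], dim=-1)))
        return text_states + gate * attended  # gated residual fusion
```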
All triplet data <image URLs, article, summary> can be downloaded here. Note that under the zero-shot setting, the training data of the zero-shot languages is not used.
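For reference, the sketch below shows how such a triplet file might be read and its images fetched; the JSON-lines layout and the field names (image_urls, text, summary) are assumptions about the released data, not guaranteed by it.

```python
import json
import os
import urllib.request

def load_triplets(path):
    """Yield (image_urls, article, summary) triplets from a JSON-lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record["image_urls"], record["text"], record["summary"]

def download_images(urls, out_dir):
    """Fetch an article's images so visual features can be extracted locally."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        urllib.request.urlretrieve(url, os.path.join(out_dir, f"{i}.jpg"))
```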
For multi-GPU multilingual training (8 GPUs), run:
bash multimodal_dist_mmt5_32_ft.sh 4 11 high-resource 1.0 8 256 # the high-resource setting reproduces Table 1
For single-GPU, single-language training, run:
bash single_lang_multimodal_train32.sh high-resource english # e.g., training on the English dataset
For testing, run:
bash evaluate.sh
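The evaluation script reports ROUGE-style summarization metrics. As a rough sketch of that computation, the snippet below uses the rouge_score package; the actual evaluate.sh may rely on a multilingual ROUGE variant with different tokenization and options.

```python
from rouge_score import rouge_scorer

def average_rouge(predictions, references):
    """Average ROUGE-1/2/L F1 over paired system and reference summaries (sketch)."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    for pred, ref in zip(predictions, references):
        scores = scorer.score(ref, pred)  # signature: score(target, prediction)
        for key in totals:
            totals[key] += scores[key].fmeasure
    n = max(len(predictions), 1)
    return {key: value / n for key, value in totals.items()}
```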
[1] Jaemin Cho, Jie Lei, Hao Tan, and Mohit Bansal. Unifying vision-and-language tasks via text generation. In ICML, 2021: 1931-1942.
[2] Peter Anderson, Xiaodong He, Chris Buehler, et al. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR, 2018: 6077-6086.
If you find this work useful, please cite:

@misc{https://doi.org/10.48550/arxiv.2212.07672,
  title = {Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization},
  author = {Liang, Yunlong and Meng, Fandong and Xu, Jinan and Wang, Jiaan and Chen, Yufeng and Zhou, Jie},
  doi = {10.48550/ARXIV.2212.07672},
  url = {https://arxiv.org/abs/2212.07672},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Computation and Language (cs.CL), FOS: Computer and information sciences},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}