Developed by Freda Shi and Jiayuan Mao.
This repo includes the implementation of our COLING 2018 paper "Learning Visually-Grounded Semantics from Contrastive Adversarial Samples".
- Python3
- PyTorch 0.3.0
- NLTK
- spacy
- word2number
- TensorBoard
- NumPy
- Jacinle (Jiayuan's personal toolbox, required by two evaluation experiments)
We apply VSE++ (Faghri et al., 2017) as our base model. To reproduce the baseline numbers of VSE++, please follow the authors' instructions. We found their results easy to reproduce!
We use the same datasets as VSE++ (Faghri et al., 2017). Use the following commands to download the VSE++ data into the root folder:
wget https://www.cs.toronto.edu/~faghri/vsepp/vocab.tar
wget https://www.cs.toronto.edu/~faghri/vsepp/data.tar
and extract the tarballs with
tar -xvf vocab.tar
tar -xvf data.tar
You may also need GloVe, as we use GloVe.840B.300d to initialize the word embeddings. We also provide a custom subset of the GloVe embeddings at VSE_C/data/glove.pkl.
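In case it helps, here is a minimal sketch of how the bundled subset could be used to build an initialization matrix. It assumes glove.pkl stores a dict mapping tokens to 300-d NumPy vectors (verify the actual file layout), and the build_embedding_matrix helper name is ours, not the repo's.

import pickle
import numpy as np

# Assumption: glove.pkl holds a dict {token: 300-d vector}; verify before use.
with open('VSE_C/data/glove.pkl', 'rb') as f:
    glove = pickle.load(f)

def build_embedding_matrix(vocab, dim=300):
    # vocab: mapping token -> index; tokens missing from GloVe keep small
    # random vectors, a common initialization heuristic.
    weights = np.random.uniform(-0.1, 0.1, (len(vocab), dim)).astype('float32')
    for token, idx in vocab.items():
        if token in glove:
            weights[idx] = glove[token]
    return weights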
The following commands generate specific types of contrastive adversarial samples of sentences. Note that the script will create folders in the initial data path, e.g., ../data/coco_precomp/noun_ex/.
cd adversarial_attack
python3 $TYPE.py --data_path $DATA_PATH --data_name $DATA_NAME
$TYPE can be one of noun, numeral, or relation. Here is an example command:
python3 noun.py --data_path ../data --data_name coco_precomp
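For intuition, here is an illustrative sketch of what a noun-typed contrastive sample is: a caption with one noun swapped out so the sentence no longer matches the image. This is not the actual logic of noun.py; the candidate pool below is made up for the example.

import nltk  # requires the punkt and averaged_perceptron_tagger data packages

CANDIDATE_NOUNS = ['dog', 'car', 'pizza', 'umbrella']  # hypothetical pool

def make_noun_contrastive(caption):
    tokens = nltk.word_tokenize(caption)
    for i, (word, tag) in enumerate(nltk.pos_tag(tokens)):
        if tag.startswith('NN'):  # replace the first noun we find
            replacement = next(n for n in CANDIDATE_NOUNS if n != word.lower())
            return ' '.join(tokens[:i] + [replacement] + tokens[i + 1:])
    return None  # no noun found, so no contrastive sample

print(make_noun_contrastive('a man riding a horse on the beach'))
# -> 'a dog riding a horse on the beach'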
Similar to VSE++, VSE-C supports training with contrastive adversarial samples in the text domain. After generating the noun-typed contrastive adversarial samples, we can train an example noun-typed VSE-C with the following command:
cd VSE_C
python3 train.py --data_path ../data/ --data_name coco_precomp \
--logger_name runs/coco_noun --learning_rate 0.001 --text_encoder_type gru \
--max_violation --worker 10 --img_dim 2048 --use_external_captions
The model will be saved into the logger folder, e.g., runs/coco_noun.
Please refer to VSE_C/train.py for a more detailed description of the hyper-parameters. Note that you also need to create the logger folder (e.g., runs/coco_noun) before training.
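As background on the --max_violation flag: VSE++ (and hence VSE-C) trains with a max-of-hinges triplet ranking loss that penalizes only the hardest negative in each mini-batch. The following is an illustrative re-implementation against a recent PyTorch API, not the exact code in VSE_C/model.py; the margin of 0.2 is an assumption.

import torch

def max_violation_loss(im, s, margin=0.2):
    # im, s: L2-normalized image/caption embeddings, shape (batch, dim)
    scores = im @ s.t()                    # pairwise cosine similarities
    diagonal = scores.diag().view(-1, 1)   # scores of the matched pairs
    cost_s = (margin + scores - diagonal.expand_as(scores)).clamp(min=0)
    cost_im = (margin + scores - diagonal.t().expand_as(scores)).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_s = cost_s.masked_fill(mask, 0)   # ignore the positive pairs
    cost_im = cost_im.masked_fill(mask, 0)
    # "max violation": only the hardest negative per row/column contributes
    return cost_s.max(1)[0].sum() + cost_im.max(0)[0].sum()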
We have tested the model on GPU (CUDA 8.0). If you have any problems training VSE-C in a different environment, please feel free to open an issue.
The following shows an example of in-domain evaluation. Please run the code in Python 3 or IPython 3.
from VSE_C.vocab import Vocabulary
from VSE_C import evaluation

evaluation.eval_with_single_extended('runs/coco_noun', 'data/', 'coco_precomp', 'test')
We provide our training script (testing is performed as part of the evaluation procedure) at evaluation/object_alignment. Please refer to the script for detailed usage.
evaluation/saliency_visualization/saliency_visualization.py provides the script for saliency visualization; please refer to the script for detailed usage. The visualized saliency images look like the example figure shown in the repository.
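As a conceptual outline (the repo's script may compute its maps differently), word-level saliency can be read off the gradient of the image-caption similarity with respect to the word embeddings. Everything below, including the txt_encoder interface, is an assumption for illustration.

import torch

def word_saliency(img_emb, word_embs, txt_encoder):
    # img_emb: (dim,) image embedding; word_embs: (seq_len, word_dim)
    word_embs = word_embs.clone().detach().requires_grad_(True)
    cap_emb = txt_encoder(word_embs.unsqueeze(0)).squeeze(0)  # (dim,)
    torch.dot(img_emb, cap_emb).backward()  # image-caption similarity
    # saliency of a word = gradient magnitude w.r.t. its embedding
    return word_embs.grad.norm(dim=1)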
First, generate the datasets for sentence completion with:
cd evaluations/completion
python3 completion_datamaker.py --input $INPUT_PATH --output $OUTPUT_PATH
Then, run
python3 -m evaluations.completion.completion_train $ARGS
python3 -m evaluations.completion.completion_test $ARGS
for training and testing the sentence completion models. Please refer to the evaluation scripts for further descriptions of the arguments.
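To make the task concrete: a sentence-completion example blanks out one word of a caption and asks the model to recover it from a candidate set. The sketch below only illustrates that format; the fields produced by completion_datamaker.py may be named and structured differently.

def make_completion_example(caption_tokens, blank_index, distractors):
    # Hypothetical record layout; not the repo's actual data format.
    answer = caption_tokens[blank_index]
    context = caption_tokens[:blank_index] + ['<blank>'] + caption_tokens[blank_index + 1:]
    return {'context': context, 'candidates': [answer] + distractors, 'answer': answer}

example = make_completion_example('two dogs play in the park'.split(), 1, ['cats', 'cars'])
# context: ['two', '<blank>', 'play', 'in', 'the', 'park']; answer: 'dogs'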
If you find VSE-C useful, please consider citing:
@inproceedings{shi2018learning,
  title={Learning Visually-Grounded Semantics from Contrastive Adversarial Samples},
  author={Shi, Haoyue and Mao, Jiayuan and Xiao, Tete and Jiang, Yuning and Sun, Jian},
  booktitle={Proceedings of the 27th International Conference on Computational Linguistics},
  year={2018}
}