Commit cd7f98f: init
TXH-mercury committed Jan 23, 2023
Showing 118 changed files with 75,983 additions and 0 deletions.
118 changes: 118 additions & 0 deletions .gitignore
@@ -0,0 +1,118 @@
# ctags
tags

aic_caption/
cococaption/pycocoevalcap/spice
output/
ouptut/
datasets/
lpips/
__pycache__
UVG/evaluation
gen_evaluation
pretrained_weights
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

274 changes: 274 additions & 0 deletions README.md
@@ -0,0 +1,274 @@
# OPT: Universal Image-Text Representation Learning
This is the official repository of [OPT](https://arxiv.org/abs/1909.11740) (ECCV 2020).
This repository currently supports finetuning OPT on
[NLVR2](https://lil.nlp.cornell.edu/nlvr/), [VQA](https://visualqa.org/), [VCR](https://visualcommonsense.com/),
[SNLI-VE](https://github.com/necla-ml/SNLI-VE),
Image-Text Retrieval for [COCO](https://cocodataset.org/#home) and
[Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/), and
[Referring Expression Comprehension](https://github.com/lichengunc/refer) (RefCOCO, RefCOCO+, and RefCOCOg).
Both OPT-base and OPT-large pre-trained checkpoints are released.
OPT-base pre-training with in-domain data is also available.

![Overview of OPT](https://acvrpublicycchen.blob.core.windows.net/opt/opt_overview_v2.png)

Some code in this repo is copied/modified from open-source implementations made available by
[PyTorch](https://github.com/pytorch/pytorch),
[HuggingFace](https://github.com/huggingface/transformers),
[OpenNMT](https://github.com/OpenNMT/OpenNMT-py),
and [Nvidia](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch).
The image features are extracted using [BUTD](https://github.com/peteanderson80/bottom-up-attention).


## Requirements
We provide a Docker image for easier reproduction. Please install the following:
- [nvidia driver](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation) (418+),
- [Docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/) (19.03+),
- [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-docker#quickstart).

Our scripts require the user to have [docker group membership](https://docs.docker.com/install/linux/linux-postinstall/)
so that docker commands can be run without sudo.
We only support Linux with NVIDIA GPUs. We test on Ubuntu 18.04 and V100 cards.
We use mixed-precision training, so GPUs with Tensor Cores are recommended.
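
Before launching the container, you can verify that Docker sees the GPUs. A minimal check, assuming any CUDA base image (the tag below is illustrative):
```bash
# sanity check: nvidia-smi should list your GPUs from inside a container
docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi
```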

## Quick Start
*NOTE*: Please run `bash scripts/download_pretrained.sh $PATH_TO_STORAGE` to get our latest pretrained
checkpoints. This will download both the base and large models.

We use NLVR2 as an end-to-end example for using this code base.

1. Download processed data and pretrained models with the following command.
```bash
bash scripts/download_nlvr2.sh $PATH_TO_STORAGE
```
After downloading, you should see the following folder structure:
```
├── ann
│   ├── dev.json
│   └── test1.json
├── finetune
│   ├── nlvr-base
│   └── nlvr-base.tar
├── img_db
│   ├── nlvr2_dev
│   ├── nlvr2_dev.tar
│   ├── nlvr2_test
│   ├── nlvr2_test.tar
│   ├── nlvr2_train
│   └── nlvr2_train.tar
├── pretrained
│   └── opt-base.pt
└── txt_mapper
    ├── nlvr2_dev.db
    ├── nlvr2_dev.db.tar
    ├── nlvr2_test1.db
    ├── nlvr2_test1.db.tar
    ├── nlvr2_train.db
    └── nlvr2_train.db.tar
```

2. Launch the Docker container for running the experiments.
```bash
# docker image should be automatically pulled
source launch_container.sh $PATH_TO_STORAGE/txt_mapper $PATH_TO_STORAGE/img_db \
$PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained
```
The launch script respects the `$CUDA_VISIBLE_DEVICES` environment variable.
Note that the source code is mounted into the container under `/src` instead
of built into the image, so user modifications are reflected without
re-building the image. (Data folders are mounted into the container separately
for flexibility on folder structures.)
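
For example, to run the container on only the first two GPUs:
```bash
# expose only GPUs 0 and 1 to the container
export CUDA_VISIBLE_DEVICES=0,1
source launch_container.sh $PATH_TO_STORAGE/txt_mapper $PATH_TO_STORAGE/img_db \
    $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained
```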


3. Run finetuning for the NLVR2 task.
```bash
# inside the container
python train_nlvr2.py --config config/train-nlvr2-base-1gpu.json

# for more customization
horovodrun -np $N_GPU python train_nlvr2.py --config $YOUR_CONFIG_JSON
```

4. Run inference for the NLVR2 task and then evaluate.
```bash
# inference
python inf_nlvr2.py --txt_mapper /txt/nlvr2_test1.db/ --img_db /img/nlvr2_test/ \
--train_dir /storage/nlvr-base/ --ckpt 6500 --output_dir . --fp16

# evaluation
# run this command outside docker (tested with python 3.6)
# or copy the annotation json into mounted folder
python scripts/eval_nlvr2.py ./results.csv $PATH_TO_STORAGE/ann/test1.json
```
The above command runs inference on the model we trained. Feel free to replace
`--train_dir` and `--ckpt` with your own model trained in step 3, as in the
sketch below. Currently we only support single-GPU inference.
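
For example, a run against your own checkpoint might look like the following (the run directory and step number are placeholders):
```bash
# hypothetical run directory and checkpoint step from your own training
python inf_nlvr2.py --txt_mapper /txt/nlvr2_test1.db/ --img_db /img/nlvr2_test/ \
    --train_dir /storage/your-nlvr2-run/ --ckpt 8000 --output_dir . --fp16
```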


5. Customization
```bash
# training options
python train_nlvr2.py --help
```
- command-line arguments override the JSON config file
- the JSON config overrides `argparse` defaults
- use `horovodrun` for multi-GPU training
- `--gradient_accumulation_steps` emulates multi-GPU training (see the sketch below)
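
For example, a single-GPU job can approximate the effective batch size of a 4-GPU run by accumulating gradients; the accumulation step count below is illustrative, not a tuned setting:
```bash
# minimal sketch: emulate a larger effective batch size on one GPU
# by accumulating gradients over 4 steps before each optimizer update
python train_nlvr2.py --config config/train-nlvr2-base-1gpu.json \
    --gradient_accumulation_steps 4
```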


6. Misc.
```bash
# text annotation preprocessing
bash scripts/create_txtdb.sh $PATH_TO_STORAGE/txt_mapper $PATH_TO_STORAGE/ann

# image feature extraction (Tested on Titan-Xp; may not run on latest GPUs)
bash scripts/extract_imgfeat.sh $PATH_TO_IMG_FOLDER $PATH_TO_IMG_NPY

# image preprocessing
bash scripts/create_imgdb.sh $PATH_TO_IMG_NPY $PATH_TO_STORAGE/img_db
```
Use these scripts in case you would like to reproduce the whole preprocessing pipeline.

## Downstream Tasks Finetuning

### VQA
NOTE: training and inference should be run inside the docker container
1. download data
```
bash scripts/download_vqa.sh $PATH_TO_STORAGE
```
2. train
```
horovodrun -np 4 python train_vqa.py --config config/train-vqa-base-4gpu.json \
--output_dir $VQA_EXP
```
3. inference
```
python inf_vqa.py --txt_mapper /txt/vqa_test.db --img_db /img/coco_test2015 \
--output_dir $VQA_EXP --checkpoint 6000 --pin_mem --fp16
```
The result file will be written to `$VQA_EXP/results_test/results_6000_all.json`, which can be
submitted to the evaluation server.
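
A quick sanity check of the result file before uploading can save a rejected submission. A minimal sketch, assuming the file is a JSON list of answer records:
```bash
# count the predictions in the result file
python -c "import json; print(len(json.load(open('$VQA_EXP/results_test/results_6000_all.json'))))"
```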

### VCR
NOTE: training and inference should be run inside the docker container
1. download data
```
bash scripts/download_vcr.sh $PATH_TO_STORAGE
```
2. train
```
horovodrun -np 4 python train_vcr.py --config config/train-vcr-base-4gpu.json \
--output_dir $VCR_EXP
```
3. inference
```
horovodrun -np 4 python inf_vcr.py --txt_mapper /txt/vcr_test.db \
--img_db "/img/vcr_gt_test/;/img/vcr_test/" \
--split test --output_dir $VCR_EXP --checkpoint 8000 \
--pin_mem --fp16
```
The result file will be written to `$VCR_EXP/results_test/results_8000_all.csv`, which can be
submitted to the VCR leaderboard for evaluation.
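
It may be worth eyeballing the submission file before uploading, since leaderboards usually check the expected columns strictly:
```bash
# preview the header and first rows of the submission file
head -n 3 $VCR_EXP/results_test/results_8000_all.csv
```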

### VCR 2nd Stage Pre-training
NOTE: pre-training should be run inside the docker container
1. download VCR data if you haven't already
```
bash scripts/download_vcr.sh $PATH_TO_STORAGE
```
2. 2nd stage pre-train
```
horovodrun -np 4 python pretrain_vcr.py --config config/pretrain-vcr-base-4gpu.json \
--output_dir $PRETRAIN_VCR_EXP
```

### Visual Entailment (SNLI-VE)
NOTE: training should be run inside the docker container
1. download data
```
bash scripts/download_ve.sh $PATH_TO_STORAGE
```
2. train
```
horovodrun -np 2 python train_ve.py --config config/train-ve-base-2gpu.json \
--output_dir $VE_EXP
```

### Image-Text Retrieval
download data
```
bash scripts/download_itm.sh $PATH_TO_STORAGE
```
NOTE: Image-Text Retrieval is computationally heavy, especially on COCO.
#### Zero-shot Image-Text Retrieval (Flickr30k)
```
# every image-text pair has to be ranked; please use as many GPUs as possible
horovodrun -np $NGPU python inf_itm.py \
--txt_mapper /txt/itm_flickr30k_test.db --img_db /img/flickr30k \
--checkpoint /pretrain/opt-base.pt --model_cfg /src/config/opt-base.json \
--output_dir $ZS_ITM_RESULT --fp16 --pin_mem
```
#### Image-Text Retrieval (Flickr30k)
- normal finetune
```
horovodrun -np 8 python train_itm.py --config config/train-itm-flickr-base-8gpu.json
```
- finetune with hard negatives
```
horovodrun -np 16 python train_itm_hard_negatives.py \
--config config/train-itm-flickr-base-16gpu-hn.json
```
#### Image-Text Retrieval (COCO)
- finetune with hard negatives
```
horovodrun -np 16 python train_itm_hard_negatives.py \
--config config/train-itm-coco-base-16gpu-hn.json
```
### Referring Expressions
1. download data
```
bash scripts/download_re.sh $PATH_TO_STORAGE
```
2. train
```
python train_re.py --config config/train-refcoco-base-1gpu.json \
--output_dir $RE_EXP
```
3. inference and evaluation
```
source scripts/eval_refcoco.sh $RE_EXP
```
The result files will be written under `$RE_EXP/results_test/`

Similarly, change the corresponding configs/scripts to run RefCOCO+/RefCOCOg, as in the sketch below.
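
For instance, a RefCOCOg run could look like the following, assuming the config files follow the same naming pattern (the file name below is a guess; check `config/` for the actual ones):
```bash
# hypothetical config name following the RefCOCO pattern
python train_re.py --config config/train-refcocog-base-1gpu.json \
    --output_dir $RE_EXP_REFCOCOG
```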


## Pre-training
download
```
bash scripts/download_indomain.sh $PATH_TO_STORAGE
```
pre-train
```
horovodrun -np 8 python pretrain.py --config config/pretrain-indomain-base-8gpu.json \
--output_dir $PRETRAIN_EXP
```
Unfortunately, we cannot host CC/SBU features due to their large size. Users will need to process
them on their own. We will soon provide a smaller sample as a reference for the expected format.


## Citation

If you find this code useful for your research, please consider citing:
```
@inproceedings{chen2020opt,
title={OPT: Universal image-text representation learning},
author={Chen, Yen-Chun and Li, Linjie and Yu, Licheng and Kholy, Ahmed El and Ahmed, Faisal and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
booktitle={ECCV},
year={2020}
}
```

## License

MIT
