Agent with the Big Picture: Perceiving Surroundings for Interactive Instruction Following
Byeonghwi Kim,
Suvaansh Bhambri ,
Kunal Pratap Singh ,
Roozbeh Mottaghi ,
Jonghyun Choi
Embodied AI Workshop @ CVPR 2021
🏆 2nd Place 🏆
ALFRED Challenge (CVPRW'21)
ABP (Agent with the Big Picture) is an embodied instruction following agent that exploits surrounding views by additional observations from navigable directions to enlarge the field of view of the agent.
conda create -n abp python=3.6 -y
conda activate abp
pip install -r requirements.txt
You also need to install Pytorch depending on your system. e.g ) PyTorch v1.10.0 + cuda 11.1
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
Go to ABP directory (this repository)
export ALFRED_ROOT=$(pwd)
bash download_model.sh
Download "Pretrained_Models_FILM" from this link
mv Pretrained_Models_FILM/maskrcnn_alfworld/objects_lr5e-3_005.pth .
mv Pretrained_Models_FILM/maskrcnn_alfworld/receps_lr5e-3_003.pth .
Download the ResNet-18 features and annotation files from the Hugging Face repo.
git clone https://huggingface.co/datasets/byeonghwikim/abp_dataset data/json_feat_2.1.0
We provide zip files that contain raw RGB images (and depth & segmentation masks) in the Hugging Face repository, which takes about 250GB in total. With these images, you can extract features yourself with this code.
To train ABP, run train_seq2seq.py
with hyper-parameters below.
python models/train/train_seq2seq.py --data <path_to_dataset> --model seq2seq_im_mask --dout <path_to_save_weight> --splits data/splits/oct21.json --gpu --batch <batch_size> --pm_aux_loss_wt <pm_aux_loss_wt_coeff> --subgoal_aux_loss_wt <subgoal_aux_loss_wt_coeff>
Note: As mentioned in the repository of ALFRED, run with We provide the already preprocessed annotation files and the vocab file. You may not need to run the code with --preprocess
only once for preprocessed json files.--preprocess
.
Note: All hyperparameters used for the experiments in the paper are set as default.
For example, if you want train ABP and save the weights for all epochs in "exp/abp" with all hyperparameters used in the experiments in the paper, you may use the command below.
python models/train/train_seq2seq.py --dout exp/abp --gpu --save_every_epoch
Note: The option, --save_every_epoch
, saves weights for all epochs and therefore could take a lot of space.
To evaluate ABP, run eval_seq2seq.py
with hyper-parameters below.
To evaluate a model in the seen
or unseen
environment, pass valid_seen
or valid_unseen
to --eval_split
.
python models/eval/eval_seq2seq.py --data <path_to_dataset> --model models.model.seq2seq_im_mask --model_path <path_to_weight> --eval_split <eval_split> --gpu --num_threads <thread_num>
Note: All hyperparameters used for the experiments in the paper are set as default.
If you want to evaluate our pretrained model saved in exp/pretrained/pretrained.pth
in the seen
validation, you may use the command below.
python models/eval/eval_seq2seq.py --model_path "exp/pretrained/pretrained.pth" --eval_split valid_seen --gpu --num_threads 4
Export results for the leaderboard
Run
python models/eval/leaderboard.py --num_threads 4
This will create .json
file in exp/pretrained/
. Submit this to the leaderboard
Test Seen SR | Test Seen GC | Test Unseen SR | Test Unseen GC |
---|---|---|---|
54.21% (41.62) | 60.03% (48.22%) | 26.16% (16.68%) | 36.42% (25.69%) |
@inproceedings{kim2021agent,
author = {Kim, Byeonghwi and Bhambri, Suvaansh and Singh, Kunal Pratap and Mottaghi, Roozbeh and Choi, Jonghyun},
title = {Agent with the Big Picture: Perceiving Surroundings for Interactive Instruction Following},
booktitle = {Embodied AI Workshop @ CVPR 2021},
year = {2021},
}