What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention

This repository hosts the code related to the following papers:

Antonino Furnari and Giovanni Maria Farinella, Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2020. Download

Antonino Furnari and Giovanni Maria Farinella, What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention. International Conference on Computer Vision, 2019. Download

Please also see the project web page at https://iplab.dmi.unict.it/rulstm.

If you use the code/models hosted in this repository, please cite the following papers:

@article{furnari2020rulstm,
  author = {Antonino Furnari and Giovanni Maria Farinella},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)},
  title = {Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video},
  year = {2020}
}
@inproceedings{furnari2019rulstm,
  title = {What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention},
  author = {Antonino Furnari and Giovanni Maria Farinella},
  year = {2019},
  booktitle = {International Conference on Computer Vision (ICCV)},
}

Updates:

  • 28/06/2021 We are now providing object detections on all frames of EPIC-KITCHENS-100. Please see this README (below) for more information;
  • 11/01/2021 We have updated the archive providing the EGTEA Gaze+ pre-extracted features. Please see this README (below) for more information;
  • 01/10/2020 We are now sharing the rgb/flow/obj EPIC-KITCHENS-100 features and pre-trained models used to report baseline results in the Rescaling Egocentric Vision paper;
  • 04/05/2020 We have now published an extended version of this work on PAMI. Please check the text above for the updated references;
  • 23/03/2020 We are now providing pre-extracted features for EGTEA Gaze+. See README for more information;
  • 23/10/2019 Added some scripts to show how to extract features from videos. The scripts can be found under FEATEXT and are documented in this README;
  • 11/10/2019 We are now also providing TSN and object-based features extracted for each frame of EPIC-KITCHENS. They can be downloaded using the download_data_full.sh script rather than download_data.sh.

Overview

This repository provides the following components:

  • The official PyTorch implementation of the proposed Rolling-Unrolling LSTM approach, including Sequence-Completion Pre-Training and Modality ATTention (MATT) (a minimal sketch of the attention-based fusion is given after this list);
  • A program to train, validate and test the proposed method on the EPIC-KITCHENS-55 and EPIC-KITCHENS-100 datasets;
  • Pre-extracted features for EPIC-KITCHENS-55 and EPIC-KITCHENS-100. Specifically, we include:
    • RGB features: extracted from RGB images using a BNInception CNN trained for the task of egocentric action recognition with Temporal Segment Networks;
    • Flow features: similar to RGB features, but extracted with a BNInception CNN trained on optical flow;
    • OBJ features: object-based features obtained by running a Faster R-CNN object detector trained on EPIC-KITCHENS-55;
  • The checkpoints of the RGB/Flow/OBJ/Fusion models trained for both tasks: egocentric action anticipation and early action recognition;
  • The checkpoints of the TSN models (to be used with the official PyTorch implementation of TSN);
  • The checkpoint of the Faster R-CNN object detector trained on EPIC-KITCHENS-55;
  • The training/validation split used for the experiments. Note that the TSN and Faster R-CNN models have been trained on the training set of this split.

Please refer to the paper for more technical details. The following sections document the released material.

RU-LSTM Implementation and main training/validation/test program

The provided implementation and the main training/validation/test program can be found in the RULSTM directory. In order to proceed with training, it is first necessary to retrieve the pre-extracted features from our website. To save space and bandwidth, we provide features extracted only on the subset of frames used for the experiments (frames were sampled at about 4 fps; please see the paper for details). These features are sufficient to train/validate/test the methods on the whole EPIC-KITCHENS-55 dataset following the settings repor