AlphAction aims to detect the actions of multiple persons in videos. It is the first open-source project that achieves 30+ mAP (32.4 mAP) with a single model on the AVA dataset.
This project is the official implementation of the paper Asynchronous Interaction Aggregation for Action Detection, authored by Jiajun Tang*, Jin Xia* (equal contribution), Xinzhi Mu, Bo Pang, and Cewu Lu (corresponding author).
You first need to install this project; please check INSTALL.md for installation instructions.
To run training or inference on the AVA dataset, please check DATA.md for data preparation instructions.
| config | backbone | structure | mAP | mAP (in paper) | model |
|---|---|---|---|---|---|
| resnet50_4x16f_parallel | ResNet-50 | Parallel | 29.0 | 28.9 | [link] |
| resnet50_4x16f_serial | ResNet-50 | Serial | 29.8 | 29.6 | [link] |
| resnet50_4x16f_denseserial | ResNet-50 | Dense Serial | 30.0 | 29.8 | [link] |
| resnet101_8x8f_denseserial | ResNet-101 | Dense Serial | 32.4 | 32.3 | [link] |
To run the demo program on a video or webcam, please check the `demo` folder (see the sketch below). We select 15 common categories from the 80 action categories of AVA and provide a practical model that achieves high accuracy (about 70 mAP) on these categories.
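As a quick illustration, the following is a minimal sketch of how such a demo script might be invoked. The script name `demo/demo.py` and the flags `--video-path`, `--output-path`, and `--webcam` are assumptions here; please check the `demo` folder for the actual interface.

```shell
# Hypothetical invocation: run the demo on a local video file and write the
# visualized result to disk (script name and flags are assumptions; see demo/).
python demo/demo.py --video-path input.mp4 --output-path output.mp4

# Hypothetical invocation: run the demo on webcam input instead of a file.
python demo/demo.py --webcam
```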
The hyper-parameters of each experiment are controlled by a `.yaml` config file, which is located in the `config_files` directory. All of these configuration files assume that we are running on 8 GPUs. We need to create a symbolic link to the `output` directory, where the output (logs and checkpoints) will be saved. Besides, we recommend creating a `models` directory to place model weights. This can be done with the following commands.
```shell
mkdir -p /path/to/output
ln -s /path/to/output data/output
mkdir -p /path/to/models
ln -s /path/to/models data/models
```
The pre-trained model weights and the training code will be publicly available later. 😉
First, you need to download the model weights from the Model Zoo.
To run inference on a single GPU, you only need to run the following command. It will load the model from the path specified in `MODEL.WEIGHT`. Note that `VIDEOS_PER_BATCH` is a global config; if you encounter an OOM error, you can override it on the command line, as we do in the command below.
```shell
python test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight" \
TEST.VIDEOS_PER_BATCH 4
```
We use the launch utility `torch.distributed.launch` to launch multiple processes for inference on multiple GPUs. `GPU_NUM` should be replaced by the number of GPUs to use. Hyper-parameters in the config file can still be overridden on the command line, in the same way as in single-GPU inference.
```shell
python -m torch.distributed.launch --nproc_per_node=GPU_NUM \
test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight"
```
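For example, assuming 8 GPUs are available, a complete invocation that also overrides the global batch size might look like the sketch below; the config filename and weight path are illustrative placeholders, not confirmed file names.

```shell
# Example: distributed inference on 8 GPUs, overriding the global
# TEST.VIDEOS_PER_BATCH on the command line just as in the single-GPU case.
# The .yaml and .pth paths below are placeholders.
python -m torch.distributed.launch --nproc_per_node=8 \
    test_net.py --config-file "config_files/resnet101_8x8f_denseserial.yaml" \
    MODEL.WEIGHT "data/models/resnet101_8x8f_denseserial.pth" \
    TEST.VIDEOS_PER_BATCH 8
```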
We gratefully acknowledge the computing resource support of Huawei Corporation for this project.
If this project helps you in your research or project, please cite this paper:
```bibtex
@article{tang2020asynchronous,
  title={Asynchronous Interaction Aggregation for Action Detection},
  author={Tang, Jiajun and Xia, Jin and Mu, Xinzhi and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2004.07485},
  year={2020}
}
```