This is the official implementation of the MOTT paper, a novel multi-object tracking model. The code is inspired by TrackFormer, TransTrack, DETR, and CSWin: it takes their most effective Transformer components (the CSWin encoder and the deformable DETR decoder) to form a new lightweight Transformer specialized in MOT.
The paper has been accepted and published in the journal AI Open and is available here.
Multi-object tracking (MOT) is one of the most essential and challenging tasks in computer vision (CV). Unlike object detectors, modern MOT systems are more complicated and consist of several neural network models, so the balance between system performance and runtime is crucial for online scenarios. While some works improve performance by adding more modules, we propose a pruned model built on a state-of-the-art Transformer backbone. Our model saves up to 62% of the FLOPS of other Transformer-based models and runs almost twice as fast, while its results remain competitive with state-of-the-art methods. Moreover, we open-source our modified Transformer backbone model for general CV tasks as well as the MOT system.
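The core idea is to feed CSWin backbone features directly into a deformable-DETR-style decoder, dropping the heavy Transformer encoder stage. The sketch below illustrates this composition; the class and argument names (`MOTTSketch`, `backbone`, `decoder`, the decoder call signature) are illustrative placeholders, not the actual modules in this repository.

```python
import torch
from torch import nn

class MOTTSketch(nn.Module):
    """Minimal sketch: CSWin backbone features go straight to a
    deformable-DETR-style decoder (no extra Transformer encoder)."""

    def __init__(self, backbone: nn.Module, decoder: nn.Module, hidden_dim: int = 256):
        super().__init__()
        self.backbone = backbone          # e.g. CSWin-tiny, returns a feature map
        self.input_proj = nn.LazyConv2d(hidden_dim, kernel_size=1)
        self.decoder = decoder            # deformable decoder over projected features
        self.query_embed = nn.Embedding(300, hidden_dim)  # object/track queries

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)                  # (B, C, H, W)
        memory = self.input_proj(feats)                # (B, hidden_dim, H, W)
        queries = self.query_embed.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        return self.decoder(queries, memory)           # per-query boxes/classes (simplified signature)
```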
Please visit installation.md for guidance.
Please visit dataset.md for dataset preparation. Then, head to train.md for the training scripts.
We split the MOT17 dataset into two halves as described in the paper, trained all models on the first half using the same schedule, and evaluated them on the second half.
Model | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ |
---|---|---|---|---|---|
TransTrack | 66.5% | 83.4% | 66.8% | 134 | 61 |
TrackFormer | 67.0% | 84.1% | 69.5% | 152 | 57 |
MOTT | 71.6% | 84.5% | 71.7% | 166 | 41 |
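For reference, a rough way to reproduce the half-split is sketched below, assuming the standard MOT17 layout (`train/<sequence>/img1/*.jpg`); this is not the exact split script used in the repository.

```python
from pathlib import Path

def split_sequence(seq_dir: Path):
    """Split one MOT17 training sequence into two halves by frame index."""
    frames = sorted((seq_dir / "img1").glob("*.jpg"))
    mid = len(frames) // 2
    # Train on the first half, evaluate on the second half.
    return frames[:mid], frames[mid:]

if __name__ == "__main__":
    for seq in sorted(Path("data/MOT17/train").iterdir()):
        if seq.is_dir():
            train_frames, val_frames = split_sequence(seq)
            print(seq.name, len(train_frames), len(val_frames))
```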
We evaluated MOTT on the test sets of MOT20 and DanceTrack in addition to MOT17. The MOT17 and MOT20 results are obtained from models trained on the corresponding datasets, while the DanceTrack results are produced by the MOT20 model without fine-tuning.
Dataset | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ |
---|---|---|---|---|---|
DanceTrack | 85.4% | 81.9% | 33.7% | 81.5% | 0.3% |
MOT20 | 66.5% | 81.1% | 57.9% | 52.1% | 13.8% |
MOT17 | 71.6% | 84.5% | 71.7% | 49.0% | 12.1% |
Four models are compared in terms of the number of parameters (#Params), total CUDA time, and average FLOPS.
Model | #Params (M)↓ | CUDA time (s)↓ | Avg. FLOPS (G)↓ |
---|---|---|---|
TransTrack | 46.9 | 8.17 | 428.69 |
TrackFormer | 44.0 | 13.67 | 674.92 |
TrackFormer-CSWin | 38.3 | 16.26 | 714.83 |
MOTT | 32.5 | 6.76 | 255.74 |
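The parameter counts and CUDA timings can be reproduced approximately with plain PyTorch, as sketched below (FLOPS counting needs an external profiler such as fvcore or ptflops and is omitted); `profile`, the input shape, and the number of runs are illustrative choices, not the exact benchmarking setup used for the table.

```python
import torch

def profile(model: torch.nn.Module, input_shape=(1, 3, 800, 1333), runs: int = 50):
    """Report parameter count (M) and average CUDA time per forward pass (s)."""
    n_params = sum(p.numel() for p in model.parameters()) / 1e6
    model = model.cuda().eval()
    x = torch.randn(*input_shape, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        model(x)                 # warm-up
        start.record()
        for _ in range(runs):
            model(x)
        end.record()
    torch.cuda.synchronize()
    secs_per_forward = start.elapsed_time(end) / 1000 / runs  # elapsed_time() is in ms
    print(f"#Params: {n_params:.1f} M, CUDA time: {secs_per_forward:.3f} s/forward")
```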
The ablation study shows the performance differences when gradually removing the components. Notations: Res=ResNet50, CSWin=CSWin-tiny, DE=Deformable Encoder, DD=Deformable Decoder.
Modules | MOTA ↑ | IDF1 ↑ | Hz ↑ |
---|---|---|---|
Res+DE+DD (TrackFormer) | 66.8% | 70.7% | 5.39 |
CSWin+DE+DD | 72.7% | 72.9% | 4.73 |
CSWin+DD (MOTT) | 71.9% | 72.6% | 9.09 |
- Install and activate the Python environment.
- Download the pre-trained weights `cswin_tiny_224.pth` and `mot17_ch_mott.tar.gz` from OwnCloud.
- Put `cswin_tiny_224.pth` in the `./models` folder. Extract the `mot17_ch_mott` folder and put it in the `./models` folder as well.
- Put the testing videos (`.mov`, `.mp4`, `.avi` formats) in the `./data/videos/` folder.
- Run the following command at the root of the repo: `python src/track_online.py`
The program will show a list of available videos in the folder.
Select a video by inputting the index number.
Stop the video by pressing the `q` key. Terminate the program with `ctrl+c`. The configuration of the program is stored in `cfgs/track_online.yaml`.
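Before launching the demo, a quick sanity check such as the following hypothetical helper (not part of the repository) can confirm that the weights and videos are in the expected locations:

```python
from pathlib import Path

# Expected layout, following the setup steps above (paths are assumptions).
REQUIRED = [Path("models/cswin_tiny_224.pth"), Path("models/mot17_ch_mott")]
VIDEO_DIR = Path("data/videos")

for item in REQUIRED:
    print(f"{item}: {'ok' if item.exists() else 'MISSING'}")

videos = sorted(p for p in VIDEO_DIR.glob("*") if p.suffix.lower() in {".mov", ".mp4", ".avi"})
print(f"{len(videos)} video(s) found:", [v.name for v in videos])
```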
- The dataset should follow the same structure as MOT17, MOT20, or DanceTrack in order to be evaluated.
- The configuration file is at `cfgs/track_exp.yaml`. `dataset_name` specifies the dataset to use; check `src/trackformer/datasets/tracking/factory.py` for all available dataset options. `obj_detect_checkpoint_file` denotes the path to the model checkpoint file.

After modifying the configuration, start the evaluation by running: `python src/track.py with exp`
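A minimal way to inspect or tweak the two settings mentioned above from Python is sketched below; it assumes `cfgs/track_exp.yaml` is plain YAML with top-level `dataset_name` and `obj_detect_checkpoint_file` keys, and the assigned values are purely illustrative.

```python
from pathlib import Path
import yaml

cfg_path = Path("cfgs/track_exp.yaml")
cfg = yaml.safe_load(cfg_path.read_text())

# Point the evaluation at a dataset and a trained checkpoint
# (values below are illustrative, not defaults from the repo).
cfg["dataset_name"] = "MOT17-02"
cfg["obj_detect_checkpoint_file"] = "models/mot17_ch_mott/checkpoint.pth"

cfg_path.write_text(yaml.safe_dump(cfg))
print(yaml.safe_dump(cfg))
```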
Shan Wu, Amnir Hadachi, Chaoru Lu, and Damien Vivet.
If you use MOTT in an academic work, please cite:
@article{wu2023mott,
title={MOTT: A new model for multi-object tracking based on green learning paradigm},
author={Wu, Shan and Hadachi, Amnir and Lu, Chaoru and Vivet, Damien},
journal={AI Open},
year={2023},
publisher={Elsevier}
}
The published paper is available here.