# UNINEXT MODEL ZOO
## Introduction
UNINEXT achieves superior performance on 20 benchmarks using a single model with the same parameters. Training has three stages: pretraining, image-level joint training, and video-level joint training. We provide checkpoints from every stage for each backbone.
### Stage 1: Pretraining
| Backbone | YAML | Model |
| --- | --- | --- |
| ResNet-50 | obj365v2_32g_r50 | model |
| ConvNeXt-Large | obj365v2_32g_convnext_large | model |
| ViT-Huge | obj365v2_32g_vit_huge | model |
### Stage 2: Image-level Joint Training
| Backbone | YAML | Model |
| --- | --- | --- |
| ResNet-50 | image_joint_r50 | model |
| ConvNeXt-Large | image_joint_convnext_large | model |
| ViT-Huge | image_joint_vit_huge_32g | model |
### Stage 3: Video-level Joint Training
All numbers reported in the paper (Table 1 to Table 10) use the following models.
| Backbone | YAML | Model |
| --- | --- | --- |
| ResNet-50 | video_joint_r50 | model |
| ConvNeXt-Large | video_joint_convnext_large | model |
| ViT-Huge | video_joint_vit_huge | model |
Please note that the pretrained weights used in this stage have filenames ending in `model_final_4c.pth`. To obtain these weights, run the command that matches your backbone:
```
python3 conversion/convert_3c_to_4c_pth.py # ResNet backbone
python3 conversion/convert_3c_to_4c_pth_convnext.py # ConvNeXt backbone
python3 conversion/convert_3c_to_4c_pth_vit.py # ViT backbone
```
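If you convert weights for several backbones, a small wrapper can pick the right script for you. The following is a minimal sketch, not part of the repository: the script paths are copied from the commands above, while the dictionary keys (`resnet`, `convnext`, `vit`) and the function names `conversion_command` and `run_conversion` are hypothetical. It assumes you run it from the repository root and that the scripts take no arguments, as in the commands above.

```python
import subprocess
import sys

# Script paths copied from the commands above. The keys are hypothetical
# labels chosen for this sketch, not names used by the repository.
CONVERSION_SCRIPTS = {
    "resnet": "conversion/convert_3c_to_4c_pth.py",
    "convnext": "conversion/convert_3c_to_4c_pth_convnext.py",
    "vit": "conversion/convert_3c_to_4c_pth_vit.py",
}


def conversion_command(backbone):
    """Return the argv list that runs the conversion script for a backbone family."""
    key = backbone.lower()
    if key not in CONVERSION_SCRIPTS:
        expected = ", ".join(sorted(CONVERSION_SCRIPTS))
        raise ValueError(f"unknown backbone {backbone!r}; expected one of: {expected}")
    # sys.executable reuses the interpreter that is running this wrapper.
    return [sys.executable, CONVERSION_SCRIPTS[key]]


def run_conversion(backbone):
    """Run the conversion script from the repository root; raise if it fails."""
    subprocess.run(conversion_command(backbone), check=True)
```

For example, `run_conversion("convnext")` would invoke `conversion/convert_3c_to_4c_pth_convnext.py` with the current Python interpreter.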
### Single Tasks
We also provide models trained on a single task with a ResNet-50 backbone (Table 11 in the paper).
| Task | YAML | Model |
| --- | --- | --- |
| OD&IS | single_task_det | model |
| REC&RES | single_task_rec | model |
| VIS | single_task_vis | model |
| RVOS | single_task_rvos | model |
| SOT&VOS | single_task_sot | model |