# UNINEXT MODEL ZOO

## Introduction

UNINEXT achieves superior performance on 20 benchmarks using a single model with the same parameters. Training has three stages: pretraining, image-level joint training, and video-level joint training. We provide checkpoints from every stage for models with different backbones.

### Stage 1: Pretraining

| Backbone | YAML | Model |
| --- | --- | --- |
| ResNet-50 | obj365v2_32g_r50 | model |
| ConvNeXt-Large | obj365v2_32g_convnext_large | model |
| ViT-Huge | obj365v2_32g_vit_huge | model |

### Stage 2: Image-level Joint Training

| Backbone | YAML | Model |
| --- | --- | --- |
| ResNet-50 | image_joint_r50 | model |
| ConvNeXt-Large | image_joint_convnext_large | model |
| ViT-Huge | image_joint_vit_huge_32g | model |

### Stage 3: Video-level Joint Training

All numbers reported in the paper (Table 1 to Table 10) use the following models.

| Backbone | YAML | Model |
| --- | --- | --- |
| ResNet-50 | video_joint_r50 | model |
| ConvNeXt-Large | video_joint_convnext_large | model |
| ViT-Huge | video_joint_vit_huge | model |

Please note that the pretrained weights used in this stage end with `model_final_4c.pth`. To obtain these weights, run the command that matches your backbone:

```
python3 conversion/convert_3c_to_4c_pth.py # ResNet backbone
python3 conversion/convert_3c_to_4c_pth_convnext.py # ConvNeXt backbone
python3 conversion/convert_3c_to_4c_pth_vit.py # ViT backbone
```

### Single Tasks

We also provide models trained on a single task with a ResNet-50 backbone (Table 11 in the paper).

| Task | YAML | Model |
| --- | --- | --- |
| OD&IS | single_task_det | model |
| REC&RES | single_task_rec | model |
| VIS | single_task_vis | model |
| RVOS | single_task_rvos | model |
| SOT&VOS | single_task_sot | model |
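
If you script the Stage 3 setup, the conversion commands listed under Stage 3 can be selected by backbone with a small helper. The sketch below is illustrative only: the function names `conversion_command` and `is_converted` and the short backbone keys are assumptions, not part of the UNINEXT codebase. Only the script paths and the `model_final_4c.pth` suffix come from this page.

```python
# Backbone family -> conversion script, copied from the Stage 3 commands.
# The dictionary keys ("resnet", "convnext", "vit") are hypothetical labels.
CONVERSION_SCRIPTS = {
    "resnet": "conversion/convert_3c_to_4c_pth.py",
    "convnext": "conversion/convert_3c_to_4c_pth_convnext.py",
    "vit": "conversion/convert_3c_to_4c_pth_vit.py",
}


def conversion_command(backbone):
    """Return the argv list that runs the conversion script for a backbone."""
    key = backbone.lower()
    if key not in CONVERSION_SCRIPTS:
        raise ValueError(
            "unknown backbone %r; expected one of %s"
            % (backbone, sorted(CONVERSION_SCRIPTS))
        )
    return ["python3", CONVERSION_SCRIPTS[key]]


def is_converted(checkpoint_path):
    """Check whether a checkpoint path already carries the converted suffix."""
    return checkpoint_path.endswith("model_final_4c.pth")
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root. `is_converted` inspects only the file name, not the contents of the checkpoint.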