This is a PyTorch implementation of the MoCo paper:
@Article{he2019moco,
author = {Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
title = {Momentum Contrast for Unsupervised Visual Representation Learning},
journal = {arXiv preprint arXiv:1911.05722},
year = {2019},
}
It also includes the implementation of the MoCo v2 paper:
@Article{chen2020mocov2,
author = {Xinlei Chen and Haoqi Fan and Ross Girshick and Kaiming He},
title = {Improved Baselines with Momentum Contrastive Learning},
journal = {arXiv preprint arXiv:2003.04297},
year = {2020},
}
Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code.
This repo aims to make minimal modifications to that code. Check the modifications with:
diff main_moco.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)
diff main_lincls.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)
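If bash process substitution (<(...)) is unavailable, for example on Windows, a comparable check can be done with the Python standard library. This is a minimal sketch: fetch_upstream and diff_against_upstream are hypothetical helpers, and the URL is the one used in the diff commands above.

```python
import difflib
import urllib.request

UPSTREAM_URL = "https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py"


def fetch_upstream(url: str = UPSTREAM_URL) -> str:
    """Download the upstream ImageNet example script as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def diff_against_upstream(local_text: str, upstream_text: str,
                          local_name: str = "main_moco.py") -> str:
    """Unified diff from the upstream script to a local script ('' if identical)."""
    lines = difflib.unified_diff(
        upstream_text.splitlines(keepends=True),
        local_text.splitlines(keepends=True),
        fromfile="upstream/main.py",
        tofile=local_name,
    )
    return "".join(lines)


# Usage (requires network access and a local checkout):
#   with open("main_moco.py", encoding="utf-8") as f:
#       print(diff_against_upstream(f.read(), fetch_upstream()))
```

Unlike the diff commands above, this prints upstream first, so lines starting with "+" are the MoCo additions.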
This implementation only supports multi-GPU, DistributedDataParallel training, which is faster and simpler; single-GPU and DataParallel training are not supported.
To do unsupervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, run:
python main_moco.py \
-a resnet50 \
--lr 0.03 \
--batch-size 256 \
--dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
[your imagenet-folder with train and val folders]
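With --multiprocessing-distributed, the upstream PyTorch ImageNet example (which this script modifies) spawns one process per GPU and rescales a few arguments inside each process: the total world size becomes world_size * ngpus_per_node, each process's rank becomes rank * ngpus_per_node + gpu, and --batch-size is split across the GPUs of a node. The sketch below reproduces that arithmetic in plain Python, assuming main_moco.py keeps this upstream logic; per_process_settings and ProcessSettings are hypothetical names, not part of the repo.

```python
from dataclasses import dataclass


@dataclass
class ProcessSettings:
    global_rank: int    # rank of this process among all processes
    world_size: int     # total number of processes across all nodes
    per_gpu_batch: int  # batch size used by this process's data loader


def per_process_settings(gpu: int, ngpus_per_node: int, batch_size: int = 256,
                         world_size: int = 1, node_rank: int = 0) -> ProcessSettings:
    """Mimic how --world-size/--rank/--batch-size are rescaled per GPU process.

    Assumed upstream logic (world_size = number of nodes on the command line):
      world_size <- world_size * ngpus_per_node
      rank       <- node_rank * ngpus_per_node + gpu
      batch_size <- batch_size / ngpus_per_node  (so --batch-size is per node)
    """
    if not 0 <= gpu < ngpus_per_node:
        raise ValueError("gpu index out of range")
    return ProcessSettings(
        global_rank=node_rank * ngpus_per_node + gpu,
        world_size=world_size * ngpus_per_node,
        per_gpu_batch=int(batch_size / ngpus_per_node),
    )


# The 8-GPU command above: each process sees 256 / 8 = 32 images per step.
print(per_process_settings(gpu=3, ngpus_per_node=8))
# ProcessSettings(global_rank=3, world_size=8, per_gpu_batch=32)
```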
This script uses all the default hyper-parameters as described in the MoCo v1 paper. To run MoCo v2, set --mlp --moco-t 0.2 --aug-plus --cos.
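The MoCo v1 and v2 runs differ only in the four extra flags named above. As a minimal sketch, the two command lines can be assembled programmatically; pretrain_argv and its parameters are hypothetical, and the argument list simply mirrors the pre-training command above.

```python
import shlex


def pretrain_argv(data_dir: str, v2: bool = False, lr: float = 0.03,
                  batch_size: int = 256, port: int = 10001) -> list:
    """Build the main_moco.py argument list for a single 8-GPU node."""
    argv = [
        "python", "main_moco.py",
        "-a", "resnet50",
        "--lr", str(lr),
        "--batch-size", str(batch_size),
        "--dist-url", f"tcp://localhost:{port}",
        "--multiprocessing-distributed", "--world-size", "1", "--rank", "0",
    ]
    if v2:
        # MoCo v2 flags: MLP head, temperature 0.2, extra augmentation, cosine lr schedule.
        argv += ["--mlp", "--moco-t", "0.2", "--aug-plus", "--cos"]
    argv.append(data_dir)  # ImageNet folder containing train/ and val/
    return argv


print(shlex.join(pretrain_argv("/path/to/imagenet", v2=True)))
```

Building the list rather than a single string avoids shell-quoting issues when it is passed to subprocess.run.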
Note: for 4-GPU training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128 with 4 GPUs. We got similar results using this setting.
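The linear scaling recipe keeps the learning rate proportional to the total batch size, with the 8-GPU setting (--lr 0.03 with --batch-size 256) as the reference point. A minimal sketch of the arithmetic; scaled_lr is a hypothetical helper.

```python
def scaled_lr(batch_size: int, base_lr: float = 0.03, base_batch_size: int = 256) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the total batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr * batch_size / base_batch_size


# 4-GPU recipe from the note above: total batch 128 -> lr 0.015.
print(scaled_lr(128))  # 0.015
```

With 4 GPUs and a total batch of 128, each GPU still processes 128 / 4 = 32 images per step, the same as in the 8-GPU run; only the total batch, and hence the learning rate, is halved.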
With a pre-trained model, to train a supervised linear classifier on frozen features/weights on an 8-GPU machine, run:
python main_lincls.py \
-a resnet50 \
--lr 30.0 \
--batch-size 256 \
--pretrained [your checkpoint path]/checkpoint_0199.pth.tar \
--dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
[your imagenet-folder with train and val folders]
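The --pretrained path above points at checkpoint_0199.pth.tar, the checkpoint after the final epoch of the 200-epoch pre-training schedule. Assuming checkpoints are named with a zero-based, four-digit epoch index (inferred from that file name), the final checkpoint of a longer run can be located as below; final_checkpoint is a hypothetical helper.

```python
from pathlib import Path


def final_checkpoint(ckpt_dir: str, epochs: int = 200) -> Path:
    """Path of the last checkpoint, assuming names like checkpoint_0199.pth.tar.

    Epochs are assumed to be counted from 0, so a 200-epoch run ends at 0199.
    """
    if epochs < 1:
        raise ValueError("epochs must be >= 1")
    return Path(ckpt_dir) / f"checkpoint_{epochs - 1:04d}.pth.tar"


print(final_checkpoint("runs/moco_v2", epochs=800).name)  # checkpoint_0799.pth.tar
```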
Linear classification results on ImageNet using this repo with 8 NVIDIA V100 GPUs:

| | pre-train epochs | pre-train time | MoCo v1 top-1 acc. | MoCo v2 top-1 acc. |
|---|---|---|---|---|
| ResNet-50 | 200 | 53 hours | 60.8±0.2 | 67.5±0.1 |
Here we run 5 trials (of pre-training and linear classification) and report mean±std: the 5 results of MoCo v1 are {60.6, 60.6, 60.7, 60.9, 61.1}, and of MoCo v2 are {67.7, 67.6, 67.4, 67.6, 67.3}.
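The reported mean±std can be reproduced from the five listed trials with Python's statistics module. The numbers match when the population standard deviation (statistics.pstdev) is used; that this is what was computed is an assumption, since the sample standard deviation (statistics.stdev) would round to 0.2 rather than 0.1 for MoCo v2.

```python
import statistics

# Top-1 accuracies of the five trials listed above.
trials = {
    "MoCo v1": [60.6, 60.6, 60.7, 60.9, 61.1],
    "MoCo v2": [67.7, 67.6, 67.4, 67.6, 67.3],
}


def summarize(values):
    """Format values as 'mean±std' with one decimal, using the population std."""
    return f"{statistics.mean(values):.1f}±{statistics.pstdev(values):.1f}"


for name, values in trials.items():
    print(name, summarize(values))
# MoCo v1 60.8±0.2
# MoCo v2 67.5±0.1
```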
Our pre-trained ResNet-50 models can be downloaded as follows:
| | epochs | mlp | aug+ | cos | top-1 acc. | model | md5 |
|---|---|---|---|---|---|---|---|
| MoCo v1 | 200 | | | | 60.6 | download | b251726a |
| MoCo v2 | 200 | ✓ | ✓ | ✓ | 67.7 | download | 59fd9945 |
| MoCo v2 | 800 | ✓ | ✓ | ✓ | 71.1 | download | a04e12f8 |
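The md5 column lists an 8-character checksum for each model. Assuming these are the first 8 hex digits of the file's MD5 digest (as printed by md5sum), a download can be checked with hashlib; md5_hexdigest and md5_prefix_matches are hypothetical helpers, and the file name in the usage comment is a placeholder.

```python
import hashlib


def md5_hexdigest(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large checkpoints need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5_prefix_matches(path: str, expected_prefix: str) -> bool:
    """True if the file's MD5 digest starts with the short checksum from the table."""
    return md5_hexdigest(path).startswith(expected_prefix.lower())


# Usage (placeholder file name):
#   md5_prefix_matches("moco_v2_200ep_download.pth.tar", "59fd9945")
```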
For transferring the pre-trained models to object detection, see ./detection.
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
See also:
- moco.tensorflow: A TensorFlow re-implementation.
- Colab notebook: CIFAR demo on Colab GPU.