US3L-CVPR2023

Official code for Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning (Accepted to CVPR2023).

Introduction

We propose universally slimmable self-supervised learning (dubbed as US3L) to achieve better accuracy-efficiency trade-offs for deploying self-supervised models across different devices.

Getting Started

Prerequisites

Python 3
PyTorch (= 1.10.0)
Torchvision (= 0.11.1)
Numpy
CUDA 10.1

All dataset definitions are in the datasets folder. By default, the form of PyTorch's train/val folder is used. You can specify the path of the dataset yourself in the corresponding dataset file.

We used 4 2080Ti GPUs for CIFAR experiments (except for ResNet-50) and 8 2080Ti GPUs for ImageNet experiments.

CIFAR Experiments

Pre-training stage using US3L (c.f. scripts/cifar/run_simclr_featdistill.sh), run:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py \
    ssl --config ./configs/cifar/us_resnet18_simclr_featdistill_cifar.yaml \
    --output [your_checkpoint_dir] -j 8

You can set hyper-parameters manually in the corresponding .yaml file (e.g., configs/cifar/us_resnet18_simclr_featdistill_cifar.yaml here).

MODEL:
    ARCH: us_cifar_resnet18 # backbone model
    OFA:
        WIDTH_MULT_RANGE: [0.25, 1.0]  # range of widths
        NUM_SAMPLE_TRAINING: 3 # number of sampled sub-networks
        SANDWICH: False # whether to use the sandwich rule
        SAMPLE_SCHEDULER: A # A denotes dynamic sampling, set it to "" to deactivate
        MOMENTUM_UPDATE: False # False: only update momentum encoder once (max model forward pass)
        DISTILL: True # whether to use inplace distillation
        DISTILL_CRITERION: CosineSimilarity # distillation loss
        REGULARIZER: True # whether to use group regularization
        REGULARIZER_CRITERION: GroupReg
        REGULARIZER_LAMBDA: 0.00005 # set it to 1/2 weight decay
        REGULARIZER_DECAY_ALPHA: 0.05 # hyper-parameter alpha
        REGULARIZER_DECAY_TYPE: linear # schedule
        REGULARIZER_DECAY_BINS: 8
TRAIN:
    EPOCHS: 400 # number of epochs
    DATASET: cifar100 # dataset
    BATCH_SIZE: 128 # per-gpu batch size
    LOSS:
        CRITERION:
            NAME: InfoNCEv2_distill # base loss
SSL:
    TYPE: SimCLR_Momentum # SSL algorithm
    SETTING:
        DIM: 256
        HIDDEN_DIM: 2048
        T: 0.5
        MLP: TRUE
        PREDICTOR: True # use auxillary distillation head
        MOMENTUM: 0.99

You can also use other baseline methods for pre-training. For example, you can run scripts/cifar/run_simsiam.sh for SimSiam, scripts/cifar/run_moco.sh for MoCo.
Linear evaluation stage using pre-trained models (c.f. scripts/cifar/run_linear_eval.sh), run:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py \
    linear  --config ./configs/cifar/resnet18_linear_eval_cifar.yaml \
    --output [output_dir] -j 8

To specify the pre-trained model path and evaluation channel-width, you need to mannually modify the .yaml file (configs/cifar/resnet18_linear_eval_cifar.yaml). For instance, our pretrained model is ./checkpoint.pth.tar and we want to evaluate it with channel width at 0.25x, the modification in yaml file should look like

MODEL:
    PRETRAINED: ./checkpoint.pth.tar # you need to specify checkpoint here
    OFA:
        CALIBRATE_BN: True
        NUM_SAMPLE_TRAINING: 1
        WIDTH_MULT: 0.25 # width multiplier

ImageNet Experiments

Pre-training and Linear Evaluation

Pre-training stage using US3L (c.f. scripts/imagenet/run_byol_featdistill_imagenet.sh), run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    ssl --config ./configs/imagenet/us_resnet50_byol_featdistill_imagenet.yaml \
    --output /data/train_log_OFA/imagenet/r50/ours_base_mse_distill_mse_100ep -j 8

Linear evaluation on ImageNet (c.f. scripts/imagenet/run_linear_eval_imagenet.sh). If you want to specify the model weights and width multiplier, see the instructions on CIFAR above and modify the corresponding configs/imagenet/resnet18_linear_eval_imagenet.yaml.

Join Us

Welcome to be a member (or an intern) of our team if you are interested in Quantization, Pruning, Distillation, Self-Supervised Learning and Model Deployment. Please send your resume to [email protected].

Citation

Please consider citing our work in your publications if it helps your research.

@article{US3L,
   title         = {Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning},
   author        = {Yun-Hao Cao, Peiqin Sun and Shuchang Zhou},
   year          = {2023},
   booktitle={The IEEE Conference on Computer Vision and Pattern Recognition},}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
ViT_experiments		ViT_experiments
configs		configs
datasets		datasets
losses		losses
meters		meters
models		models
scripts		scripts
utils		utils
workers		workers
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.py		config.py
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

US3L-CVPR2023

Introduction

Getting Started

Prerequisites

CIFAR Experiments

ImageNet Experiments

Pre-training and Linear Evaluation

Join Us

Citation

About

Releases

Packages

Contributors 2

Languages

License

megvii-research/US3L-CVPR2023

Folders and files

Latest commit

History

Repository files navigation

US3L-CVPR2023

Introduction

Getting Started

Prerequisites

CIFAR Experiments

ImageNet Experiments

Pre-training and Linear Evaluation

Join Us

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages