OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=2 --master_port 12345 --nnodes=1 --master_addr="localhost" main_finetune.py --batch_size 64
wget https://huggingface.co/facebook/vit-mae-base/resolve/main/pytorch_model.bin
- Strong Augmentation code
- Evaluation code
- Make Submission file
- Do we use all frames for training our model? No, we use two random frames from each video.
- Loss ablation study (using lambda)
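
The "two random frames from each video" sampling mentioned above can be sketched roughly as follows. This is only an illustration, not the code used in this repository: the helper name `sample_two_frames` is hypothetical, and drawing two distinct indices and returning them in ascending order are assumptions.

```python
import random


def sample_two_frames(num_frames, rng=None):
    """Pick two frame indices uniformly at random from one video.

    Assumptions (not confirmed by this repository): the two frames are
    distinct, and they are returned in ascending order.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to sample a pair")
    rng = rng or random  # fall back to the module-level generator
    return sorted(rng.sample(range(num_frames), 2))


# Example: draw a pair from a hypothetical 30-frame video with a fixed seed.
pair = sample_two_frames(30, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sampling reproducible across runs; omitting it draws a fresh pair on every call.
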
This code base heavily reuses the re-implementation of the paper "Masked Autoencoders Are Scalable Vision Learners":
@Article{MaskedAutoencoders2021,
author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
journal = {arXiv:2111.06377},
title = {Masked Autoencoders Are Scalable Vision Learners},
year = {2021},
}