This is a PyTorch implementation of VATT (and, so far, the only one) from the paper "VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text". The official implementation is in TensorFlow.
MattBortoletto/VATT-pytorch