Pengfei Wei¹, Lingdong Kong¹,², Xinghua Qu¹, Yi Ren¹, Zhiqiang Xu³, Jing Jiang⁴, Xiang Yin¹

¹ByteDance AI Lab, ²National University of Singapore, ³MBZUAI, ⁴University of Technology Sydney
TranSVAE is a disentanglement framework for unsupervised video domain adaptation. It aims to separate domain information from the data during adaptation. We model the generation of cross-domain videos from two sets of latent factors: one encoding the static, domain-related information and the other encoding the temporal, semantic-related information. Objectives are then enforced on these latent factors to achieve domain disentanglement and transfer.
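The two-factor view above can be illustrated with a toy sketch. This is not the TranSVAE model or its API: the `VideoLatents` container, the autoregressive prior, and the concatenation "decoder" are all hypothetical stand-ins, chosen only to show one static factor per video, one dynamic factor per frame, and domain transfer as a swap of the static factor.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class VideoLatents:
    """Hypothetical container: one static factor per video, one dynamic factor per frame."""
    z_static: List[float]         # domain-related, shared by all frames
    z_dynamic: List[List[float]]  # temporal/semantic, one vector per frame


def sample_latents(num_frames: int, static_dim: int, dynamic_dim: int,
                   rng: random.Random) -> VideoLatents:
    """Sample a static factor once and a chain of dynamic factors (toy prior)."""
    z_static = [rng.gauss(0.0, 1.0) for _ in range(static_dim)]
    z_dynamic = []
    prev = [0.0] * dynamic_dim
    for _ in range(num_frames):
        # Assumed toy prior: each step depends on the previous step.
        cur = [0.9 * p + rng.gauss(0.0, 0.1) for p in prev]
        z_dynamic.append(cur)
        prev = cur
    return VideoLatents(z_static, z_dynamic)


def decode(latents: VideoLatents) -> List[List[float]]:
    """Toy decoder: each frame is the static factor concatenated with its dynamic factor."""
    return [latents.z_static + z_t for z_t in latents.z_dynamic]


def transfer_domain(content: VideoLatents, style: VideoLatents) -> VideoLatents:
    """Keep the dynamic factors of `content` and take the static factor of `style`."""
    return VideoLatents(
        z_static=list(style.z_static),
        z_dynamic=[list(z_t) for z_t in content.z_dynamic],
    )


rng = random.Random(0)
source = sample_latents(num_frames=4, static_dim=2, dynamic_dim=3, rng=rng)
target = sample_latents(num_frames=4, static_dim=2, dynamic_dim=3, rng=rng)

# "Transfer" the source video into the target domain by swapping the static factor.
swapped = transfer_domain(content=source, style=target)
frames = decode(swapped)
print(len(frames), len(frames[0]))
```

In this sketch every decoded frame carries the target's static factor and the source's per-frame dynamics, which is the intuition behind the domain transfer examples. The actual model learns these factors with a sequential VAE; see GET_STARTED.md for the real entry points.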
Col1: Original sequences ("Human"
Visit our project page to explore more details. 🐾
- [2023.10] - We provide our extracted I3D features. Kindly refer to this page for more details.
- [2023.09] - TranSVAE was accepted to NeurIPS 2023! 🎉
- [2022.08] - TranSVAE ranks 1st on the UDA leaderboards for UCF-HMDB, Jester, and Epic-Kitchens on Papers with Code.
- [2022.08] - Try a Gradio demo for domain disentanglement in TranSVAE at Hugging Face Spaces! 🤗
- [2022.08] - Our paper is available on arXiv. Click here to check it out!
- Highlights
- Installation
- Data Preparation
- Getting Started
- Main Results
- TODO List
- License
- Acknowledgement
- Citation
Conceptual Comparison |
---|
Graphical Model |
Framework Overview |
Please refer to INSTALL.md for the installation details.
Please refer to DATA_PREPARE.md for details on preparing the UCF101, HMDB51, Jester, Epic-Kitchens, and Sprites datasets.
Please refer to GET_STARTED.md to learn more about using this codebase.
Method | Backbone | U101 → H51 | H51 → U101 | Average |
---|---|---|---|---|
DANN (JMLR'16) | ResNet-101 | 75.28 | 76.36 | 75.82 |
JAN (ICML'17) | ResNet-101 | 74.72 | 76.69 | 75.71 |
AdaBN (PR'18) | ResNet-101 | 72.22 | 77.41 | 74.82 |
MCD (CVPR'18) | ResNet-101 | 73.89 | 79.34 | 76.62 |
TA3N (ICCV'19) | ResNet-101 | 78.33 | 81.79 | 80.06 |
ABG (MM'20) | ResNet-101 | 79.17 | 85.11 | 82.14 |
TCoN (AAAI'20) | ResNet-101 | 87.22 | 89.14 | 88.18 |
MA2L-TD (WACV'22) | ResNet-101 | 85.00 | 86.59 | 85.80 |
Source-only | I3D | 80.27 | 88.79 | 84.53 |
DANN (JMLR'16) | I3D | 80.83 | 88.09 | 84.46 |
ADDA (CVPR'17) | I3D | 79.17 | 88.44 | 83.81 |
TA3N (ICCV'19) | I3D | 81.38 | 90.54 | 85.96 |
SAVA (ECCV'20) | I3D | 82.22 | 91.24 | 86.73 |
CoMix (NeurIPS'21) | I3D | 86.66 | 93.87 | 90.22 |
CO2A (WACV'22) | I3D | 87.78 | 95.79 | 91.79 |
TranSVAE (Ours) | I3D | 87.78 | 98.95 | 93.37 |
Oracle | I3D | 95.00 | 96.85 | 95.93 |
Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
---|---|---|---|---|---|---|---|
JS → JT | 51.5 | 55.4 | 52.3 | 55.5 | 64.7 | 66.1 | 95.6 |
Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
---|---|---|---|---|---|---|---|
D1 → D2 | 32.8 | 37.7 | 35.4 | 34.2 | 42.9 | 50.5 | 64.0 |
D1 → D3 | 34.1 | 36.6 | 34.9 | 37.4 | 40.9 | 50.3 | 63.7 |
D2 → D1 | 35.4 | 38.3 | 36.3 | 40.9 | 38.6 | 50.3 | 57.0 |
D2 → D3 | 39.1 | 41.9 | 40.8 | 42.8 | 45.2 | 58.6 | 63.7 |
D3 → D1 | 34.6 | 38.8 | 36.1 | 39.9 | 42.3 | 48.0 | 57.0 |
D3 → D2 | 35.8 | 42.1 | 41.4 | 44.2 | 49.2 | 58.0 | 64.0 |
Average | 35.3 | 39.2 | 37.4 | 39.9 | 43.2 | 52.6 | 61.5 |
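The Average row appears to be the unweighted mean over the six Epic-Kitchens tasks. As a quick check, the snippet below recomputes it for CoMix and TranSVAE from the per-task numbers in the table; the variable names are illustrative only.

```python
# Per-task accuracies (%) copied from the Epic-Kitchens table above,
# in the order D1→D2, D1→D3, D2→D1, D2→D3, D3→D1, D3→D2.
comix = [42.9, 40.9, 38.6, 45.2, 42.3, 49.2]
transvae = [50.5, 50.3, 50.3, 58.6, 48.0, 58.0]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


comix_avg = round(mean(comix), 1)
transvae_avg = round(mean(transvae), 1)
# Gap between the two methods, computed on the unrounded means.
gain = round(mean(transvae) - mean(comix), 1)

print(f"CoMix: {comix_avg}, TranSVAE: {transvae_avg}, gain: {gain}")
```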
Domain Transfer Example
- Initial release. 🚀
- Add license. See here for more details.
- Add demo at Hugging Face Spaces.
- Add installation details.
- Add data preparation details.
- Add evaluation details.
- Add training details.
This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
We acknowledge the use of the following public resources during the course of this work: UCF101, HMDB51, Jester, Epic-Kitchens, Sprites, I3D, and TRN.
If you find this work helpful, please consider citing our paper:
@inproceedings{wei2023transvae,
title = {Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective},
author = {Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Ren, Yi and Xu, Zhiqiang and Jiang, Jing and Yin, Xiang},
booktitle = {Advances in Neural Information Processing Systems},
year = {2023},
}