Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

This repository is the official implementation of Side4Video, which significantly reduces the training memory cost for action recognition and text-video retrieval tasks.

📰 News

🗺️ Overview

image

🚀 Training and Testing

For training and testing our model, please refer to the Recognition and Retrieval folders.

📊 Results

image
Our best model achieves an accuracy of 67.3% and 74.6% on Something-Something V1 and V2, 88.6% on Kinetics-400, and a Recall@1 of 52.3% on MSR-VTT, 56.1% on MSVD, and 68.8% on VATEX.
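
For readers unfamiliar with the retrieval metric, here is a minimal sketch of how Recall@K is commonly computed for text-video retrieval. It is not this repository's evaluation code: the function name `recall_at_k`, the toy similarity matrix, and the convention that the correct video for query `i` sits at index `i` are all assumptions made for illustration.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose ground-truth item ranks in the top k.

    Assumes similarity[i][j] scores text query i against video j, and that
    the correct video for query i is at index i (illustrative convention).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct item = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers, not real results).
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.5, 0.2],
    [0.1, 0.2, 0.7],
]
print(f"Recall@1 = {recall_at_k(sim, k=1):.1%}")
```

In this toy case, queries 0 and 2 rank their own video first while query 1 does not, so two of three queries count as hits at K=1.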

🖇️ Citation

If you find this repository useful, please star 🌟 this repo and cite 🖇️ our paper.

@article{yao2023side4video,
  title={Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning},
  author={Yao, Huanjin and Wu, Wenhao and Li, Zhiheng},
  journal={arXiv preprint arXiv:2311.15769},
  year={2023}
}

👍 Acknowledgment

Our implementation is mainly based on the following codebases. We are sincerely grateful for their work.

  • Text4Vis: Revisiting Classifier: Transferring Vision-Language Models for Video Recognition.
  • CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.

📧 Contact

If you have any questions about this repository, please file an issue or contact Huanjin Yao or Wenhao Wu.