Two-Stream Transformer for Multi-Label Image Classification

Introduction

This is the official PyTorch implementation of Two-Stream Transformer for Multi-Label Image Classification (ACM MM 2022) [paper].

Data Preparation

1. Download the datasets and organize them as follows:

```
|datasets
|---- MSCOCO
|---- NUS-WIDE
|---- VOC2007
```

2. Preprocess the data with the following commands (a layout sanity check is sketched after this list):

```bash
python scripts/mscoco.py
python scripts/nuswide.py
python scripts/voc2007.py
python embedding.py --data [mscoco, nuswide, voc2007]
```
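
Before running the preprocessing scripts, it can help to confirm the folder layout matches the tree above. The following is a minimal sketch, assuming the `datasets` root sits in the working directory; adjust the path to your setup.

```python
# Minimal layout check before preprocessing (a sketch; the "datasets" root
# path is an assumption -- point it at wherever you placed the datasets).
from pathlib import Path

EXPECTED = ["MSCOCO", "NUS-WIDE", "VOC2007"]

def check_layout(root: str = "datasets") -> bool:
    ok = True
    for name in EXPECTED:
        path = Path(root) / name
        if not path.is_dir():
            print(f"Missing dataset folder: {path}")
            ok = False
    return ok

if __name__ == "__main__":
    print("Layout OK" if check_layout() else "Fix the layout before preprocessing.")
```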

Requirements

```
torch >= 1.9.0
torchvision >= 0.10.0
```
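
A quick way to confirm the installed versions meet these minimums (a sketch using only the standard `__version__` strings):

```python
# Check that torch/torchvision satisfy the stated minimum versions.
import torch
import torchvision

def major_minor(version: str) -> tuple:
    # Drop local build suffixes such as "+cu117" before comparing numerically.
    return tuple(int(p) for p in version.split("+")[0].split(".")[:2])

assert major_minor(torch.__version__) >= (1, 9), f"torch {torch.__version__} is too old"
assert major_minor(torchvision.__version__) >= (0, 10), f"torchvision {torchvision.__version__} is too old"
print(f"torch {torch.__version__}, torchvision {torchvision.__version__}: OK")
```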

Training

Use the following commands to train the model:

```bash
python train.py --data mscoco --batch_size 16 --optimizer AdamW --lr 0.00001 --mode part --start_depth 9
python train.py --data nuswide --batch_size 16 --optimizer AdamW --lr 0.00001 --mode part --start_depth 1
python train.py --data voc2007 --batch_size 16 --optimizer AdamW --lr 0.00001 --mode part --start_depth 4
```
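
To run all three configurations back to back, a small launcher like the one below works. This is a sketch, not part of the repository; it only re-issues the commands above via `subprocess`.

```python
# Launch the three training runs listed above sequentially (sketch only).
import subprocess
import sys

START_DEPTH = {"mscoco": 9, "nuswide": 1, "voc2007": 4}

for data, depth in START_DEPTH.items():
    cmd = [
        sys.executable, "train.py",
        "--data", data,
        "--batch_size", "16",
        "--optimizer", "AdamW",
        "--lr", "0.00001",
        "--mode", "part",
        "--start_depth", str(depth),
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```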

Evaluation

Pre-trained weights are available on Google Drive. Download them, put them in the experiments folder, and then use the following commands to reproduce the results reported in the paper:

```bash
python evaluate.py --exp experiments/TSFormer_mscoco/exp1    # Microsoft COCO
python evaluate.py --exp experiments/TSFormer_nuswide/exp1   # NUS-WIDE
python evaluate.py --exp experiments/TSFormer_voc2007/exp1   # Pascal VOC 2007
```
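
To evaluate all three checkpoints in one go, a helper like this can wrap the commands above. It is a sketch only and assumes the weights have already been placed under `experiments/` as described.

```python
# Run evaluate.py for each downloaded experiment folder (sketch only).
import subprocess
import sys
from pathlib import Path

EXPERIMENTS = {
    "Microsoft COCO": "experiments/TSFormer_mscoco/exp1",
    "NUS-WIDE": "experiments/TSFormer_nuswide/exp1",
    "Pascal VOC 2007": "experiments/TSFormer_voc2007/exp1",
}

for name, exp in EXPERIMENTS.items():
    if not Path(exp).is_dir():
        print(f"[skip] {name}: {exp} not found -- download the pre-trained weights first.")
        continue
    subprocess.run([sys.executable, "evaluate.py", "--exp", exp], check=True)
```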

Main Results

| Dataset  | mAP  |
|----------|------|
| VOC 2007 | 97.0 |
| MS-COCO  | 88.9 |
| NUS-WIDE | 69.3 |

Citation

```bibtex
@inproceedings{zhu2022two,
  title={Two-stream transformer for multi-label image classification},
  author={Zhu, Xuelin and Cao, Jiuxin and Ge, Jiawei and Liu, Weijia and Liu, Bo},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={3598--3607},
  year={2022}
}
```
