MDef-DETR: Multi-modal Deformable Detection Transformer

This repository contains the training code for MDef-DETR. The paper is available on arXiv.

Requirements

pip install -r requirements.txt

Training

Distributed training is available via Slurm and submitit:

pip install submitit

The config file for pretraining is configs/pretrain.json and looks like:

{
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["flickr", "gqa", "refexp"],
    "coco_path": "",
    "vg_img_path": "",
    "flickr_img_path": "",
    "refexp_ann_path": "annotations/",
    "flickr_ann_path": "annotations/",
    "gqa_ann_path": "annotations/",
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": ""
}

Download the original Flickr30k image dataset from the Flickr30K webpage and update flickr_img_path to the folder containing the images.
Download the original Flickr30k Entities annotations from Flickr30k annotations and update flickr_dataset_path to the folder containing the annotations.
Download the GQA images from GQA images and update vg_img_path to the folder containing the images.
Download the COCO images from Coco train2014 and update coco_path to the folder containing the downloaded images.
Download the pre-processed annotations, converted to COCO format (all datasets are in the same zip archive of MDETR annotations), from Pre-processed annotations and update flickr_ann_path, gqa_ann_path and refexp_ann_path to the folder containing them.
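
Because every path entry in the config has to be filled in by hand, a quick check before launching a multi-node job can save a failed run. The following is a minimal sketch, not part of the repository: the helper names find_config_problems and check_config_file are hypothetical, and it assumes that each of the listed keys should point to an existing directory.

```python
import json
import os

# Path-valued keys as they appear in configs/pretrain.json.
PATH_KEYS = [
    "coco_path",
    "vg_img_path",
    "flickr_img_path",
    "flickr_dataset_path",
    "refexp_ann_path",
    "flickr_ann_path",
    "gqa_ann_path",
]


def find_config_problems(config):
    """Return a list of messages for path entries that are unset or missing.

    Hypothetical helper: assumes every key in PATH_KEYS must be a directory.
    """
    problems = []
    for key in PATH_KEYS:
        value = config.get(key, "")
        if not value:
            problems.append(f"{key}: not set")
        elif not os.path.isdir(value):
            problems.append(f"{key}: directory not found: {value}")
    return problems


def check_config_file(config_path):
    """Load a JSON config file and report problems with its path entries."""
    with open(config_path) as f:
        return find_config_problems(json.load(f))
```

Calling check_config_file("configs/pretrain.json") returns an empty list once all paths are set and exist; otherwise each returned string names the key that still needs attention.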

Alternatively, you can download the pre-processed data from the link as a single zip file and extract it under the data directory.

Script to run training

The following command reproduces the training of the ResNet-101 model:

python run_with_submitit.py --dataset_config configs/pretrain.json --ngpus 8 --nodes 4 --ema --epochs 20 --lr_drop 16
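
To make the resource request explicit, the sketch below assembles the same command from its parameters and prints the total GPU count. It is illustrative only: build_training_command and total_gpus are hypothetical helpers, only flags shown in the command above are used, and it assumes that --ngpus is the number of GPUs per node, so the command above would request 8 x 4 = 32 GPUs. Changing these values may alter the effective batch size and therefore the results.

```python
import shlex


def build_training_command(ngpus=8, nodes=4, epochs=20, lr_drop=16,
                           dataset_config="configs/pretrain.json", ema=True):
    """Assemble the run_with_submitit.py argument list (hypothetical helper)."""
    args = [
        "python", "run_with_submitit.py",
        "--dataset_config", dataset_config,
        "--ngpus", str(ngpus),
        "--nodes", str(nodes),
    ]
    if ema:
        args.append("--ema")
    args += ["--epochs", str(epochs), "--lr_drop", str(lr_drop)]
    return args


def total_gpus(ngpus, nodes):
    """Total GPUs requested, assuming --ngpus counts GPUs per node."""
    return ngpus * nodes


# Print the command line and the total number of GPUs it would request.
print(shlex.join(build_training_command()))
print("total GPUs:", total_gpus(8, 4))
```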

Citation

If you use our work, please consider citing MDef-DETR:

    @article{Maaz2021Multimodal,
        title={Multi-modal Transformers Excel at Class-agnostic Object Detection},
        author={Muhammad Maaz and Hanoona Rasheed and Salman Khan and Fahad Shahbaz Khan and Rao Muhammad Anwer and Ming-Hsuan Yang},
        journal={ArXiv 2111.11430},
        year={2021}
    }

Credits

This codebase is modified from the MDETR repository. We thank them for their implementation.
