Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

This is the official repository of Frido. Text-to-image and layout-to-image generation on COCO are currently supported (inference only). We will release more pre-trained models for other image synthesis tasks, and training code will be available in a future version. Please stay tuned!

Machine environment

Ubuntu version: 18.04.5 LTS
CUDA version: 11.6
Testing GPU: Nvidia Tesla V100

Requirements

A conda environment named frido can be created and activated with:

conda env create -f environment.yaml
conda activate frido

Datasets setup

We provide two approaches to set up the datasets:

Auto-download

To automatically download the datasets and save them to the default path (../), run the following script:

bash tools/download_datasets.sh

Manual setup

Text-to-image generation

We use the COCO 2014 splits for the text-to-image task, which can be downloaded from the official COCO website.

Please create a folder named 2014 and arrange the downloaded data and annotations as follows.

COCO 2014 file structure

2014
├── annotations
│   ├── captions_val2014.json
│   └── ...
└── val2014
    ├── COCO_val2014_000000000073.jpg
    └── ...
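
As a quick sanity check once the files are in place, the captions file can be read with the Python standard library. The sketch below is not part of the Frido codebase; it assumes only the standard COCO captions layout (an `images` list and an `annotations` list joined by image id), and the helper name `load_captions` is hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_captions(annotation_file):
    """Map each image file name to its list of captions (COCO captions format)."""
    with open(annotation_file, "r", encoding="utf-8") as f:
        data = json.load(f)

    # COCO stores image records and caption records separately, joined by image id.
    id_to_name = {img["id"]: img["file_name"] for img in data["images"]}

    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[id_to_name[ann["image_id"]]].append(ann["caption"].strip())
    return dict(captions)


if __name__ == "__main__":
    # Assumed location, following the folder layout shown above.
    path = Path("2014/annotations/captions_val2014.json")
    if path.exists():
        caps = load_captions(path)
        print(f"{len(caps)} images with captions")
    else:
        print(f"{path} not found; download the COCO 2014 annotations first")
```

If the reported number of images is zero or the file is not found, the folder layout most likely differs from the one shown above.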

Layout-to-image generation

We use the COCO 2017 splits to test Frido on the layout-to-image task, which can be downloaded from the official COCO website.

Please create a folder named 2017 and arrange the downloaded data and annotations as follows.

COCO 2017 file structure

2017
├── annotations
│   ├── captions_val2017.json
│   └── ...
└── val2017
    ├── 000000000872.jpg
    └── ...

File structure for dataset and code

Please make sure that the file structure matches the following. Alternatively, modify the config files to point to your own paths.

File structure

datasets
└── coco
    ├── 2014
    │   ├── annotations
    │   ├── val2014
    │   └── ...
    └── 2017
        ├── annotations
        ├── val2017
        └── ...
Frido
├── configs
│   ├── t2i
│   └── ...
├── exp
│   ├── t2i
│   │   └── frido_f16f8
│   │       └── checkpoints
│   │           └── model.ckpt
│   ├── layout2i
│   └── ...
├── frido
├── scripts
├── tools
└── ...
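
Before running inference, it can help to confirm that the layout above is in place. The following is a minimal sketch, not a script shipped with Frido. It assumes the datasets folder sits next to the Frido folder, as drawn above, and checks only the paths that appear explicitly in the tree; `EXPECTED_PATHS` and `missing_paths` are hypothetical names.

```python
from pathlib import Path

# Paths taken from the file structure above, relative to the Frido checkout.
# Assumption: "datasets" is a sibling of the "Frido" folder.
EXPECTED_PATHS = [
    "../datasets/coco/2014/annotations",
    "../datasets/coco/2014/val2014",
    "../datasets/coco/2017/annotations",
    "../datasets/coco/2017/val2017",
    "configs/t2i",
    "exp/t2i/frido_f16f8/checkpoints/model.ckpt",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:")
        for p in missing:
            print("  " + p)
    else:
        print("Layout matches the expected structure.")
```

Run it from inside the Frido folder. If you keep the data elsewhere, adjust the list (and the config files) rather than the directory tree.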

Download pre-trained models

The following table describes the tasks and models that are currently available. To automatically download all Frido model checkpoints, run the following command:

bash tools/download.sh

Task	Dataset	FID	Comments
Text-to-image	COCO 2014	11.24
Text-to-image (mini)	COCO 2014	64.85	1000 images of mini-val; FID was calculated against corresponding GT images.
Layout-to-image	COCO (finetuned OpenImage)	37.14	FID calculated on 2,048 val images.
Layout-to-image (mini)	COCO (finetuned OpenImage)	122.48	500 images of mini-val; FID was calculated against corresponding GT images.

The mini versions are for quick testing and reproduction, which can be done within 1 hour on a single V100. A high FID is expected. To evaluate generation quality, the full validation/test split needs to be run.

FID scores were evaluated by using torch-fidelity. The scores may slightly fluctuate due to the inherent initial random noise of diffusion models.
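
For reference, the standard definition of FID (general background, not specific to Frido) compares Gaussian fits to Inception features of the generated and the real images:

```latex
\mathrm{FID} = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2} \right)
```

Here $\mu_g, \Sigma_g$ and $\mu_r, \Sigma_r$ denote the feature mean and covariance of the generated and real image sets. Both are estimated from a finite number of images, so scores computed on small sets such as the mini-val splits are not directly comparable to scores on the full split.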

Inference Frido

We now provide scripts for testing Frido. (Full training code will be released soon.)

Quick Start

Please check out the Jupyter notebook demo.ipynb for a simple demo of text-to-image generation on COCO.

Once the datasets and model weights are properly set up, Frido can be tested with the following commands.

Text-to-image

# for full validation:
bash tools/eval_t2i.sh

# for mini-val:
bash tools/eval_t2i_minival.sh

The default output folder is exp/t2i/frido_f16f8/samples

Layout-to-image

# for full validation:
bash tools/eval_layout2i.sh

# for mini-val:
bash tools/eval_layout2i_minival.sh

The default output folder is exp/layout2i/frido_f8f4/samples

(Optional) You can modify the scripts by adding the following arguments.

-o [OUTPUT_PATH] : to change the output folder path.
-c [INT] : number of steps for ddim and fastdpm sampling. (default=200)

Multi-GPU testing

We provide code for multi-GPU testing. Please refer to the script tools/eval_t2i_multiGPU.sh.

For example, 4-GPU inference can be run as follows.

bash tools/eval_t2i_multiGPU.sh 4
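
How the script divides the work is not described here. Purely as an illustration, one common way to split an evaluation set across several workers is round-robin sharding, sketched below. The function `shard` is hypothetical and is not taken from the Frido code.

```python
def shard(items, num_workers, rank):
    """Return the items assigned to worker `rank` out of `num_workers` (round-robin)."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    # Worker `rank` takes every num_workers-th item, starting at index `rank`.
    return items[rank::num_workers]


if __name__ == "__main__":
    image_ids = list(range(10))  # stand-in for validation image ids
    for rank in range(4):
        print(rank, shard(image_ids, 4, rank))
```

With this scheme every image is handled by exactly one worker, and shard sizes differ by at most one.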

Evaluation

FID scores were evaluated by using torch-fidelity.

After running inference, the FID score can be computed with the following command:

fidelity --gpu 0 --fid --input2 [GT_FOLDER] --input1 [PRED_FOLDER]

Example:

fidelity --gpu 0 --fid --input2 exp/t2i/frido_f16f8/samples/.../img/inputs --input1 exp/t2i/frido_f16f8/samples/.../img/sample
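
When evaluating several runs, it may be convenient to assemble the same command from Python. The sketch below uses only the flags shown above (`--gpu`, `--fid`, `--input1`, `--input2`); the helper name `fidelity_command` and the placeholder folder names are hypothetical.

```python
import shlex


def fidelity_command(pred_folder, gt_folder, gpu=0):
    """Build the torch-fidelity FID command as an argument list."""
    return [
        "fidelity",
        "--gpu", str(gpu),
        "--fid",
        "--input1", str(pred_folder),
        "--input2", str(gt_folder),
    ]


if __name__ == "__main__":
    # Placeholder folders; substitute your own prediction and ground-truth paths.
    cmd = fidelity_command("path/to/pred", "path/to/gt")
    print(shlex.join(cmd))
    # To actually run it (requires torch-fidelity to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when folder names contain spaces.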

Acknowledgement

The Frido codebase builds heavily on the codebases of Latent Diffusion Models (LDM) and VQGAN. We sincerely thank the authors for open-sourcing their work!

Citation

If you find this code useful for your research, please consider citing:

@article{fan2022frido,
  title={Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis},
  author={Fan, Wan-Cyuan and Chen, Yen-Chun and Chen, Dongdong and Cheng, Yu and Yuan, Lu and Wang, Yu-Chiang Frank},
  journal={arXiv preprint arXiv:2208.13753},
  year={2022}
}

License

MIT
