Project Page | Paper | Bibtex
Any-resolution Training for High-resolution Image Synthesis
ECCV 2022
Lucy Chai, Michaël Gharbi, Eli Shechtman, Phillip Isola, Richard Zhang
Prerequisites:
- Linux
- gcc-7
- Python 3
- NVIDIA GPU + CUDA CuDNN
Table of Contents:
- Colab - run it in your browser without installing anything locally
- Setup - download pretrained models and resources
- Pretrained Models - quickstart with pretrained models
- Notebooks - jupyter notebooks for interactive composition
- Training - pipeline for training encoders
- Evaluation - evaluation script
Interactive Demo: Try our interactive demo here! No local installation required.
- Clone this repo:
git clone https://github.com/chail/anyres-gan.git
- Install dependencies:
- gcc-7 or above is required for installation. Update gcc following these steps.
- We provide a Conda environment.yml file listing the dependencies. You can create the Conda environment by running:
conda env create -f environment.yml
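Afterwards, activate the environment before running the commands below (the environment name anyres-gan is an assumption based on the ipykernel setup later in this README; check environment.yml if yours differs):
conda activate anyres-gan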
- Download resources: we provide a script for downloading associated resources and pretrained models. Fetch these by running:
bash download_resources.sh
Pretrained models are downloaded by the download_resources.sh script above. Any-resolution images can be constructed by specifying the appropriate transformation matrices. The following code snippet provides a basic example; additional examples can be found in the notebook.
import pickle
import torch
import numpy as np
from util import patch_util, renormalize

torch.set_grad_enabled(False)

# Load the pretrained generator.
PATH = 'pretrained/bird_pretrained_final.pkl'
with open(PATH, 'rb') as f:
    G_base = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module

full_size = 500  # desired output resolution
seed = 0

# Sample a latent code and map it to W space with truncation.
rng = np.random.RandomState(seed)
z = torch.from_numpy(rng.standard_normal(G_base.z_dim)).float()
z = z[None].cuda()
c = None  # class labels (unused)
ws = G_base.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)

# Generate the full image patch by patch and stitch the results together.
full = torch.zeros([1, 3, full_size, full_size])
patches = patch_util.generate_full_from_patches(full_size, G_base.img_resolution)
for bbox, transform in patches:
    img = patch_util.scale_condition_wrapper(G_base, ws, transform[None].cuda(), noise_mode='const', force_fp32=True)
    full[:, :, bbox[0]:bbox[1], bbox[2]:bbox[3]] = img
renormalize.as_image(full[0])
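The final renormalize.as_image call converts the stitched output tensor into a displayable image, which is convenient when running the snippet inside a notebook.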
Note: remember to add the Conda environment to the Jupyter kernels:
python -m ipykernel install --user --name anyres-gan
We provide an example notebook, notebook-demo.ipynb, for running inference with the pretrained models.
See the script train.sh for training examples.
Training notes:
- Patch-based training runs in two stages: global fixed-resolution pretraining first, followed by patch training.
- The arguments --batch-gpu and --gamma are taken from the recommended StyleGAN3 configurations.
- The arguments --random_crop=True and --patch_crop=True perform random cropping on the fixed-resolution and variable-resolution datasets, respectively.
- --scale_max and --scale_min correspond to the largest and smallest sampled image scales for patch training (size = 1/scale * g_size; see the sketch after this list). --scale_max should correspond to the smallest image size in the patch dataset (for example, if the smallest image is 512px and the generator size is 256, then --scale_max=0.5). Omitting --scale_min will use the smallest possible scale as the minimum bound (the image's native size).
- --scale_mapping_min and --scale_mapping_max correspond to the normalization limits of the scale mapping branch; the min can be kept at 1, and the max can be set to an approximate zoom factor between the fixed-resolution dataset and the size of the HR images.
- For patch training, metrics are evaluated offline, so --metrics=none should be specified during training. See below for more details on evaluation.
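As a quick sanity check on the scale arguments, here is a minimal Python sketch of the size/scale relationship above (the function name and the generator/dataset sizes are illustrative, not part of the codebase):

def compute_scale_max(g_size, min_image_size):
    # size = 1/scale * g_size, so scale = g_size / size.
    # The largest scale corresponds to the smallest sampled size,
    # which should match the smallest image in the patch dataset.
    return g_size / min_image_size

print(compute_scale_max(g_size=256, min_image_size=512))  # 0.5, as in the example above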
Training progress can be visualized using:
tensorboard --logdir training-runs/
Beyond the standard FFHQ and LSUN Church datasets, we train on datasets scraped from Flickr. Due to licensing, we cannot release these images directly. Please see datasets/download/download_dataset.sh for examples of how to download the Flickr datasets. You will need to fill in a Flickr API key and secret, and pip install flickr_api.
For the LSUN Church dataset, you can follow the standard StyleGAN data preparation and use the resulting archive for training.
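The preparation command typically looks like the following (a sketch based on StyleGAN3's dataset_tool.py; the source path is a placeholder, and the exact flags should be checked against the StyleGAN3 documentation):
python dataset_tool.py --source=<path_to_church_outdoor_train_lmdb> --dest=datasets/church256.zip --resolution=256x256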
See custom_metrics.sh for an example of running FID variations and pFID on the patch models.
- pFID can be specified using a string such as fid-patch256-min256max0: this samples 50k patches of size 256, with a minimum image size of 256 and the maximum image size capped only by each real image (the string format is illustrated in the sketch after this list).
- The max sampled size can also be specified with a number, for example fid-patch256-min256max1024.
- For larger models (e.g. mountains), FID by default downsamples images to 299px width; we therefore use a variant that additionally takes a crop of the image: fid-subpatch1024-min1024max0.
- Note that these metrics are implemented to run on a single GPU.
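The structure of these metric strings can be summarized with a small parser (a hypothetical helper written for illustration only; parse_pfid_spec is not part of the repository, and the actual metric parsing code may differ):

import re

def parse_pfid_spec(spec):
    # Hypothetical illustration of the pFID naming scheme described above;
    # the repository's own parsing logic may differ.
    m = re.fullmatch(r'fid-(sub)?patch(\d+)-min(\d+)max(\d+)', spec)
    if m is None:
        raise ValueError(f'unrecognized pFID spec: {spec}')
    sub, patch_size, min_size, max_size = m.groups()
    return {
        'subpatch': bool(sub),               # crop variant used for larger models
        'patch_size': int(patch_size),       # size of each sampled patch
        'min_size': int(min_size),           # minimum real image size to sample from
        'max_size': int(max_size) or None,   # 0 means "up to each image's native size"
    }

print(parse_pfid_spec('fid-patch256-min256max0'))
# {'subpatch': False, 'patch_size': 256, 'min_size': 256, 'max_size': None}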
Note: the released pretrained models are reimplementations of the models used in the current paper version, so the evaluation numbers are slightly different.
Our code is largely based on the StyleGAN3 repository (license). Changes to the StyleGAN3 code are documented in diff. Some additional utilities are from David Bau and Taesung Park, and we thank Assaf Shocher for proofreading. The remaining changes are covered under the Adobe Research License.
If you use this code for your research, please cite our paper:
@inproceedings{chai2022anyresolution,
  title={Any-resolution training for high-resolution image synthesis},
  author={Chai, Lucy and Gharbi, Michael and Shechtman, Eli and Isola, Phillip and Zhang, Richard},
  booktitle={European Conference on Computer Vision},
  year={2022}
}