
[CVPR 2024] One-Shot Structure-Aware Stylized Image Synthesis

arXiv: https://arxiv.org/abs/2402.17275

One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong

Abstract:
While GAN-based models have been successful in image stylization tasks, they often struggle with structure preservation while stylizing a wide range of input images. Recently, diffusion models have been adopted for image stylization but still lack the capability to maintain the original quality of input images. Building on this, we propose OSASIS: a novel one-shot stylization method that is robust in structure preservation. We show that OSASIS is able to effectively disentangle the semantics from the structure of an image, allowing it to control the level of content and style implemented to a given input. We apply OSASIS to various experimental settings, including stylization with out-of-domain reference images and stylization with text-driven manipulation. Results show that OSASIS outperforms other stylization methods, especially for input images that were rarely encountered during training, providing a promising solution to stylization via diffusion models.

Description

Official implementation of One-Shot Structure-Aware Stylized Image Synthesis


Setup

conda env create -f environment.yaml
conda activate osasis

Prepare Training

  1. Download the DDPM with P2-weighting trained on FFHQ (ffhq_p2.pt) and put the model checkpoint in P2_weighting/models (a placement sketch follows the tree below):
OSASIS
|--P2_weighting
|  |--models
|  |  |--ffhq_p2.pt
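If you downloaded ffhq_p2.pt to another location, a minimal placement sketch (the ~/Downloads path is an assumption; adjust it to wherever you saved the file):

# create the target folder and move the checkpoint into it
mkdir -p P2_weighting/models
mv ~/Downloads/ffhq_p2.pt P2_weighting/models/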
  2. Generate style images in domain A (photorealistic domain):
DEVICE=0

SAMPLE_FLAGS="--attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 \
    --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --num_head_channels 64 \
    --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 50"

CUDA_VISIBLE_DEVICES=${DEVICE} \
python gen_style_domA.py ${SAMPLE_FLAGS} \
    --model_path P2_weighting/models/ffhq_p2.pt \
    --input_dir imgs_style_domB \
    --sample_dir imgs_style_domA \
    --img_name img1.png \
    --n 1 \
    --t_start_ratio 0.5 \
    --seed 1

input_dir: directory of style images in domain B (stylized domain)
sample_dir: saving directory of style images in domain A (photorealistic domain)
img_name: name of style image
n: number of images sampled when generating the style image in domain A
t_start_ratio: noising level of the image ($t_0$)
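The script above handles a single reference image (img1.png). A hedged batch variant, assuming every style image in imgs_style_domB is a .png and reusing the DEVICE and SAMPLE_FLAGS variables defined above:

# generate a domain A counterpart for every domain B style image
for IMG in imgs_style_domB/*.png; do
    CUDA_VISIBLE_DEVICES=${DEVICE} \
    python gen_style_domA.py ${SAMPLE_FLAGS} \
        --model_path P2_weighting/models/ffhq_p2.pt \
        --input_dir imgs_style_domB \
        --sample_dir imgs_style_domA \
        --img_name "$(basename "$IMG")" \
        --n 1 \
        --t_start_ratio 0.5 \
        --seed 1
done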

Training

  1. Download DiffAE trained on FFHQ (ffhq256_autoenc, ffhq256_autoenc_latent) and put the model checkpoints in diffae/checkpoints (a placement sketch follows the tree below):
OSASIS
|--diffae
|  |--checkpoints
|  |  |--ffhq256_autoenc
|  |  |  |--last.ckpt
|  |  |  |--latent.pkl
|  |  |--ffhq256_autoenc_latent
|  |  |  |--last.ckpt
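A minimal placement sketch, assuming the two checkpoint folders were downloaded to ~/Downloads (adjust the paths to wherever you saved them):

# move both DiffAE checkpoint folders into place
mkdir -p diffae/checkpoints
mv ~/Downloads/ffhq256_autoenc diffae/checkpoints/
mv ~/Downloads/ffhq256_autoenc_latent diffae/checkpoints/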
  2. Train the model with the following script, which requires about 34 GB of VRAM at a batch size of 8; training takes approximately 30 minutes on a single A100 GPU.
DEVICE=0

CUDA_VISIBLE_DEVICES=${DEVICE} \
python train_diffaeB.py \
    --style_domA_dir imgs_style_domA \
    --style_domB_dir imgs_style_domB \
    --ref_img img1.png \
    --work_dir exp/img1 \
    --n_iter 200 \
    --ckpt_freq 200 \
    --batch_size 8 \
    --map_net \
    --map_time \
    --lambda_map 0.1 \
    --train

style_domA_dir: directory of style images in domain A (photorealistic domain)
style_domB_dir: directory of style images in domain B (stylized domain)
ref_img: name of style image
work_dir: working directory
n_iter: number of training iterations
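Because OSASIS is one-shot, each reference style image gets its own model. A hedged sketch for training on every style in imgs_style_domB, assuming .png filenames and one work_dir per image (this per-image layout mirrors the exp/img1 example above but is otherwise an assumption):

# train one OSASIS model per domain B style image
for IMG in imgs_style_domB/*.png; do
    NAME="$(basename "$IMG" .png)"
    CUDA_VISIBLE_DEVICES=${DEVICE} \
    python train_diffaeB.py \
        --style_domA_dir imgs_style_domA \
        --style_domB_dir imgs_style_domB \
        --ref_img "${NAME}.png" \
        --work_dir "exp/${NAME}" \
        --n_iter 200 \
        --ckpt_freq 200 \
        --batch_size 8 \
        --map_net \
        --map_time \
        --lambda_map 0.1 \
        --train
done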

Testing

Generate stylized images with the following script:

DEVICE=0

CUDA_VISIBLE_DEVICES=${DEVICE} \
python eval_diffaeB.py \
    --style_domB_dir imgs_style_domB \
    --infer_dir imgs_input_domA \
    --ref_img img1.png \
    --work_dir exp/img1 \
    --map_net \
    --map_time \
    --lambda_map 0.1

style_domB_dir: directory of style images in domain B (stylized domain)
infer_dir: directory of input images in domain A (photorealistic domain)
ref_img: name of style image
work_dir: working directory

Using Pretrained Models

Download the pretrained weights from this link and place the checkpoints as shown below:

OSASIS
|--exp
|  |--img1
|  |  |--ckpt
|  |  |  |--iter_200.ckpt
|  |--img2
|  |  |--ckpt
|  |  |  |--iter_200.ckpt
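With the checkpoints in place, the Testing command can be run unchanged; a sketch for img1, assuming eval_diffaeB.py picks up iter_200.ckpt from work_dir automatically:

DEVICE=0

# run inference with the downloaded checkpoint in exp/img1/ckpt
CUDA_VISIBLE_DEVICES=${DEVICE} \
python eval_diffaeB.py \
    --style_domB_dir imgs_style_domB \
    --infer_dir imgs_input_domA \
    --ref_img img1.png \
    --work_dir exp/img1 \
    --map_net \
    --map_time \
    --lambda_map 0.1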

Acknowledgements

This repository is built upon P2-weighting, DiffAE, and MindTheGap.

Citation

@article{cho2024one,
  title={One-Shot Structure-Aware Stylized Image Synthesis},
  author={Cho, Hansam and Lee, Jonghyun and Chang, Seunggyu and Jeong, Yonghyun},
  journal={arXiv preprint arXiv:2402.17275},
  year={2024}
}
