The official implementation of the paper:
"FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model"
By Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang
FreeDoM is a simple but effective training-free method that generates results controlled by various conditions using unconditional diffusion models. Specifically, we use off-the-shelf pre-trained networks to construct a time-independent energy function that measures the distance between the given condition and the intermediate generated image. We then compute the gradient of this energy and use it to guide the generation process. FreeDoM supports various conditions, including texts, segmentation maps, sketches, landmarks, face IDs, and style images. It applies to different data domains, including human faces, ImageNet images, and latent codes.
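To make the idea of energy guidance concrete, here is a minimal toy sketch in pure Python. It is not the authors' implementation. Everything in it is an assumption made for illustration: `toy_noise_predictor` stands in for a pre-trained unconditional diffusion model, `toy_feature` stands in for an off-the-shelf network, the update is a deterministic DDIM-style step, `rho` is a hypothetical guidance step size, and the gradient is taken by finite differences where a real setup would use automatic differentiation.

```python
import math

def linear_alpha_bars(num_steps, beta_start=1e-3, beta_end=0.2):
    """Cumulative products of (1 - beta) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def toy_noise_predictor(x_t, alpha_bar):
    """Stand-in for a pre-trained unconditional model (exact for N(0, I) data)."""
    return [math.sqrt(1.0 - alpha_bar) * v for v in x_t]

def predict_clean(x_t, alpha_bar):
    """Estimate the clean sample x0 from the noisy sample x_t."""
    eps = toy_noise_predictor(x_t, alpha_bar)
    return [(v - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for v, e in zip(x_t, eps)]

def toy_feature(x0):
    """Stand-in for an off-the-shelf network: here simply the mean value."""
    return sum(x0) / len(x0)

def energy(condition, x0):
    """Time-independent energy: squared distance between condition and feature."""
    return (toy_feature(x0) - condition) ** 2

def energy_gradient(condition, x_t, alpha_bar, h=1e-4):
    """Central-difference gradient of the energy w.r.t. x_t (autodiff stand-in)."""
    grad = []
    for i in range(len(x_t)):
        plus, minus = list(x_t), list(x_t)
        plus[i] += h
        minus[i] -= h
        e_plus = energy(condition, predict_clean(plus, alpha_bar))
        e_minus = energy(condition, predict_clean(minus, alpha_bar))
        grad.append((e_plus - e_minus) / (2.0 * h))
    return grad

def sample(x_T, condition=None, num_steps=50, rho=1.0):
    """Deterministic DDIM-style sampling, optionally with energy guidance."""
    alpha_bars = linear_alpha_bars(num_steps)
    x = list(x_T)
    for i in reversed(range(num_steps)):
        ab_t = alpha_bars[i]
        ab_prev = alpha_bars[i - 1] if i > 0 else 1.0
        eps = toy_noise_predictor(x, ab_t)
        x0 = predict_clean(x, ab_t)
        # Unconditional step toward the previous noise level.
        x_prev = [math.sqrt(ab_prev) * x0_i + math.sqrt(1.0 - ab_prev) * e
                  for x0_i, e in zip(x0, eps)]
        if condition is not None:
            # Guidance: move against the energy gradient taken at x_t.
            grad = energy_gradient(condition, x, ab_t)
            x_prev = [v - rho * g for v, g in zip(x_prev, grad)]
        x = x_prev
    return x

# Fixed starting "noise" so the comparison is deterministic.
x_T = [0.5, -1.0, 0.3, 1.2, -0.7, 0.1, -0.4, 0.9]
unguided = sample(x_T)
guided = sample(x_T, condition=2.0)
print(toy_feature(unguided), toy_feature(guided))
```

Starting from the same `x_T`, the guided run ends with a feature value much closer to the condition than the unguided run. In a realistic setting, the feature extractor would be a network such as CLIP or a face parser, and the energy would be evaluated on the clean-image estimate in the same way as `predict_clean` is used here.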
This paper is under review. We will release the code and the supplementary materials with implementation details once the review is complete. In the meantime, you can check the results generated by FreeDoM below.
Model Source | Data Domain | Resolution | Original Conditions | Additional Training-free Conditions | Sampling Time* (s/image) |
---|---|---|---|---|---|
SDEdit | aligned human face | | None | parsing maps, sketches, landmarks, face IDs, texts | ≈20s |
guided-diffusion | ImageNet | | None | texts, style images | ≈140s |
guided-diffusion | ImageNet | | class label | style images | ≈50s |
Stable Diffusion | general images | | texts | style images | ≈84s |
ControlNet | general images | | human poses, scribbles, texts | face IDs, style images | ≈120s |
*Sampling time was measured on a GeForce RTX 3090 GPU.
Our work stands on the shoulders of giants. We want to thank the following contributors whose work our code is based on:
- open-source pre-trained diffusion models:
  - (human face models) https://github.com/ermongroup/SDEdit
  - (ImageNet models) https://github.com/openai/guided-diffusion
  - (Stable Diffusion) https://github.com/CompVis/stable-diffusion
  - (ControlNet) https://github.com/lllyasviel/ControlNet
- pre-trained networks for constructing the training-free energy functions:
  - (texts, style images) https://github.com/openai/CLIP
  - (face parsing maps) https://github.com/zllrunning/face-parsing.PyTorch
  - (sketches) https://github.com/Mukosame/Anime2Sketch
  - (face landmarks) https://github.com/cunjian/pytorch_face_landmark
  - (face IDs) ArcFace (https://arxiv.org/abs/1801.07698)
- time-travel strategy for better sampling:
We also list some recent works that share the similar idea of updating the clean intermediate results:
- concurrent conditional image generation methods:
- zero-shot image restoration methods:
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{yu2023freedom,
title={FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model},
author={Yu, Jiwen and Wang, Yinhuai and Zhao, Chen and Ghanem, Bernard and Zhang, Jian},
journal={arXiv:2303.09833},
year={2023}
}