Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models to Learn Any Unseen Style
This repository is the official implementation of Specialist Diffusion (CVPR 2023).
Haoming Lu, Hazarapet Tunanyan, Kai Wang, Shant Navasardyan, Zhangyang Wang, Humphrey Shi
We present Specialist Diffusion, a style-specific, personalized text-to-image model. It is plug-and-play with existing diffusion models and other personalization techniques, and it outperforms the latest few-shot personalization alternatives for diffusion models, such as Textual Inversion and DreamBooth, at learning highly sophisticated styles with ultra-sample-efficient tuning.
First, install prerequisites with:
conda env create -f environment.yml
conda activate sd
Then, set up the configuration for accelerate with:
accelerate config
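If you prefer to skip the interactive prompts, recent versions of accelerate can also write out a default single-process configuration (this subcommand may not exist in older releases):

accelerate config default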
An example training call:
accelerate launch train.py --config='configs/train_default.json'
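Flags of the accelerate launcher can be combined with the training script. For instance, assuming a multi-GPU machine and a matching accelerate configuration, a two-GPU run might look like:

accelerate launch --num_processes=2 train.py --config='configs/train_default.json'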
An example evaluation call:
accelerate launch eval.py --config='configs/eval_default.json'
Figure: Combination of our model and Textual Inversion. Text prompts used for generation are listed on top, the styles of the respective datasets below them, and the training methods on the left. By integrating Textual Inversion with our model, the results capture even richer details without losing the style.
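As a rough illustration of the plug-and-play claim, the sketch below loads a fine-tuned checkpoint with the diffusers library and layers a Textual Inversion embedding on top of it. The checkpoint path, embedding file, and the <my-style> token are hypothetical placeholders, and the repository's actual export format may differ:

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical path to a Specialist Diffusion checkpoint exported in diffusers format.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/specialist-diffusion-checkpoint",
    torch_dtype=torch.float16,
).to("cuda")

# Layer a Textual Inversion embedding on top of the fine-tuned model;
# the embedding file and its <my-style> token are assumptions for this sketch.
pipe.load_textual_inversion("path/to/learned_embeds.bin", token="<my-style>")

# Generate with a prompt that references the inverted token.
image = pipe("a lighthouse at dawn in <my-style> style").images[0]
image.save("lighthouse.png")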
If you use our work in your research, please cite our publication:
@InProceedings{Lu_2023_CVPR,
    author    = {Lu, Haoming and Tunanyan, Hazarapet and Wang, Kai and Navasardyan, Shant and Wang, Zhangyang and Shi, Humphrey},
    title     = {Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {14267-14276}
}