Diffusion Models Trained with Large Data Are Transferable Visual Models

Xu, Guangkai; Ge, Yongtao; Liu, Mingyu; Fan, Chengxiang; Xie, Kangyang; Zhao, Zhiyue; Chen, Hao; Shen, Chunhua

Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.06090v2 (cs)

[Submitted on 10 Mar 2024 (v1), revised 15 Mar 2024 (this version, v2), latest version 24 Oct 2024 (v3)]




Abstract: We show that, by simply initializing image understanding models with the pre-trained UNet (or transformer) of a diffusion model, it is possible to achieve remarkable transferable performance on fundamental vision perception tasks using a moderate amount of target data (even synthetic data only). These tasks include monocular depth estimation, surface normal estimation, image segmentation, matting, and human pose estimation, among many others. Previous works have adapted diffusion models to various perception tasks, often reformulating these tasks as generation processes to align with the diffusion process. In sharp contrast, we demonstrate that fine-tuning these models with minimal adjustments can be a more effective alternative, with the advantages of being embarrassingly simple and significantly faster. Because the backbone network of Stable Diffusion models is trained on giant datasets comprising billions of images, we observe very robust generalization capabilities of the diffusion backbone. Experimental results showcase the remarkable transferability of the backbone of diffusion models across diverse tasks and real-world datasets.
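The abstract contrasts reformulating perception as an iterative generation process with directly fine-tuning the diffusion backbone. Below is a minimal PyTorch sketch of the second idea under loudly stated assumptions. It is not the authors' implementation. `ToyBackbone` is a hypothetical stand-in for a Stable Diffusion UNet, and no pretrained weights are loaded, so the sketch runs offline. The fixed conditioning timestep `fixed_t`, the 1x1 task head, and the synthetic "depth-like" target are also illustrative assumptions. What the sketch shows is the shape of the approach: one deterministic forward pass per input, supervised directly with a regression loss, with all weights updated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained diffusion UNet: maps a 4-channel
    latent and an integer timestep to a 4-channel map of the same spatial size."""

    def __init__(self, channels: int = 4, hidden: int = 32, num_timesteps: int = 1000):
        super().__init__()
        self.inp = nn.Conv2d(channels, hidden, kernel_size=3, padding=1)
        self.time_embed = nn.Embedding(num_timesteps, hidden)
        self.out = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast a per-sample timestep embedding over every spatial location.
        h = self.inp(x) + self.time_embed(t)[:, :, None, None]
        return self.out(F.silu(h))


class OneStepPerception(nn.Module):
    """Uses the backbone as a deterministic predictor: a single forward pass,
    with no noise sampling and no iterative denoising loop."""

    def __init__(self, backbone: nn.Module, fixed_t: int = 999, out_channels: int = 1):
        super().__init__()
        self.backbone = backbone
        self.fixed_t = fixed_t  # assumption: a constant timestep for conditioning
        self.head = nn.Conv2d(4, out_channels, kernel_size=1)  # small task head

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        t = torch.full((latent.shape[0],), self.fixed_t, dtype=torch.long, device=latent.device)
        return self.head(self.backbone(latent, t))


def finetune(model: nn.Module, steps: int = 100, lr: float = 1e-2) -> tuple:
    """Fine-tune all weights with a plain MSE regression loss on synthetic data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    latents = torch.randn(8, 4, 16, 16)
    # Toy dense target standing in for a label such as a depth map.
    targets = latents.mean(dim=1, keepdim=True)
    losses = []
    for _ in range(steps):
        loss = F.mse_loss(model(latents), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses[0], losses[-1]


backbone = ToyBackbone()
# In practice, pretrained diffusion weights would be loaded into `backbone` here.
model = OneStepPerception(backbone)
first_loss, last_loss = finetune(model)
print(f"loss: {first_loss:.4f} -> {last_loss:.4f}")
```

The design point being illustrated is cost. At inference time, each prediction here requires one backbone call. A generative reformulation would typically run the denoiser for several steps per image.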

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.06090 [cs.CV]
	(or arXiv:2403.06090v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.06090

Submission history

From: Chunhua Shen
[v1] Sun, 10 Mar 2024 04:23:24 UTC (31,238 KB)
[v2] Fri, 15 Mar 2024 04:43:21 UTC (43,150 KB)
[v3] Thu, 24 Oct 2024 07:36:13 UTC (27,225 KB)