Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator

Huang, Hanzhuo; Feng, Yufan; Shi, Cheng; Xu, Lan; Yu, Jingyi; Yang, Sibei

arXiv:2309.14494 (cs)

[Submitted on 25 Sep 2023]

Abstract: Text-to-video is a rapidly growing research area that aims to generate a sequence of frames that is semantically, identically, and temporally coherent and accurately aligns with the input text prompt. This study focuses on zero-shot text-to-video generation for its data and cost efficiency. To generate a semantically coherent video that richly portrays temporal semantics, such as the whole process of a flower blooming rather than a set of "moving images", we propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantically coherent prompt sequence, and pre-trained latent diffusion models (LDMs) as the animator to generate high-fidelity frames. Furthermore, to ensure temporal and identical coherence while maintaining semantic coherence, we propose a series of annotative modifications for adapting LDMs in the reverse process, including joint noise sampling, step-aware attention shift, and dual-path interpolation. Without requiring any video data or training, Free-Bloom generates vivid, high-quality videos and is impressive at generating complex scenes with semantically meaningful frame sequences. In addition, Free-Bloom is naturally compatible with LDM-based extensions.
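
The abstract names joint noise sampling but does not say how it is computed. One plausible reading, offered only as an illustrative sketch and not as the authors' method, is that each frame's initial noise mixes a component shared by all frames with a component drawn independently per frame. The function name `joint_noise` and the `shared_weight` parameter below are hypothetical, and plain Python lists stand in for latent tensors.

```python
import math
import random


def joint_noise(num_frames, dim, shared_weight, seed=0):
    """Return one noise vector per frame, mixing shared and per-frame noise.

    shared_weight is a hypothetical knob in [0, 1] (an assumption, not a
    parameter taken from the paper): 1.0 gives every frame identical noise,
    0.0 gives fully independent noise.
    """
    if not 0.0 <= shared_weight <= 1.0:
        raise ValueError("shared_weight must be in [0, 1]")
    rng = random.Random(seed)
    a = math.sqrt(shared_weight)        # coefficient on the shared noise
    b = math.sqrt(1.0 - shared_weight)  # coefficient on the per-frame noise
    shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    frames = []
    for _ in range(num_frames):
        own = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # a**2 + b**2 == 1, so each entry remains a unit-variance Gaussian.
        frames.append([a * s + b * o for s, o in zip(shared, own)])
    return frames
```

Under this assumption, a larger `shared_weight` makes the frames start from more similar noise, which would favor consistency across frames, while a smaller value leaves more room for frame-to-frame variation.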

Comments:	NeurIPS 2023; Project available at: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.14494 [cs.CV]
	(or arXiv:2309.14494v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.14494

Submission history

From: Yufan Feng [view email]
[v1] Mon, 25 Sep 2023 19:42:16 UTC (32,182 KB)
