Text to video pipeline #187

stronk-dev · 2024-09-04T11:41:34Z

This PR adds support for a text-to-video pipeline using the THUDM/CogVideoX-2b and THUDM/CogVideoX-5b models.

Some notes:

The dl_checkpoints.sh script was modified to download both models, but without the --include "*.fp16.safetensors" argument, as there does not seem to be an fp16 variant for them. For now I force the precision using torch_dtype: FP16 for THUDM/CogVideoX-2b and BF16 for THUDM/CogVideoX-5b.
The num_frames parameter is currently hardcoded to the models' recommended value of 49.
I included the VAE speedups (enable_slicing, enable_tiling); we might want to play around with these.
~~Safety checker?~~ Was added by @ad-astra-video.
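
For reviewers who want to see the notes above in one place, here is a minimal sketch of how the precision, VAE and num_frames settings could fit together. It is not the code in this PR: the helper names (MODEL_DTYPES, dtype_name_for, load_pipeline, generate) and the default device are assumptions made for illustration. Only CogVideoXPipeline.from_pretrained, vae.enable_slicing, vae.enable_tiling and the num_frames argument are taken from diffusers.

```python
# Hypothetical helper names; only the diffusers calls are real library API.

# Precision per model, as described in the notes: FP16 for 2b, BF16 for 5b.
MODEL_DTYPES = {
    "THUDM/CogVideoX-2b": "float16",
    "THUDM/CogVideoX-5b": "bfloat16",
}

# Recommended frame count, hardcoded for now.
NUM_FRAMES = 49


def dtype_name_for(model_id: str) -> str:
    """Return the torch dtype name to force for a supported model ID."""
    try:
        return MODEL_DTYPES[model_id]
    except KeyError:
        raise ValueError(f"unsupported model: {model_id}") from None


def load_pipeline(model_id: str, device: str = "cuda"):
    """Load a CogVideoX pipeline with forced precision and VAE options."""
    # Imported lazily so the mapping above works without torch installed.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name_for(model_id)),
    )
    pipe.to(device)
    # VAE options mentioned in the notes; worth experimenting with.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe


def generate(pipe, prompt: str):
    """Run the pipeline and return the frames of the first video."""
    return pipe(prompt=prompt, num_frames=NUM_FRAMES).frames[0]
```

Usage would look like `frames = generate(load_pipeline("THUDM/CogVideoX-2b"), "a panda playing guitar")`, assuming a CUDA device with enough memory and the checkpoints already downloaded.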

stronk-dev added 2 commits September 4, 2024 12:18

Text-to-video support rebase WIP

a963c5c

Fixes

01d43a9

This was referenced Sep 4, 2024

[AI] Text to video pipeline support livepeer/go-livepeer#3002

Closed

Text to video pipeline support #51

Closed

Fix response

3c9efe5

stronk-dev mentioned this pull request Sep 4, 2024

[AI] Text to video pipeline support livepeer/go-livepeer#3161

Open

Remove no longer used numpy parsing in Util

ad46b3c

stronk-dev force-pushed the feature/text-to-video branch from e23e726 to ad46b3c Compare September 4, 2024 14:19

Remove some stuff

1c3122b

stronk-dev marked this pull request as ready for review September 4, 2024 14:21

stronk-dev requested a review from rickstaa as a code owner September 4, 2024 14:21

add COGVIDEOX custom device map for 2 GPUs

c1d3eb0
