
CUDA OOM and possible solution -- diffusers cli_demo.py with Nvidia 3090 24GB #92

Closed
1 of 2 tasks
cktlco opened this issue Aug 7, 2024 · 27 comments

@cktlco

cktlco commented Aug 7, 2024

System Info / 系統信息

Thanks very much for releasing this great work!

In case this helps anyone else:

The diffusers cli_demo.py raised the CUDA OOM error below on an RTX 3090 with 24GB of VRAM using this command:

python cli_demo.py --prompt "A fish swimming underwater through a colorful coral reef. Sun is shining brightly through the water. It is a beautiful scene suitable for use in an eye-catching television advertisement." --model_path THUDM/CogVideoX-2b --num_inference_steps 50

... but works with barely enough free VRAM with this small adjustment -- set the env var PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True prior to running:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python cli_demo.py --prompt "A fish swimming underwater through a colorful coral reef. Sun is shining brightly through the water. It is a beautiful scene suitable for use in an eye-catching television advertisement." --model_path THUDM/CogVideoX-2b --num_inference_steps 50
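(If you prefer not to set the variable in the shell, the same thing can be done from inside the script -- a minimal sketch; the setting has to be in place before torch initializes the CUDA allocator, so setting it before importing torch is the safe option.)

import os

# Must be set before torch initializes its CUDA caching allocator,
# so set it before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the env var on purpose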

CUDA OOM error:

    return torch._C._nn.pad(input, pad, mode, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.66 GiB. GPU 0 has a total capacity of 23.48 GiB of which 1.53 GiB is free. Including non-PyTorch memory, this process has 21.81 GiB memory in use. Of the allocated memory 18.92 GiB is allocated by PyTorch, and 2.58 GiB is reserved by PyTorch but unallocated.

nvidia-smi before running -- only 135MiB VRAM used:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:09:00.0 Off |                  N/A |
|  0%   32C    P8             17W /  350W |     135MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

nvidia-smi while running with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True -- 23GB used but no CUDA OOM.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:09:00.0 Off |                  N/A |
|  0%   40C    P2            157W /  350W |   23335MiB /  24576MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

This was using a fresh conda environment with dependencies installed by pip from requirements.txt (side note: please include imageio in requirements.txt, and open the version range for opencv-python to >=4.10, since 4.10 was yanked upstream).
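(For illustration only, the suggested requirements.txt change might look like the excerpt below; the actual file and pins in the repo may differ.)

# requirements.txt (illustrative excerpt, not the repo's actual contents)
imageio                # needed for video export but currently missing
opencv-python>=4.10    # open the version range, since 4.10 was yanked upstream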

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Details above.

Expected behavior / 期待表现

Details above.

@cktlco
Author

cktlco commented Aug 7, 2024

The generated video is great, by the way -- keep up the good work!

reef.mp4

@tengjiayan20
Contributor

So, have you solved your OOM problem? When we tested the demo, it actually needed 23.9GB. That is a bit extreme, which can cause an occasional OOM.

@cktlco
Author

cktlco commented Aug 7, 2024

Yes, setting this environment variable before running solved it for me:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

and I see it using a little over 20GB with that setting with nvidia-smi:

20423MiB /  24576MiB

@tengjiayan20 tengjiayan20 self-assigned this Aug 7, 2024
@TNT3530

TNT3530 commented Aug 8, 2024

I am getting OOM using 4 32GB GPUs. Using device_map="balanced" seems to split the model across 3 of the cards before throwing the OOM error.
image

@zRzRzRzRzRzRzR
Member

Did you use pipe.enable_model_cpu_offload()? If not, it will use 36GB and cause this problem.
If you want to use multiple GPUs, just remove pipe.enable_model_cpu_offload().
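For reference, the two setups look roughly like this (just a sketch based on this thread; whether device_map="balanced" fits your particular cards is not guaranteed):

import torch
from diffusers import CogVideoXPipeline

# Single GPU: keep CPU offload enabled so components are moved off the GPU between uses.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# Multiple GPUs: let accelerate spread the components across the cards instead,
# and do NOT combine this with enable_model_cpu_offload().
# pipe = CogVideoXPipeline.from_pretrained(
#     "THUDM/CogVideoX-2b", torch_dtype=torch.float16, device_map="balanced"
# )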

@TNT3530

TNT3530 commented Aug 8, 2024

Did you use pipe.enable_model_cpu_offload()? If not, it will use 36GB and cause this problem. If you want to use multiple GPUs, just remove pipe.enable_model_cpu_offload().

image
this OOMs

image
as does this

Both attempts try to allocate ~36GB.
This also happens when swapping to sat/inference.sh with the sample .txt.

@zRzRzRzRzRzRzR
Member

Try PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
And what are your NVIDIA driver and GPU? A V100, I guess. It should work (although we only tested on the 3090 and A100).

@TNT3530

TNT3530 commented Aug 8, 2024

Try PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. And what are your NVIDIA driver and GPU? A V100, I guess. It should work (although we only tested on the 3090 and A100).

It still OOMs with that set, for both multi- and single-GPU attempts.
The GPUs are AMD Instinct MI100s on ROCm 6.0.
I do notice "Torch was not compiled with memory efficient attention..." in the logs, so I'm guessing it may just be an issue with the ROCm variant of torch :(

@zRzRzRzRzRzRzR
Member

Which torch version can you use? 2.2, 2.3, and 2.4 all do not work, right?

@TNT3530

TNT3530 commented Aug 8, 2024

Which torch version can you use? 2.2, 2.3, and 2.4 all do not work, right?

Torch 2.2.2, 2.3.1, and 2.4.0 all fail with the same attempted memory usage of ~36GB

@a-r-r-o-w

a-r-r-o-w commented Aug 8, 2024

Just an FYI: If you install accelerate from the branch in the following PR, the Diffusers demo runs in ~18 GB. Context: huggingface/accelerate#2994 (comment)

Code
import gc

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video


def flush():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_max_memory_allocated()
    torch.cuda.reset_peak_memory_stats()


def bytes_to_giga_bytes(bytes):
    return f"{(bytes / 1024 / 1024 / 1024):.3f}"


flush()

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)

pipe = CogVideoXPipeline.from_pretrained("/raid/aryan/CogVideoX-trial", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

torch.cuda.empty_cache()
memory = bytes_to_giga_bytes(torch.cuda.memory_allocated())
max_memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
max_reserved = bytes_to_giga_bytes(torch.cuda.max_memory_reserved())
print(f"{memory=}")
print(f"{max_memory=}")
print(f"{max_reserved=}")

export_to_video(video, "output.mp4", fps=8)

The PR will hopefully be merged into accelerate main soon. If, for some reason, you cannot or do not want to use accelerate from the dev branch, you could do the following:

Code without accelerate dev branch requirement
import gc

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video


def flush():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_max_memory_allocated()
    torch.cuda.reset_peak_memory_stats()


def bytes_to_giga_bytes(bytes):
    return f"{(bytes / 1024 / 1024 / 1024):.3f}"


flush()

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)

pipe = CogVideoXPipeline.from_pretrained("/raid/aryan/CogVideoX-trial", torch_dtype=torch.float16).to("cuda")
latents = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50, output_type="latent", return_dict=False)[0]

pipe.transformer.to("cpu")
pipe.text_encoder.to("cpu")
torch.cuda.synchronize()
torch.cuda.empty_cache()

with torch.no_grad():
    video = pipe.decode_latents(latents, num_seconds=6)
    video = pipe.video_processor.postprocess_video(video=video, output_type="pil")

torch.cuda.empty_cache()
memory = bytes_to_giga_bytes(torch.cuda.memory_allocated())
max_memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
max_reserved = bytes_to_giga_bytes(torch.cuda.max_memory_reserved())
print(f"{memory=}")
print(f"{max_memory=}")
print(f"{max_reserved=}")

export_to_video(video, "output.mp4", fps=8)

Additionally, you can play around with the device_map parameter if you have multiple GPUs, or quantize the text encoder or the full transformer. Denoising only requires about 12-14 GB of memory (if using CPU offloading), but it's the VAE that takes the most memory (1 GB model + 17 GB for decoding). We are working on figuring out tiled decoding, but nothing promising yet. I would imagine that CogVideoX would be runnable on a free-tier T4 or lower if someone can figure out tiling, so if anyone's got ideas, feel free to PR at Diffusers.
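As a rough illustration of the "quantize the text encoder" idea (not a verified recipe -- bitsandbytes, the exact savings, and how an 8-bit component interacts with CPU offload are all assumptions here):

import torch
from diffusers import CogVideoXPipeline
from transformers import BitsAndBytesConfig, T5EncoderModel

# Sketch: load only the T5 text encoder in 8-bit via bitsandbytes; the rest stays fp16.
text_encoder = T5EncoderModel.from_pretrained(
    "THUDM/CogVideoX-2b",
    subfolder="text_encoder",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
)

# 8-bit modules cannot be moved between devices, so place the remaining fp16
# components manually rather than calling pipe.to("cuda") on the whole pipeline.
pipe.transformer.to("cuda")
pipe.vae.to("cuda")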

@TNT3530

TNT3530 commented Aug 8, 2024

Just an FYI: If you install accelerate from the branch in the following PR, the Diffusers demo runs in ~18 GB. Context: huggingface/accelerate#2994 (comment)
Code

The PR will hopefully be merged into accelerate main soon. If, for some reason, you cannot or do not want to use accelerate from the dev branch, you could do the following:
Code without accelerate dev branch requirement

Additionally, you can play around with the device_map parameter if you have multiple GPUs, or quantize the text encoder or the full transformer. Denoising only requires about 12-14 GB of memory (if using CPU offloading), but it's the VAE that takes the most memory (1 GB model + 17 GB for decoding). We are working on figuring out tiled decoding, but nothing promising yet. I would imagine that CogVideoX would be runnable on a free-tier T4 or lower if someone can figure out tiling, so if anyone's got ideas, feel free to PR at Diffusers.

  • Running your provided non-dev-branch code and only swapping out the model for the HF THUDM/CogVideoX-2b still OOMs with the same numbers.
  • Adding PYTORCH_NO_MEMORY_CACHING=1 and pipe.enable_model_cpu_offload() also OOMs
  • Installing from source after cloning huggingface/accelerate and checking out test-clear-memory-cpu-offload, running pip install . and executing, also OOMs
  • Doing all of the above combined but adding device_map="balanced" also OOMs

@a-r-r-o-w

I am perfectly able to run the 2nd example above on an A4500 (20GB) with a fresh PyTorch 2.3 install and diffusers:main. We'll have to try and debug what's going wrong in your setup.

You mention:

I do notice "Torch was not compiled with memory efficient attention..." in the logs, so I'm guessing it may just be an issue with the ROCm variant of torch :(

Can you paste the error stack trace here? I would like to know at what point it's failing. If it's failing somewhere in attention, it's probably because you're unable to use FA2, which is necessary to be able to run with low memory. Can you try setting up PyTorch so that it allows you to run FA2?
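For reference, a quick way to check which SDPA backends your torch build allows (a small diagnostic sketch, nothing CogVideoX-specific). If only the math fallback is available, scaled_dot_product_attention materializes the full attention matrix, which is exactly the kind of tens-of-GB allocation being reported.

import torch

# Report which scaled_dot_product_attention backends this torch build allows.
print("flash SDP enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:         ", torch.backends.cuda.math_sdp_enabled())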

@TNT3530

TNT3530 commented Aug 9, 2024

I am perfectly able to run the 2nd example above on an A4500 (20GB) with a fresh PyTorch 2.3 install and diffusers:main. We'll have to try and debug what's going wrong in your setup.

You mention:

I do notice "Torch was not compiled with memory efficient attention..." in the logs, so I'm guessing it may just be an issue with the ROCm variant of torch :(

Can you paste the error stack trace here? I would like to know at what point it's failing. If it's failing somewhere in attention, it's probably because you're unable to use FA2, which is necessary to be able to run with low memory. Can you try setting up PyTorch so that it allows you to run FA2?

I installed flash_attn 2.0.4 built from source, but I'm not sure how to force torch to let me use it.

Output

/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/cuda/memory.py:343: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  warnings.warn(
The config attributes {'mid_block_add_attention': True} were passed to AutoencoderKLCogVideoX, but are not expected and will be ignored. Please verify your config.json configuration file.

Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]The config attributes {'mid_block_add_attention': True} were passed to AutoencoderKLCogVideoX, but are not expected and will be ignored. Please verify your config.json configuration file.

Loading pipeline components...:  20%|██        | 1/5 [00:00<00:03,  1.03it/s]
Loading pipeline components...:  60%|██████    | 3/5 [00:01<00:00,  3.24it/s]
Loading pipeline components...:  80%|████████  | 4/5 [00:06<00:01,  1.98s/it]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:  50%|█████     | 1/2 [00:05<00:05,  5.55s/it]

Loading checkpoint shards: 100%|██████████| 2/2 [00:10<00:00,  5.17s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:10<00:00,  5.23s/it]

Loading pipeline components...: 100%|██████████| 5/5 [00:16<00:00,  4.82s/it]
Loading pipeline components...: 100%|██████████| 5/5 [00:16<00:00,  3.36s/it]

  0%|          | 0/50 [00:00<?, ?it/s]
  0%|          | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/tnt3530/Documents/CogVideo/provided.py", line 28, in <module>
    latents = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50, output_type="latent", return_dict=False)[0]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 629, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 326, in forward
    hidden_states, encoder_hidden_states = block(
                                           ^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 123, in forward
    attn_output = self.attn1(
                  ^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
           ^^^^^^^^^^^^^^^
  File "/home/tnt3530/anaconda3/envs/cog/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 2216, in __call__
    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: HIP out of memory. Tried to allocate 35.31 GiB. GPU 1 has a total capacity of 31.98 GiB of which 24.15 GiB is free. Of the allocated memory 4.58 GiB is allocated by PyTorch, and 481.62 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

EDIT: SDPA is disabled on AMD cards because they decided that only cool kids can play with fun toys
pytorch/pytorch#112997

@TNT3530

TNT3530 commented Aug 9, 2024

Updating to torch 2.5.0 nightly makes it attempt to allocate 70.63GB (~35GB per GPU now). Forcefully disabling SDPA via torch.backends.cuda.enable_flash_sdp(False) doesn't help either.

@a-r-r-o-w

Yeah, it's quite unfortunate that AMD doesn't support SDPA :( From the error logs, it seems like that's the only bottleneck preventing you from running Cog. Let me know if you find any issues on the Diffusers side of things that I can help with.

@zhanghang1995

Thank you for your great work. I have a question: can it be inferenced well across multiple GPUs (RTX 3060 Ti), or must one or more GPUs have >= 20GB? Thank you
image

@zRzRzRzRzRzRzR
Member

One or more GPUs must have >= 20GB because of the VAE.

@zhanghang1995

One or more GPUs must have >= 20GB because of the VAE.

Thank you. Is there any method to split the VAE module?

@zRzRzRzRzRzRzR
Member

Not right now; we will try a tiled VAE. We tested the balanced device_map on 3 GPUs with 20GB each; you can try whether it runs on 3 * 16GB GPUs.

@zhanghang1995

Not right now; we will try a tiled VAE. We tested the balanced device_map on 3 GPUs with 20GB each; you can try whether it runs on 3 * 16GB GPUs.
Thank you. If there is any progress on the tiled VAE, please let us know.

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Aug 14, 2024

Not right now; we will try a tiled VAE. We tested the balanced device_map on 3 GPUs with 20GB each; you can try whether it runs on 3 * 16GB GPUs.
Thank you. If there is any progress on the tiled VAE, please let us know.

Try installing the diffusers and accelerate libs from source and check the CLI demo in the CogVideoX-dev branch now; it will only need 12GB for inference.
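For reference, installing both libraries from source is typically just:

pip install git+https://github.com/huggingface/diffusers.git
pip install git+https://github.com/huggingface/accelerate.git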

@QAQEthan

Did you use pipe.enable_model_cpu_offload()? If not, it will use 36GB and cause this problem. If you want to use multiple GPUs, just remove pipe.enable_model_cpu_offload().

Hi, why do I still need ~36GB of GPU memory even though I set pipe.enable_model_cpu_offload()?

@zRzRzRzRzRzRzR
Member

Did you follow the cli_demo.py code, and are you using an NVIDIA Ampere or newer GPU like a 3090 or 4090?

@QAQEthan

Did you follow the cli_demo.py code, and are you using an NVIDIA Ampere or newer GPU like a 3090 or 4090?

I run on an NVIDIA A100, and the demo code is from Hugging Face: https://huggingface.co/THUDM/CogVideoX-2b

@zRzRzRzRzRzRzR
Member

You can try reinstalling the diffusers and accelerate libs from source; an A100 should work when using inference/cli_demo.py in this GitHub repo.

@zRzRzRzRzRzRzR
Member

We have updated the repository, and the dependencies can now be installed from pip. Updating the dependencies and retrying cli_demo should solve the problem. If there are any new issues, please open a new one.
