Please make GPU selectable per model node #1028

Open
tcmaps opened this issue Jul 30, 2023 · 5 comments
Labels
Feature A new feature to add to ComfyUI.

Comments

@tcmaps

tcmaps commented Jul 30, 2023

This would make it possible to leverage multi-GPU setups for custom mixture-of-diffusers workflows without having to reload models. You could also split the base and refiner of SDXL across two 8GB cards, providing a cheaper upgrade path for some. My tests in this regard were promising (see the adapted diffusers example in the next comment).
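Inside ComfyUI, a per-node device override could look roughly like the custom node sketched below. This is only a sketch under assumptions: the node name OverrideModelDevice is hypothetical, and it assumes the loaded MODEL object exposes its underlying torch module as model.model, which may not match ComfyUI's actual model-wrapper internals.

import torch

class OverrideModelDevice:
    """Hypothetical node: route a loaded MODEL to a user-selected GPU."""

    @classmethod
    def INPUT_TYPES(cls):
        # Offer one choice per visible CUDA device, falling back to CPU.
        devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())] or ["cpu"]
        return {"required": {"model": ("MODEL",), "device": (devices,)}}

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "override"
    CATEGORY = "advanced/model"

    def override(self, model, device):
        # Assumption: model.model is the underlying torch.nn.Module
        # (ComfyUI wraps models, so the real attribute path may differ).
        model.model.to(torch.device(device))
        return (model,)

NODE_CLASS_MAPPINGS = {"OverrideModelDevice": OverrideModelDevice}

You would then wire such a node between the checkpoint loader and each sampler, picking cuda:0 for the base model and cuda:1 for the refiner.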

@tcmaps
Author

tcmaps commented Jul 30, 2023

Here I adapted the SDXL diffusers example to use both T4s on Kaggle:

from diffusers import DiffusionPipeline
import torch
import time

torch.cuda.reset_peak_memory_stats()

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
base.to("cuda:0") # GPU 1

# Do not reuse the text encoder and VAE of base here, or it will throw a "tensors not on same device" error, since base lives on cuda:0 and refiner on cuda:1.
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    #text_encoder_2=base.text_encoder_2,
    #vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda:1") # GPU 2

start_time = time.time()

prompt = "An astronaut riding a horse"

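# Base handles the first 75% of the denoising schedule (denoising_end=0.75)
# and returns latents for the refiner instead of a decoded image.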
image = base(
    prompt=prompt,
    num_inference_steps=50,
    denoising_end=0.75,
    output_type="latent",
).images

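# Refiner finishes the last 25% of the schedule (denoising_start=0.75).
# The latents arrive on cuda:0; the img2img pipeline moves its inputs to
# its own device (cuda:1) when preparing latents, so no manual .to() is needed.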
image = refiner(
    prompt=prompt,
    num_inference_steps=50,
    denoising_start=0.75,
    image=image,
).images[0]

print(f"Time: {(time.time() - start_time):.2f}s")
print(f"VRAM 1: {(torch.cuda.max_memory_allocated(0)/1e9):.2f}GB")
print(f"VRAM 2: {(torch.cuda.max_memory_allocated(1)/1e9):.2f}GB")

Tests used about 8 GB of VRAM on the first GPU and 10 GB on the second, with roughly 30 s generation time!

Bonus: the tiny 13 GB of system RAM on Kaggle was sufficient, whereas using cpu_offload would crash the instance.

@viktor02

viktor02 commented Aug 2, 2023

It would be great if someone implemented this.

@x4080

x4080 commented Sep 22, 2023

@tcmaps I tried using Kaggle, but it always disconnected after a few seconds when using cloudflare. How do you display the GUI?

@moorehousew

Seconding this. I have two RTX 3060 12GB cards; it would be nice to have the memory of both available for some heavy workloads. I might tinker with it in my free time if I get the chance.

@tarik23

tarik23 commented Nov 28, 2023

+1

@robinjhuang robinjhuang added the Feature A new feature to add to ComfyUI. label Jul 3, 2024