Please make GPU selectable per model node #1028
This would allow leveraging multi-GPU setups for one's own mixture-of-diffusers workflows without having to reload models. You could also split the base and refiner of SDXL across two 8GB cards, providing a cheaper upgrade path for some. My tests in this regard were promising. Here I adapted the SDXL diffusers example to use both T4s on Kaggle:

```python
from diffusers import DiffusionPipeline
import torch
import time

torch.cuda.reset_peak_memory_stats()

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)
base.to("cuda:0")  # GPU 1

# Do not reuse the text encoder and VAE of base, or it will throw a
# "tensors not on same device" error!
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    # text_encoder_2=base.text_encoder_2,
    # vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda:1")  # GPU 2

start_time = time.time()
prompt = "An astronaut riding a horse"

# Run the first 75% of the denoising steps on the base model and hand the
# latents over to the refiner.
image = base(
    prompt=prompt,
    num_inference_steps=50,
    denoising_end=0.75,
    output_type="latent",
).images

# The refiner pipeline moves the incoming latents to its own device (cuda:1).
image = refiner(
    prompt=prompt,
    num_inference_steps=50,
    denoising_start=0.75,
    image=image,
).images[0]

print(f"Time: {(time.time() - start_time):.2f}s")
print(f"VRAM 1: {(torch.cuda.max_memory_allocated(0) / 1e9):.2f}GB")
print(f"VRAM 2: {(torch.cuda.max_memory_allocated(1) / 1e9):.2f}GB")
```

Tests resulted in 8GB + 10GB of VRAM used and about 30s generation time! As a bonus, the mere 13GB of system RAM on Kaggle were sufficient, whereas using CPU offload would crash the instance.
It would be great if someone would implement this.
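For anyone picking this up, here is a hypothetical sketch of what a device-selectable loader node might look like. The class layout follows ComfyUI's custom-node conventions (`INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION`), but `load_model()` is a placeholder, not a real ComfyUI API:

```python
import torch

def load_model(ckpt_name: str) -> torch.nn.Module:
    """Placeholder for the actual checkpoint loader."""
    raise NotImplementedError(ckpt_name)

class CheckpointLoaderOnDevice:
    """Hypothetical loader node with an explicit device dropdown."""

    @classmethod
    def INPUT_TYPES(cls):
        # Offer every visible CUDA device plus CPU as a combo-box choice.
        devices = ["cpu"] + [f"cuda:{i}" for i in range(torch.cuda.device_count())]
        return {"required": {
            "ckpt_name": ("STRING", {"default": "sd_xl_base_1.0.safetensors"}),
            "device": (devices,),
        }}

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load"
    CATEGORY = "loaders"

    def load(self, ckpt_name, device):
        model = load_model(ckpt_name)
        model.to(torch.device(device))  # the per-node GPU choice requested above
        return (model,)
```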
@tcmaps I tried using Kaggle, but it always disconnected after a few seconds when using Cloudflare. How do you display the GUI?
Seconding this: I have two RTX 3060 12GB cards, and it would be nice to have the memory of both available for some heavy workloads. I might tinker with this in my free time if I get the chance.
+1