Please make GPU selectable per model node #1028
This would allow leveraging multi-GPU setups for one's own mixture-of-diffusers workflows without having to reload models. You could also split the base and refiner of SDXL across two 8GB cards, providing a cheaper upgrade path for some. My tests in this regard were promising. Here I adapted the SDXL diffusers example to use both T4s on Kaggle:

```python
from diffusers import DiffusionPipeline
import torch
import time

torch.cuda.reset_peak_memory_stats()

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
)
base.to("cuda:0")  # GPU 1

# Do not reuse the text encoder and VAE of base, or it will throw a
# "tensors not on same device" error!
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    # text_encoder_2=base.text_encoder_2,
    # vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda:1")  # GPU 2

start_time = time.time()
prompt = "An astronaut riding a horse"

# Run the first 75% of the denoising steps on the base model and hand the
# latents over to the refiner.
image = base(
    prompt=prompt,
    num_inference_steps=50,
    denoising_end=0.75,
    output_type="latent",
).images

# The refiner pipeline moves the incoming latents to its own device (cuda:1).
image = refiner(
    prompt=prompt,
    num_inference_steps=50,
    denoising_start=0.75,
    image=image,
).images[0]

print(f"Time: {(time.time() - start_time):.2f}s")
print(f"VRAM 1: {(torch.cuda.max_memory_allocated(0) / 1e9):.2f}GB")
print(f"VRAM 2: {(torch.cuda.max_memory_allocated(1) / 1e9):.2f}GB")
```

Tests resulted in 8GB + 10GB of VRAM used and about 30s generation time! As a bonus, the mere 13GB of system RAM on Kaggle were sufficient, whereas using CPU offload would crash the instance.
It would be great if someone would implement this.
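For anyone picking this up, here is a hypothetical sketch of what a device-selectable loader node might look like. The class layout follows ComfyUI's custom-node conventions (`INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION`), but `load_model()` is a placeholder, not a real ComfyUI API:

```python
import torch

def load_model(ckpt_name: str) -> torch.nn.Module:
    """Placeholder for the actual checkpoint loader."""
    raise NotImplementedError(ckpt_name)

class CheckpointLoaderOnDevice:
    """Hypothetical loader node with an explicit device dropdown."""

    @classmethod
    def INPUT_TYPES(cls):
        # Offer every visible CUDA device plus CPU as a combo-box choice.
        devices = ["cpu"] + [f"cuda:{i}" for i in range(torch.cuda.device_count())]
        return {"required": {
            "ckpt_name": ("STRING", {"default": "sd_xl_base_1.0.safetensors"}),
            "device": (devices,),
        }}

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "load"
    CATEGORY = "loaders"

    def load(self, ckpt_name, device):
        model = load_model(ckpt_name)
        model.to(torch.device(device))  # the per-node GPU choice requested above
        return (model,)
```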
@tcmaps I tried using Kaggle, but it always disconnected after a few seconds when using Cloudflare. How do you display the GUI?
Seconding this: I have two RTX 3060 12GB cards, and it would be nice to have the memory of both available for some heavy workloads. I might tinker with this in my free time if I get the chance.
+1