
Failed to run on M1Mac with automatic1111 web ui #18

Open
leopold-liu opened this issue Apr 3, 2023 · 15 comments

Comments

@leopold-liu

Hi there,
I would like to use ToMe to speed up diffusion, but I got an error on my M1 Mac with the automatic1111 web UI. Could you please help with this:

Traceback (most recent call last):
File "/Users/leopold/code/stable-diffusion-webui/modules/call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "/Users/leopold/code/stable-diffusion-webui/modules/call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
processed = process_images(p)
File "/Users/leopold/code/stable-diffusion-webui/modules/processing.py", line 486, in process_images
res = process_images_inner(p)
File "/Users/leopold/code/stable-diffusion-webui/modules/processing.py", line 636, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "/Users/leopold/code/stable-diffusion-webui/modules/processing.py", line 852, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 351, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 227, in launch_sampling
return func()
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 351, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 119, in forward
x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 114, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 140, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_hijack_utils.py", line 26, in __call__
return self.__sub_func(self.__orig_func, *args, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/modules/sd_hijack_unet.py", line 45, in apply_model
return orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).float()
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
h = module(h, emb, context)
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
x = layer(x, context)
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 324, in forward
x = block(x, context=context[i])
File "/Users/leopold/code/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 259, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 114, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "/Users/leopold/code/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 129, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "/Users/leopold/code/stable-diffusion-webui/tomesd/tomesd/patch.py", line 48, in _forward
m_a, m_c, m_m, u_a, u_c, u_m = compute_merge(x, self.tome_info)
File "/Users/leopold/code/stable-diffusion-webui/tomesd/tomesd/patch.py", line 21, in compute_merge
m, u = merge.bipartite_soft_matching_random2d(x, w, h, args["sx"], args["sy"], r, not args["use_rand"])
File "/Users/leopold/code/stable-diffusion-webui/tomesd/tomesd/merge.py", line 55, in bipartite_soft_matching_random2d
idx_buffer_view.scatter_(dim=2, index=rand_idx, src=-torch.ones_like(rand_idx, dtype=rand_idx.dtype))
TypeError: Operation 'neg_out_mps()' does not support input type 'int64' in MPS backend.

@dbolya
Owner

dbolya commented Apr 3, 2023

Hmm if you change all mentions of int64 to int32 in merge.py and reinstall, does it work?
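
As a convenience, the substitution suggested above can be scripted. This is a hedged sketch, not part of tomesd; the `patch_dtype` helper and the merge.py path in the comment are hypothetical, so adjust the path to wherever tomesd lives in your install before reinstalling.

```python
from pathlib import Path

def patch_dtype(path: str) -> int:
    """Replace every mention of int64 with int32 in the given file.

    Returns the number of substitutions made. This is a hypothetical
    helper; back up the file first if you want to revert easily.
    """
    p = Path(path)
    text = p.read_text()
    count = text.count("int64")
    p.write_text(text.replace("int64", "int32"))
    return count

# Example (assumed path inside the webui checkout -- adjust as needed):
# patch_dtype("tomesd/tomesd/merge.py")
```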

@leopold-liu
Author

Hmm if you change all mentions of int64 to int32 in merge.py and reinstall, does it work?

nope, still the same issue

@dbolya
Owner

dbolya commented Apr 3, 2023

Maybe you need to put export PYTORCH_ENABLE_MPS_FALLBACK=1 in the webui launch script (see #15).
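
For reference, this is roughly what that looks like. A sketch, assuming a stock automatic1111 setup where the launch script is `webui-user.sh` (the exact file name may differ in your install); the variable must be exported before Python starts.

```shell
# Add this near the top of the webui launch script (webui-user.sh in
# stock automatic1111; the exact file may differ in your setup) so the
# variable is exported before python starts:
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Quick sanity check that the variable is visible to child processes:
python3 -c 'import os; print(os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK"))'
```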

@Awethon

Awethon commented Apr 3, 2023

I also have issues with running it in automatic webui.

/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayGatherND.mm:234: failed assertion `Rank of updates array (1) must be greater than or equal to inner-most dimension of indices array (2867)'

PYTORCH_ENABLE_MPS_FALLBACK didn't help.

pytorch 2.0

@dbolya
Owner

dbolya commented Apr 3, 2023

@Awethon are you on the latest dev build? (you have to install from source) That error was fixed already, but I haven't pushed it to pip yet.

@leopold-liu
Author

Maybe you need to put export PYTORCH_ENABLE_MPS_FALLBACK=1 in the webui launch script (see #15).

Yes, I've seen that issue too, and I already put 'export PYTORCH_ENABLE_MPS_FALLBACK=1' in the launch script, but I still get the same error. It seems like a different problem.

@dbolya
Owner

dbolya commented Apr 4, 2023

Does ToMe work for you outside of the webui (for instance, in diffusers)?
The error you got originally seems to me like MPS doesn't support negating an int, which is quite weird.
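
To make that concrete: the crash comes from the `src=-torch.ones_like(rand_idx, dtype=rand_idx.dtype)` argument in merge.py, where the unary neg of an int64 tensor is what MPS rejects. The snippet below is a CPU sketch of one possible workaround (not the fix the author shipped): build the `-1` tensor directly so no neg op is ever dispatched.

```python
import torch

# merge.py builds the scatter src roughly as:
#   src = -torch.ones_like(rand_idx, dtype=rand_idx.dtype)
# On MPS the unary neg ('neg_out_mps') of that int64 tensor raises the
# TypeError. A possible workaround (a sketch, not the shipped fix) is
# to construct the -1 values directly, avoiding the neg op entirely:
rand_idx = torch.randint(0, 2, (1, 4, 1))   # int64 index, like merge.py
src = torch.full_like(rand_idx, -1)         # same values as -ones_like, no neg
buf = torch.zeros(1, 4, 2, dtype=torch.int64)
buf.scatter_(dim=2, index=rand_idx, src=src)
```

`torch.full_like` writes the constant in one shot, so the MPS backend never sees a negation of an integer tensor at this call site.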

@leopold-liu
Author

I will test that.

@ChrisYangTW

ChrisYangTW commented Apr 5, 2023

I had the same problems at first, such as the 'neg_out_mps()' error and the "failed assertion" error.
Finally, after following everyone's advice, I succeeded!

conditions:

  1. tomesd==0.1.2
    (venv) pip3 install tomesd==0.1.2

  2. torch==2.0.0, torchvision==0.15.1, and torchaudio==2.0.1(not sure if it is necessary)
    (venv) pip3 install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1

  3. environment variables: PYTORCH_ENABLE_MPS_FALLBACK=1
    export PYTORCH_ENABLE_MPS_FALLBACK=1

I recommend using the extension https://git.mmaker.moe/mmaker/sd-webui-tome to set up the parameters.

@jrittvo

jrittvo commented Apr 5, 2023

FWIW, it also runs fine with torch 2.1.0.dev and torchvision 0.16.0.dev using diffusers 0.14.0. Haven't tried updating Auto1111 to the dev packages . . .

@OrganicBeej

#15

Hi there,

I had exactly the same issue on my iMac as the first post. I am wondering if you could explain from your post above exactly what you did? I tried adding export PYTORCH_ENABLE_MPS_FALLBACK=1 to my webui.sh, but I don't think I did it correctly.

Also, for the other two commands prefixed with (venv), are you running these in a terminal with the venv activated?

Sorry for the questions; I'm just a little unsure with this type of thing, but I would love to see ToMe working on my iMac M1, and it does seem like it can.
Thanks for any help :)

@recoilme

For me, this script works on M1:

import torch, tomesd
from diffusers import StableDiffusionPipeline

model_id = "./colorful_v30.ckpt"
pipe = StableDiffusionPipeline.from_ckpt(model_id, torch_dtype=torch.float32).to("cpu")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5)
#pipe.save_pretrained("1")
image = pipe(
   "a photo of an astronaut riding a horse on mars",
   width=512, height=512, num_inference_steps=10).images[0]
image.save("astronaut.png")

I updated diffusers and switched to the latest torch 2:
pip install git+https://github.com/huggingface/diffusers.git@main

But it does not work on 'mps' (torch_dtype=torch.float32, .to("mps")) or from AUTO1111.
With the mps backend, the error is:

tomesd/merge.py:55
TypeError: Operation 'neg_out_mps()' does not support input type 'int64' in MPS backend.

It's hard to tell from the script whether it is faster or not (the first image generation is slow on Mac due to a known bug), but it is probably faster.

@recoilme

So, I have two pieces of news, good and bad.
How to run on Mac M1:

  • replace int64 with int32 in merge.py
  • add dtype int32 in the random init
  • if you get the error NotImplementedError: The operator 'aten::sort.values_stable' is not currently implemented for the MPS device, set export PYTORCH_ENABLE_MPS_FALLBACK=1

I use the plugin https://git.mmaker.moe/mmaker/sd-webui-tome with ratio 0.5 (enable it in Settings / tokens transformer), then Apply / Reload UI / Reload model.

Output without ToMe:

Removing ToMe patch (if exists)
Running on local URL:  https://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 1.0s (load scripts: 0.3s, create ui: 0.6s).
100%|███████████████████████████████████████████| 36/36 [03:03<00:00,  5.11s/it]

Output with ToMe:

Applying ToMe patch...
ToMe patch applied
Weights loaded in 2.7s (load weights from disk: 0.2s, apply weights to model: 1.9s, move model to device: 0.6s).
100%|███████████████████████████████████████████| 36/36 [02:10<00:00,  3.62s/it]

Prompt:

Create a photorealistic image of a warlord wizard casting a spell. Utilize state-of-the-art techniques, including HDR, CGI, VFX, and insane levels of detail to create an ultra-sharp and ultra-realistic image. Use Unreal 5 and Octane Render to bring the scene to life, with a focus on creating an intricate masterpiece that showcases the wizard's power and magical prowess.
Negative prompt: frame, blurry, drawing, sketch, ((ugly)), ((duplicate)), (morbid), ((mutilated)), (mutated), (deformed), (disfigured), (extra limbs), (malformed limbs), (missing arms), (missing legs), (extra arms), (extra legs), (fused fingers), (too many fingers), long neck, low quality, worst quality, 3d, cartoon, anime, girl, loli, young, monochrome
Steps: 36, Sampler: DPM++ 2M Test, CFG scale: 6, Seed: 3370768663, Size: 640x896, Model hash: 1a36578807, Model: colorful_v30, Hashes: {"model": "1a36578807"}

29% speedup with ratio 0.5!
18% speedup with ratio 0.3 (same size: 640x896)

But many little details were washed out :(
Original image
00104-3370768663

@recoilme

"Patched with 0.5/0.3"
00103-3370768663
00105-3370768663

@recoilme

Anyway, it's a huge improvement in speed at middle sizes. Good for testing, on mobile, and so on.
I won't send a PR because I don't know Python; I just want to generate waifus. I packed it into a gist, but this code smells: https://gist.github.com/recoilme/c04db1d9a83358c6cdbadf18026df048
