General MPS op coverage tracking issue #77764

Open
albanD opened this issue May 18, 2022 · 1,376 comments
Labels
feature A request for a proper, new feature. module: mps Related to Apple Metal Performance Shaders framework tracker A tracking issue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@albanD
Collaborator

albanD commented May 18, 2022

This issue is a centralized place to list and track work on adding support for new ops to the MPS backend.

PyTorch MPS Ops Project: a project to track all the ops for the MPS backend. PyTorch has a very large number of operators, so not all of them are implemented yet. We will be prioritizing new operators based on user feedback. If possible, please also provide a link to the network or use case where the op is being used.

As ops are requested, we will add them to the "To Triage" pool. If an operation has 3+ requests, and depending on its complexity/need, it will be moved to the "To be implemented" pool. If you want to work on adding support for such an op, feel free to comment below to get assigned one. Please avoid picking up an op that is already being worked on, as tracked in the "In progress" pool.

Link to the wiki for details on how to add these ops and example PRs.

MPS operators coverage matrix - The matrix covers most of the supported operators but is not exhaustive. Look at the "In vx.x.x" column: if the box is green, the op implementation is included in the latest release; if the box is yellow, the op implementation is in the nightly build but has not yet been included in the latest release. Before you comment below, please check this matrix to make sure the operator you're requesting has not already been implemented in nightly. More details can be found in the readme.
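
A quick way to check whether an op already has an MPS kernel in your installed build (a minimal sketch; torch.cumsum is just a stand-in for the op you care about):

import torch

# Quick availability check: swap torch.cumsum for the op you want to test.
if torch.backends.mps.is_available():
    x = torch.arange(4, device="mps")
    try:
        torch.cumsum(x, dim=0)
        print("op runs natively on MPS in this build")
    except NotImplementedError as err:
        print(err)  # the message will point you back to this issue
else:
    print("MPS backend not available in this build")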

cc @kulinseth @malfet @DenisVieriu97 @jhavukainen

@albanD albanD added feature A request for a proper, new feature. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework labels May 18, 2022
@albanD albanD changed the title General MPS op coverage issue General MPS op coverage tracking issue May 18, 2022
@philipturner

Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.

@albanD
Collaborator Author

albanD commented May 18, 2022

Hey,

There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or a set of ops in MPS). Also, required functions that are not in the hot path simply fall back to the CPU for now.

It is mentioned here because custom shaders are something that could easily be added within the integration, but they are not something that is used today.

@pzelasko

I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:

  • aten::flip
  • aten::equal
  • aten::upsample_nearest1d.out

@Linux-cpp-lisp

One vote for a CPU fallback for torch.bincount.

Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot just fall back to the CPU implementation without memory copy operations? (Based, of course, on my 10,000ft view of the architecture, which I'm sure is wildly oversimplified.)

@richardburleigh

richardburleigh commented May 19, 2022

Tip for everyone:

Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.

I'm using a custom build that merges pull request #77791, so I'm not sure if this is included in the current build (Edit: it's not. You need to build PyTorch yourself with the pull request or trust an online build that includes it).
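
A minimal sketch of using the fallback (assuming a build that includes it; the variable has to be set before torch is imported, and aten::cumsum is just an example of an op without an MPS kernel):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must happen before "import torch"

import torch

x = torch.arange(8, device="mps")
print(torch.cumsum(x, dim=0))  # ops without an MPS kernel run on the CPU and copy back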

@gautierdag

Testing with some huggingface transformers code: + 1 vote for aten::cumsum.out
Tried with the fallback env var but doesn't seem to work for me.

@lhoenig
Contributor

lhoenig commented May 20, 2022

One missing op I ran into and haven't seen mentioned yet is aten::_unique2.
Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 with the current main branch build. However, I instead get the warnings

The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

then

The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)

and finally the forward pass through my model crashes with

RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944

On CPU it works fine. Could be #77886, I suppose.

@Willian-Zhang

Testing with some huggingface transformers code: + 1 vote for aten::cumsum.out
Tried with the fallback env var but doesn't seem to work for me.

+1
setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:

NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

@albanD
Collaborator Author

albanD commented May 20, 2022

@lhoenig could you open a new, separate issue for the CPU fallback failing for you?
The error seems to hint that you're moving non-contiguous Tensors across devices. Making sure they are contiguous might help as a workaround.
We can continue this discussion in the new issue you will create.
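
A minimal sketch of that workaround (hypothetical tensor; the key call is .contiguous() before the cross-device copy):

import torch

x = torch.randn(4, 8).t()         # a transposed view: non-contiguous
x_mps = x.contiguous().to("mps")  # make it contiguous before copying to the MPS device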

@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).

@weiji14
Contributor

weiji14 commented May 20, 2022

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post showing how we can implement these ops in PyTorch? I'd love to give it a shot if it's not too hard.

@lhoenig
Contributor

lhoenig commented May 20, 2022

@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.

@psobolewskiPhD

psobolewskiPhD commented May 20, 2022

I've got an unsupported op: aten::grid_sampler_2d

envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)

@thipokKub

Not supported

  • aten::l1_loss_backward.grad_input
  • aten::kl_div_backward

Code

import torch
import torch.nn as nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss()  # same failure with nn.KLDivLoss()
loss = criterion(model(X), y)
loss.backward()

Output

NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend

@tw-ilson

Trying to use affine crop from torchvision, and found that the operator aten::linspace.out does not seem to be implemented for the MPS backend.

@nicolasbeglinger

nicolasbeglinger commented May 22, 2022

Trying to use the MPS backend with PyTorch Geometric, and found the operator aten::index.Tensor is not yet implemented.

@feesta

feesta commented May 22, 2022

Found the operator 'aten::grid_sampler_2d' is not current implemented for the MPS device.

@mooey5775

Would be great to add aten::adaptive_max_pool2d to the list - seems to be fairly common and for me useful in some point cloud architectures.

@RohanM
Contributor

RohanM commented May 23, 2022

I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.

@dbl001

dbl001 commented Jul 10, 2024

In the neuraloperator project (Fourier neural operator):
go to https://github.com/neuraloperator/neuraloperator/tree/master, open fourier_neural_operator/fourier_3d.py, and change the device to 'mps'.
The error is likely in

def compl_mul3d(self, input, weights):
    # (batch, in_channel, x, y, t), (in_channel, out_channel, x, y, t) -> (batch, out_channel, x, y, t)
    return torch.einsum("bixyz,ioxyz->boxyz", input, weights)

Here is the error message:

(mpsFileLoc): /AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:137:0: error: 'mps.gather' op operand #0 must be tensor of mps native type values, but got 'tensor<19660800xcomplex<f32>>'
(mpsFileLoc): /AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:137:0: note: see current operation: %38 = "mps.gather"(%34, %36, %37) <{batch_dims = 0 : ui32}> : (tensor<19660800xcomplex<f32>>, tensor<102400xsi32>, tensor<si32>) -> tensor<102400xcomplex<f32>>
/AppleInternal/Library/BuildRoots/91a344b1-f985-11ee-b563-fe8bc7981bff/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Headers/Project/MPSGraphExecutable_Project.h:142: failed assertion `Error: executable initialization failed.'

Could PyTorch fall back to the CPU instead of aborting inside Metal Performance Shaders?
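
One possible manual workaround, sketched here with a hypothetical helper name (compl_mul3d_cpu), is to run the complex einsum on the CPU:

import torch

def compl_mul3d_cpu(input, weights):
    # (batch, in_channel, x, y, t), (in_channel, out_channel, x, y, t) -> (batch, out_channel, x, y, t)
    # The complex einsum is what trips up MPSGraph, so compute it on the CPU.
    out = torch.einsum("bixyz,ioxyz->boxyz", input.cpu(), weights.cpu())
    # Move back only if your PyTorch build supports complex tensors on MPS;
    # otherwise keep the result on the CPU.
    return out.to(input.device)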

@Lakshmanaraja

Lakshmanaraja commented Jul 11, 2024

The operator 'torchvision::nms' is not currently implemented for the MPS device (hit with FasterRCNN on an M2 device).

@HamishGBrown

HamishGBrown commented Jul 11, 2024

Please implement complex numbers: NotImplementedError: The operator 'aten::complex.out' is not current implemented for the MPS device. 😃

@JianmingXia

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device

@adaitch

adaitch commented Jul 11, 2024

The operator 'aten::foreach_mul.Scalar' is not currently implemented for the MPS device

@rohit5895

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device.

@eliasprost

eliasprost commented Jul 11, 2024

One vote for aten::isin.Tensor_Tensor_out on a MacBook M1 Pro 2021 🙏
Complete error message:

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
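
Until the op lands, a hedged sketch of a manual workaround (isin_via_cpu is a made-up helper name): compute the membership mask on the CPU and move it back to the MPS device.

import torch

def isin_via_cpu(elements, test_elements):
    # torch.isin has no MPS kernel here, so compute the boolean mask on the CPU
    mask = torch.isin(elements.cpu(), test_elements.cpu())
    return mask.to(elements.device)

x = torch.tensor([1, 2, 3, 4], device="mps")
allowed = torch.tensor([2, 4], device="mps")
print(isin_via_cpu(x, allowed))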

@karthikm20

Operator aten::foreach_mul.Scalar is not implemented on a MacBook M1 Pro 2021

Complete error message:
NotImplementedError: The operator 'aten::foreach_mul.Scalar' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@etcola

etcola commented Jul 12, 2024

The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

thx buddy

@qli7777

qli7777 commented Jul 12, 2024

Complete error message:
NotImplementedError: The operator 'aten::linalg_cholesky_ex.L' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Thank you!

@Ismail-ai707

The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device.
The temporary fix is not working for me (set PYTORCH_ENABLE_MPS_FALLBACK=1).

@oscarklee

Vote for:
NotImplementedError: The operator aten::isin.Tensor_Tensor_out is not currently implemented for the MPS device.

@anirudhlakhotia

Please also add aten::linalg_householder_product to the list!
The temporary fix does not work either.

@TTonnyy789

Vote for:
NotImplementedError: The operator aten::scatter_reduce.two_out is not currently implemented for the MPS device.

@qqaatw
Collaborator

qqaatw commented Jul 16, 2024

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. Encountered this running Llama3_8b via HuggingFace Transformers.

Saw a lot of votes on this one. Please plan it into the new release! Thank you!

For all folks who requested this operator, the op has been implemented in the nightly version and will be included in PyTorch 2.4.

@petravrablecova

NotImplementedError: The operator aten::nanmedian.dim_values is not currently implemented for the MPS device.

@Pandaklez

Pandaklez commented Jul 16, 2024

Vote for torch.nn.CTCLoss

@RichieHakim

NotImplementedError: The operator 'aten::grid_sampler_2d_backward' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@zxtgool

zxtgool commented Jul 17, 2024

error: The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device

@kgordon96

NotImplementedError: The operator 'aten::_standard_gamma' is not currently implemented for the MPS device

@rossja

rossja commented Jul 17, 2024

Following the instruction to comment here (from the message: NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. ...)

@thipyss

thipyss commented Jul 19, 2024

The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device.

@pradeepsharma

Please prioritize "isin.Tensor_Tensor_out"

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

@vonlaughing

Please prioritize aten::upsample_bicubic2d.out, the temporary fix doesn't work for me :(

@thibaudbrg

thibaudbrg commented Jul 21, 2024

The operator aten::_fft_r2c is not currently implemented for the MPS device 🙏

@gblssroman

+1 for isin.Tensor_Tensor_out

@robtaylor

So, really stupid question.. why do these functions need to be reimplemented on each accelerator architecture? Why isn't there a code generator/compiler for this?

@TTonnyy789

Please prioritize aten::_convert_indices_from_coo_to_csr.out 🙏

NotImplementedError: The operator aten::_convert_indices_from_coo_to_csr.out is not currently implemented for the MPS device.
