derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented #429

Darius888 · 2024-06-04T12:09:06Z

Hello,

When I apply the Sine Wave example approach to a transformer-based model, I get the following error:

File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: derivative for aten::_scaled_dot_product_efficient_attention_backward is not implemented

The setup is a regression task over multiple sequences.

Is there a way to work around this?

Thank you,

JingminSun · 2024-06-04T16:25:49Z

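I think this happens when you set first_order = False: the second-order update has to differentiate through the attention backward pass, and PyTorch has no derivative implemented for that fused kernel. The simplest fix is to set first_order = True.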
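To illustrate the difference, here is a minimal sketch in plain PyTorch. It assumes that first_order controls whether the inner-loop gradient is computed with create_graph; I have not checked that against the library code. The helper inner_step and the toy tensors are hypothetical, not part of any library.

```python
import torch

def inner_step(params, loss, lr=0.1, first_order=True):
    # Hypothetical helper (not a library API): one inner-loop adaptation step.
    # With first_order=True the gradients are treated as constants, so the
    # outer backward pass never differentiates through a backward kernel.
    grads = torch.autograd.grad(loss, params, create_graph=not first_order)
    return [p - lr * g for p, g in zip(params, grads)]

w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([0.5, 3.0])

# Inner loss on a toy linear model standing in for the real network.
inner_loss = ((w * x).sum() - 1.0) ** 2
(w_adapted,) = inner_step([w], inner_loss, first_order=True)

# The outer loss still backpropagates to w through the update itself.
outer_loss = (w_adapted * x).sum() ** 2
outer_loss.backward()
```

With first_order=False the same call would request create_graph=True, which is the double backward that the fused attention kernel cannot provide.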

If you really need second-order gradients, see pytorch/pytorch#117974.

Darius888 · 2024-06-04T18:27:07Z

This was exactly it, thank you so much! @JingminSun
