
GGML_OP_VIEW backpropagation performance #633

Open
audiovention opened this issue Dec 5, 2023 · 0 comments

I'm using ggml to train a smallish network. I have a number of simple view operations (they just reduce the leading dimension of a tensor), and I noticed that the view operation introduces a large overhead in the backward pass. I'm now reworking my code to introduce new operators that perform this reshaping internally so I don't have to use the view op at all; however, I think view is a very basic and very important op and should be optimized as much as possible. A rough sketch of the kind of view I mean is below.
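
For context, the views in question look roughly like this (a minimal sketch with made-up shapes and a hypothetical helper name, not my actual code):

```c
#include "ggml.h"

// Keep only the first k elements along the leading dimension of a 2-D tensor.
// The view shares the parent's data: same row stride (nb[1]), offset 0.
static struct ggml_tensor * slice_leading_dim(struct ggml_context * ctx,
                                              struct ggml_tensor  * t,  // shape [ne0, ne1]
                                              int64_t               k) {
    return ggml_view_2d(ctx, t, k, t->ne[1], t->nb[1], 0);
}
```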

I don't have many suggestions on how to do that. I'm thinking maybe the view op could store a view of the original gradient instead of creating a new tensor, and then accumulate into that view; I'm not yet certain whether that would work with the ggml internals.
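
Very roughly, something like this is what I have in mind (a pseudo-sketch against the public API; the names and the graph plumbing are only illustrative, and I haven't checked whether the autodiff machinery can actually consume an in-place add like this):

```c
// Hypothetical backward for a view: instead of materializing a full-size
// gradient tensor and acc-ing into it, take a view into the parent's existing
// gradient with the same offset/strides as the forward view, and add the
// incoming gradient in place.
struct ggml_tensor * grad_view = ggml_view_2d(ctx, src0->grad,
                                              view->ne[0], view->ne[1],
                                              nb1, offset); // nb1/offset taken from the view's op params
struct ggml_tensor * acc = ggml_add_inplace(ctx, grad_view, view->grad);
// acc would then replace the current ggml_acc-based gradient node for src0.
```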

For starters, I've slightly extended the accumulate op so that the view backprop can at least avoid a completely useless multiply by zero (used just to get the shape right). In my specific case this improved my training speed by 10% and reduced my memory footprint by 5% (I don't use the allocator, though).

You can see my current workaround here; it's a very simple change:
audiovention@513ffa8
I'm not making a pull request as I think this should be fixed/optimized in a different way.
