I'm using ggml to train a small-ish network. I have a bunch of simple view operations (just reducing the leading dimension of a tensor), and I noticed that the view operation introduces huge overhead in the backward pass. I'm now reworking my code to introduce new operators that perform this reshaping internally so I can avoid the view op altogether; however, I think view is a very basic and very important op and should be optimized as much as possible.
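For context, this is the kind of view I mean — a sketch with made-up shapes, using ggml's public view API:

```c
// Sketch of the kind of view in question (shapes are made up):
// keep only the first 48 of 64 elements along the leading dimension.
struct ggml_tensor * t = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 32);
struct ggml_tensor * v = ggml_view_2d(ctx, t,
        48,        // ne0: reduced leading dimension (was 64)
        t->ne[1],  // ne1: second dimension unchanged
        t->nb[1],  // nb1: keep the parent's row stride
        0);        // offset in bytes
```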
I don't have many concrete suggestions on how to do that. One idea: the view op's backward pass could store a view of the original gradient instead of creating a new tensor, and then accumulate into that view. I'm not yet certain whether that would work with the ggml internals.
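A minimal sketch of what I mean, using public ggml ops — `src_grad` and `v_grad` are placeholder names, and whether ggml's add kernels accept a non-contiguous destination like this is exactly the internals question:

```c
// Sketch: accumulate the view's gradient directly into a view of the
// parent's gradient, instead of materializing a full-size tensor.
// src_grad: gradient of the parent tensor (placeholder name)
// v:        the forward view; v_grad: its incoming gradient
struct ggml_tensor * g_view = ggml_view_2d(ctx, src_grad,
        v->ne[0], v->ne[1], v->nb[1],
        0);  // same byte offset as the forward view

// the sum lands in g_view's data, which aliases src_grad, so no extra
// full-size gradient buffer is allocated for the view's backward
ggml_add_inplace(ctx, g_view, v_grad);
```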
For starters, I've slightly extended the accumulate op so that the view backprop can at least avoid a completely useless multiply by zero (used just to get the shape right). In my specific case this boosted my training speed by 10% and reduced my memory footprint by 5% (I don't use the allocator, though).
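Roughly, the pattern being avoided looks like this — a hypothetical before/after, not the actual code (the real patch is in the commit linked below):

```c
// before: a zero tensor with the parent's full shape is materialized
// just so acc has a destination with the right geometry
struct ggml_tensor * zeros = ggml_scale(ctx, src, 0.0f); // multiply by zero
struct ggml_tensor * g     = ggml_acc(ctx, zeros, v_grad,
                                      nb1, nb2, nb3, offset);

// after (the idea): an extended acc that treats its destination as
// implicit zeros, so the scale-by-zero pass and its buffer go away
```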
You can see my current workaround here; it's a very simple change: audiovention@513ffa8
I'm not making a pull request as I think this should be fixed/optimized in a different way.