
ggml : replace conv stage_0 and stage_1 with im2col and mul_mat #559

Closed · ggerganov opened this issue Oct 8, 2023 · 10 comments
Labels: refactoring

@ggerganov (Owner)

Currently, we implement ggml_conv_1d and ggml_conv_2d as a sequence of two internal ops: stage_0 and stage_1.

For more context, see the discussion in #483.

We should instead introduce ggml_im2col and reuse the ggml_mul_mat implementation, both on the CPU and the GPU.
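
For reference, this is roughly the idea (a minimal sketch in plain C, not ggml code; the names im2col_1d_f32 and conv_1d_via_im2col are made up for illustration): im2col unfolds every receptive field of the input into a row of a matrix, after which the convolution is a single matrix product with the flattened kernels, which is exactly the kind of operation ggml_mul_mat computes.

```c
// Unfold input [IC][IL] into columns [OL][IC*K] so that a convolution
// becomes one matrix product. s0 = stride, p0 = padding, d0 = dilation.
static void im2col_1d_f32(const float * src, float * dst,
                          int IC, int IL, int K,
                          int s0, int p0, int d0, int OL) {
    for (int ol = 0; ol < OL; ol++) {
        for (int ic = 0; ic < IC; ic++) {
            for (int k = 0; k < K; k++) {
                const int il = ol*s0 + k*d0 - p0;
                dst[ol*IC*K + ic*K + k] =
                    (il >= 0 && il < IL) ? src[ic*IL + il] : 0.0f;
            }
        }
    }
}

// out[oc][ol] = sum_j kernel[oc][j] * cols[ol][j], with j = ic*K + k,
// i.e. a plain [OC x IC*K] by [IC*K x OL] matrix multiplication over
// the unfolded data - the part that a shared mul_mat kernel would do.
static void conv_1d_via_im2col(const float * kernel, const float * cols,
                               float * out, int OC, int IC, int K, int OL) {
    for (int oc = 0; oc < OC; oc++) {
        for (int ol = 0; ol < OL; ol++) {
            float sum = 0.0f;
            for (int j = 0; j < IC*K; j++) {
                sum += kernel[oc*IC*K + j] * cols[ol*IC*K + j];
            }
            out[oc*OL + ol] = sum;
        }
    }
}
```

With input length IL, kernel size K, stride s0, padding p0 and dilation d0, the output length is OL = (IL + 2*p0 - d0*(K-1) - 1)/s0 + 1.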

@FSSRepo (Collaborator) commented Oct 9, 2023

I will do it. @ggerganov

@ggerganov (Owner, Author)

We should probably start with some unit tests of the current implementation so that we can easily prove the new version is correct
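
A hypothetical sketch of what such a test could look like, with a naive direct convolution as ground truth (conv_1d_ref and check_close are illustrative helpers, not existing ggml test utilities): run the current ggml_conv_1d path and the reference over random inputs and compare element-wise within a tolerance.

```c
#include <math.h>
#include <stdio.h>

// Naive direct 1D convolution used as ground truth for the tests.
static void conv_1d_ref(const float * src, const float * kernel, float * dst,
                        int IC, int OC, int IL, int K,
                        int s0, int p0, int d0, int OL) {
    for (int oc = 0; oc < OC; oc++) {
        for (int ol = 0; ol < OL; ol++) {
            float sum = 0.0f;
            for (int ic = 0; ic < IC; ic++) {
                for (int k = 0; k < K; k++) {
                    const int il = ol*s0 + k*d0 - p0;
                    if (il >= 0 && il < IL) {
                        sum += kernel[(oc*IC + ic)*K + k] * src[ic*IL + il];
                    }
                }
            }
            dst[oc*OL + ol] = sum;
        }
    }
}

// Element-wise comparison with an absolute tolerance; returns 0 on success.
static int check_close(const float * a, const float * b, int n, float tol) {
    for (int i = 0; i < n; i++) {
        if (fabsf(a[i] - b[i]) > tol) {
            fprintf(stderr, "mismatch at %d: %f vs %f\n", i, a[i], b[i]);
            return 1;
        }
    }
    return 0;
}
```

A tolerance rather than exact equality also leaves room for any F16 intermediates the existing kernels may use.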

@FSSRepo (Collaborator) commented Oct 9, 2023

> We should probably start with some unit tests of the current implementation so that we can easily prove the new version is correct

That would be a great help in speeding up my work. Right now, I'm making a lot of changes blindly, without testing, since I want to finish quickly.

@audiovention

I see that this issue is almost closed, with the proposed changes nearly ready in a pull request.
I had a different idea on how to restructure the two stages and wanted to share it.
IMHO, the im2col step should write into scratch memory. I'm currently working on a project that involves quite a lot of conv1d steps, and having each one create an internal tensor for the im2col output completely blows up my memory usage, while there is no good reason for all of that memory to exist at the same time.

@slaren (Collaborator) commented Oct 13, 2023

> having each one create an internal tensor for the im2col output completely blows up my memory usage

Does this happen while using ggml-alloc?

@audiovention

No, I'm not using ggml-alloc.

@slaren (Collaborator) commented Oct 13, 2023

With ggml-alloc, the memory of temporary tensors is automatically reused, so memory usage should be very similar to using the work buffer. If it isn't, that is most likely a bug. You can find an example of how to use it in the gpt-2 example, or here without all the ggml-backend stuff.
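
For completeness, a rough sketch of that pattern with the ggml_allocr API of that era (since superseded by ggml_gallocr; names and details may differ, and a real program rebuilds the graph after the measure pass, since the measure pass assigns placeholder addresses):

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include <stdlib.h>

// Build a throwaway graph; this stands in for the application's real
// graph, e.g. a chain of ggml_conv_1d calls.
static struct ggml_cgraph * build_graph(struct ggml_context * ctx) {
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, ggml_add(ctx, a, b));
    return gf;
}

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,  // metadata only; ggml-alloc owns tensor data
    };
    struct ggml_context * ctx = ggml_init(params);
    struct ggml_cgraph  * gf  = build_graph(ctx);

    // measure pass: compute the peak memory needed when buffers are reused
    struct ggml_allocr * measure = ggml_allocr_new_measure(/*alignment =*/ 32);
    size_t mem_size = ggml_allocr_alloc_graph(measure, gf);
    ggml_allocr_free(measure);

    // real pass: one buffer sized to the measured peak is shared by all
    // temporary tensors, so each im2col output reuses the same memory
    void * buf = malloc(mem_size);
    struct ggml_allocr * alloc = ggml_allocr_new(buf, mem_size, 32);
    ggml_allocr_alloc_graph(alloc, gf);

    // ... fill the inputs and evaluate the graph here ...

    ggml_allocr_free(alloc);
    free(buf);
    ggml_free(ctx);
    return 0;
}
```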

@audiovention

Oh OK, I'm still very new to ggml and figuring things out.
Does this apply to back-propagation as well?

@slaren (Collaborator) commented Oct 13, 2023

I think that using ggml-alloc for training is possible, but it currently requires some hacks to avoid freeing/overwriting some tensors. You could look at the finetune example in llama.cpp to see how it handles this.

@ggerganov (Owner, Author)

Completed via #564
