Confirmation about the order of tensor dimensions #500

Closed
mwmercury opened this issue Sep 1, 2023 · 5 comments

@mwmercury

Hello! Thank you so much for developing and sharing this awesome library!
Can I ask a silly question?

I'm investigating the source code of gpt-neox and I see these lines in the convert-h5-to-ggml.py file:

    # the dimension sizes are written starting from the last PyTorch dimension
    for i in range(n_dims):
        fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
    fout.write(str)

Could you please tell me why we need to store the dimensions in reverse order?
Thank you in advance!

@YavorGIvanov
Collaborator

YavorGIvanov commented Sep 3, 2023

This is done because the dimension order in GGML is the reverse of the dimension order used in PyTorch. In PyTorch the order is N x C x H x W; in GGML it is W x H x C x N.

N is the batch dimension, C the channel dimension, H the number of rows, and W the number of columns.

In GGML, W x H x C x N corresponds to the ne[0] x ne[1] x ne[2] x ne[3] members of a tensor.

Say we have a 4-dimensional tensor named "t" in both GGML and PyTorch.
Here is the correspondence:

GGML        PyTorch
t->ne[0]    t.shape[3]
t->ne[1]    t.shape[2]
t->ne[2]    t.shape[1]
t->ne[3]    t.shape[0]
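
To make the correspondence concrete, here is a minimal Python sketch (assuming PyTorch is available; the shape 2 x 3 x 4 x 5 is just an example, not something taken from the library) showing that ne[i] matches t.shape[n_dims - 1 - i]:

    import torch

    t = torch.zeros(2, 3, 4, 5)        # PyTorch order: N x C x H x W = 2 x 3 x 4 x 5
    ne = list(reversed(t.shape))       # GGML order:    W x H x C x N = 5 x 4 x 3 x 2

    # ne[i] equals t.shape[n_dims - 1 - i], exactly the mapping in the table above
    for i in range(t.dim()):
        assert ne[i] == t.shape[t.dim() - 1 - i]

    print(ne)  # [5, 4, 3, 2]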

@mwmercury
Author

@YavorGIvanov
Thank you very much for your kind response!
I have another question: why was the decision made to reverse the dimension order relative to PyTorch? Is there a specific reason for this choice, such as better memory management or performance?

@YavorGIvanov
Collaborator

YavorGIvanov commented Sep 3, 2023

ggerganov will be able to provide the best answer here, but I don't think the library was intended to match any other deep learning library, and the initial version was written relatively quickly.

When you design the data type representing a tensor, you may decide to limit the number of dimensions to a fixed upper bound and then use a static C++ array of integers to store the size of each dimension, plus an integer for the dimension count. This avoids dynamically allocating the dimension array, making it more memory/cache friendly. It also has the advantage that every tensor has the same dimension array (ne[4]) and count. However, to make the dimension array easy to use, you need to store the dimensions in a consistent order.

E.g. this makes it easy to compare the width dimension of a 2D, 3D, and 4D tensor, because you know that the width is always stored at ne[0], rather than at ne[1], ne[2], and ne[3] respectively.
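
To illustrate the idea, here is a minimal Python sketch (not GGML's actual C implementation; the fixed-length array and the padding of unused dimensions with 1 are assumptions made for the illustration):

    MAX_DIMS = 4

    def make_ne(*dims):
        # Fixed-length "ne" array: unused trailing dimensions are padded with 1.
        return list(dims) + [1] * (MAX_DIMS - len(dims))

    ne_2d = make_ne(8, 4)        # W x H         -> [8, 4, 1, 1]
    ne_3d = make_ne(8, 4, 3)     # W x H x C     -> [8, 4, 3, 1]
    ne_4d = make_ne(8, 4, 3, 2)  # W x H x C x N -> [8, 4, 3, 2]

    # The width of every tensor is at ne[0], regardless of how many dimensions it has.
    assert ne_2d[0] == ne_3d[0] == ne_4d[0] == 8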

@YavorGIvanov
Collaborator

If you have any additional questions, you can reopen this issue or open a new one with the "question" label.

@mwmercury
Author

@YavorGIvanov
I'm sorry for my late response.
That was a very detailed and helpful explanation. Thank you so much!
