Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MNIST example with CNN #485

Merged
merged 1 commit into from
Aug 28, 2023
Merged

Conversation

rgerganov
Copy link
Collaborator

Add one more implementation for MNIST which uses Conv2D layers, ref: https://keras.io/examples/vision/mnist_convnet/
It achieves ~99% accuracy on the MNIST test set and also performs better for user inputs.

@rgerganov rgerganov marked this pull request as draft August 27, 2023 17:52
@rgerganov
Copy link
Collaborator Author

Right now I am using raw files for the model. I need to come up with a Python script which saves them to GGUF. Hence this is a draft.

Comment on lines 57 to 62
// TODO: is there a better way to convert 1d bias to 4d bias?
for (int i = 0; i < ne; i++) {
for (int j = 0; j < ne0*ne1; j++) {
ggml_set_f32_1d(bias, i*ne0*ne1 + j, bias_f32[i]);
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to add 4d overloads for ggml_set_... but for now this should be fine

@ggerganov
Copy link
Owner

Right now I am using raw files for the model. I need to come up with a Python script which saves them to GGUF. Hence this is a draft.

Adding GGUF conversion could be a useful example, but it's also OK to keep reading .raw files.

Add one more implementation for MNIST which uses Conv2D layers, ref:
https://keras.io/examples/vision/mnist_convnet/. It achieves ~99%
accuracy on the MNIST test set and also performs better for user inputs.
This implementation expects a model in GGUF format. You can get one with
the 'mnist-cnn.py' script. Example usage:

$ ./mnist-cnn.py train mnist-cnn-model
...
Keras model saved to 'mnist-cnn-model'

$ ./mnist-cnn.py convert mnist-cnn-model
...
Model converted and saved to 'mnist-cnn-model.gguf'

$ ./mnist-cnn mnist-cnn-model.gguf models/mnist/t10k-images.idx3-ubyte
@rgerganov rgerganov marked this pull request as ready for review August 28, 2023 13:26
@rgerganov
Copy link
Collaborator Author

Two observations on this simple example:

  • the conv2d operator in GGML expects the same kernel shape as the Conv2D layer in Keras (e.g. (3, 3, 1, 32) for the first one), but its dimensions are reordered. So I am using np.moveaxis() in the python code before exporting the convolution kernels.
  • The python gguf module is saving the tensor shape in reverse order. Maybe this is useful for the big models like llama but here it is not needed. I am using the raw_shape parameter as a workaround here.

@ggerganov
Copy link
Owner

I tested this and everything works correctly. Here is the graph plot:

mnist-cnn dot

the conv2d operator in GGML expects the same kernel shape as the Conv2D layer in Keras (e.g. (3, 3, 1, 32) for the first one), but its dimensions are reordered. So I am using np.moveaxis() in the python code before exporting the convolution kernels.

Yes, it's always difficult to match the shapes during such conversions. I guess there are conventions in standard frameworks but ggml is not following them and this creates difficulties. Having such kind of examples is very useful because it is easy to debug and understand where is the mismatch

The python gguf module is saving the tensor shape in reverse order. Maybe this is useful for the big models like llama but here it is not needed. I am using the raw_shape parameter as a workaround here.

It's not related to LLMs. The convention for referencing the axes in the tensor is different between Python and ggml:

# example 4d tensor in Python
print(x.shape)
(32, 1, 3, 3)

print(x.shape[0], kernel1.shape[1], kernel1.shape[2], kernel1.shape[3])
32 1 3 3
// same tensor in ggml
printf("(%d, %d, %d, %d)\n"), x->ne[0], x->ne[1], x->ne[2], x->ne[3]);
(3, 3, 1, 32)

So gguf.py reverses the dims order to match the ggml order.

Shall we merge this or add a README.md first?

@rgerganov rgerganov merged commit d4d6f51 into ggerganov:master Aug 28, 2023
2 checks passed
@rgerganov
Copy link
Collaborator Author

I will update the README and the web app in a follow-up PR

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants