Add MNIST example with CNN #485

rgerganov · 2023-08-27T17:50:10Z

Add one more implementation for MNIST which uses Conv2D layers, ref: https://keras.io/examples/vision/mnist_convnet/
It achieves ~99% accuracy on the MNIST test set and also performs better for user inputs.

rgerganov · 2023-08-27T17:53:41Z

Right now I am using raw files for the model. I need to come up with a Python script which saves them to GGUF. Hence this is a draft.

ggerganov · 2023-08-27T17:57:19Z

examples/mnist/main-cnn.cpp

+    // TODO: is there a better way to convert 1d bias to 4d bias?
+    for (int i = 0; i < ne; i++) {
+        for (int j = 0; j < ne0*ne1; j++) {
+            ggml_set_f32_1d(bias, i*ne0*ne1 + j, bias_f32[i]);
+        }
+    }


We probably need to add 4d overloads for ggml_set_... but for now this should be fine
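For readers following the hunk above, here is a minimal Python sketch of the same expansion: each per-channel bias value is repeated `ne0*ne1` times, channel by channel, into one flat buffer. The function name `expand_bias` and the sample values are hypothetical and not part of the PR; only the indexing `i*ne0*ne1 + j` is taken from the diff.

```python
def expand_bias(bias, ne0, ne1):
    """Repeat each per-channel bias value ne0*ne1 times, channel by channel.

    Mirrors the indexing in the C++ loop: out[i*ne0*ne1 + j] = bias[i].
    """
    plane = ne0 * ne1
    out = [0.0] * (len(bias) * plane)
    for i, value in enumerate(bias):
        for j in range(plane):
            out[i * plane + j] = value
    return out


# Hypothetical sample: 2 channels, each broadcast over a 2x3 plane.
flat = expand_bias([0.5, -1.0], ne0=2, ne1=3)
print(flat)
```

With NumPy, `np.repeat(bias, ne0 * ne1)` should produce the same flat layout.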

ggerganov · 2023-08-27T18:01:20Z

> Right now I am using raw files for the model. I need to come up with a Python script which saves them to GGUF. Hence this is a draft.

Adding GGUF conversion could be a useful example, but it's also OK to keep reading .raw files.

Add one more implementation for MNIST which uses Conv2D layers, ref: https://keras.io/examples/vision/mnist_convnet/.
It achieves ~99% accuracy on the MNIST test set and also performs better for user inputs.

This implementation expects a model in GGUF format. You can get one with the 'mnist-cnn.py' script. Example usage:

$ ./mnist-cnn.py train mnist-cnn-model
...
Keras model saved to 'mnist-cnn-model'

$ ./mnist-cnn.py convert mnist-cnn-model
...
Model converted and saved to 'mnist-cnn-model.gguf'

$ ./mnist-cnn mnist-cnn-model.gguf models/mnist/t10k-images.idx3-ubyte

rgerganov · 2023-08-28T13:42:12Z

Two observations on this simple example:

- The conv2d operator in GGML expects the same kernel shape as the Conv2D layer in Keras (e.g. (3, 3, 1, 32) for the first one), but its dimensions are reordered. So I am using np.moveaxis() in the Python code before exporting the convolution kernels.
- The Python gguf module is saving the tensor shape in reverse order. Maybe this is useful for the big models like llama but here it is not needed. I am using the raw_shape parameter as a workaround here.
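A minimal sketch of the reordering described in the first observation, assuming the target NumPy layout is (32, 1, 3, 3), which is the Python shape quoted in the reply further down the thread. The exact axis arguments used in 'mnist-cnn.py' are not shown in this conversation, so the ones below are an assumption, and the helper name `keras_kernel_to_oihw` is hypothetical.

```python
import numpy as np


def keras_kernel_to_oihw(kernel):
    """Reorder a Keras Conv2D kernel (kh, kw, c_in, c_out) to (c_out, c_in, kh, kw).

    Assumption: this is the layout wanted on the Python side before export.
    """
    # Move axis 3 (c_out) to position 0 and axis 2 (c_in) to position 1;
    # the remaining axes (kh, kw) keep their relative order.
    moved = np.moveaxis(kernel, [3, 2], [0, 1])
    # moveaxis returns a view; make the memory layout match the new shape.
    return np.ascontiguousarray(moved)


# Stand-in for the first layer's kernel, filled with distinct values.
keras_kernel = np.arange(3 * 3 * 1 * 32, dtype=np.float32).reshape(3, 3, 1, 32)
converted = keras_kernel_to_oihw(keras_kernel)
print(converted.shape)
```

Here `converted[o, c, i, j]` holds the same value as `keras_kernel[i, j, c, o]`.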

ggerganov · 2023-08-28T16:41:08Z

I tested this and everything works correctly. Here is the graph plot:

> The conv2d operator in GGML expects the same kernel shape as the Conv2D layer in Keras (e.g. (3, 3, 1, 32) for the first one), but its dimensions are reordered. So I am using np.moveaxis() in the Python code before exporting the convolution kernels.

Yes, it's always difficult to match the shapes during such conversions. I guess there are conventions in the standard frameworks, but ggml does not follow them, and this creates difficulties. Examples like this one are very useful because they make it easy to debug and understand where the mismatch is.

> The Python gguf module is saving the tensor shape in reverse order. Maybe this is useful for the big models like llama but here it is not needed. I am using the raw_shape parameter as a workaround here.

It's not related to LLMs. The convention for referencing the axes in the tensor is different between Python and ggml:

# example 4d tensor in Python
print(x.shape)
(32, 1, 3, 3)

print(x.shape[0], x.shape[1], x.shape[2], x.shape[3])
32 1 3 3

// same tensor in ggml (ne[] is int64_t, hence the casts)
printf("(%d, %d, %d, %d)\n", (int) x->ne[0], (int) x->ne[1], (int) x->ne[2], (int) x->ne[3]);
(3, 3, 1, 32)

So gguf.py reverses the dims order to match the ggml order.
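To illustrate why reversing the dims is only a change of naming and not of data, here is a small sketch comparing element offsets under the two conventions. It assumes contiguous tensors and counts offsets in elements rather than bytes; the helper names `numpy_offset` and `ggml_offset` are hypothetical.

```python
def numpy_offset(shape, index):
    """Row-major (C order) element offset: the last axis varies fastest."""
    off = 0
    for dim, i in zip(shape, index):
        off = off * dim + i
    return off


def ggml_offset(ne, index):
    """ggml-style element offset for a contiguous tensor: ne[0] varies fastest."""
    off = 0
    stride = 1
    for dim, i in zip(ne, index):
        off += i * stride
        stride *= dim
    return off


shape = (32, 1, 3, 3)          # Python / NumPy view
ne = tuple(reversed(shape))    # ggml view: (3, 3, 1, 32)
idx = (5, 0, 1, 2)             # an arbitrary element, NumPy indexing

a = numpy_offset(shape, idx)
b = ggml_offset(ne, tuple(reversed(idx)))
print(a, b)
```

Both offsets come out equal, so the same buffer can be described as (32, 1, 3, 3) in Python and as ne = (3, 3, 1, 32) in ggml without moving any data.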

Shall we merge this or add a README.md first?

rgerganov · 2023-08-28T21:45:07Z

I will update the README and the web app in a follow-up PR

rgerganov marked this pull request as draft August 27, 2023 17:52

ggerganov approved these changes Aug 27, 2023

rgerganov force-pushed the add-mnist-cnn branch from 8254e72 to 9a287f7 Compare August 28, 2023 13:25

rgerganov marked this pull request as ready for review August 28, 2023 13:26

rgerganov merged commit d4d6f51 into ggerganov:master Aug 28, 2023
2 checks passed

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023

Fix crash for 65B model with pre-allocated memory (ggerganov#485)

6f1ee4b
