Classification of Handwritten Digits (MNIST) using a one-hidden-layer Neural Network in NumPy (Python 3)
©EsterHlav, 27 July 2017
The goal of the project is to build, train and test a simple neural network using only NumPy as a computational library. The difficulty mainly comes from computing the gradient of the objective function with respect to the parameters and deriving the corresponding backpropagation equations.
We use an architecture with a single hidden layer of 500 units, with tanh as the activation function and softmax as the output. We use the cross-entropy loss since this is a multi-class classification problem.
MNIST (official page) is a set of 28 x 28 pixel images: 60,000 training images and 10,000 test images. The labels are in the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, making the prediction a 10-class classification problem.
Note: Before being fed to the Neural Network, each image is normalized, i.e. the mean of the image is subtracted and the result is divided by its standard deviation.
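For illustration, such a normalization can be sketched in NumPy as follows (a minimal sketch; the actual preprocessing code may differ):

```python
import numpy as np

def normalize(image):
    # Subtract the mean of the image and divide by its standard deviation,
    # so every input roughly has zero mean and unit variance.
    image = image.astype(np.float64)
    return (image - image.mean()) / image.std()
```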
Run the following command in a unix terminal to install the packages needed to run the project with Python 3:
$ pip3 install numpy matplotlib Pillow
Run the following command in a unix terminal:
$ python3 train.py
Parameters can be modified in train.py. By default the Neural Network is trained with the following parameters:
- Number of hidden units: 500
- Learning rate: 0.005
- Decay: 0.0002
- Regularization: 0.015
- Number of epochs: 150
Also, the input layer has size 28 x 28 = 784 and the output layer has size 10.
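For reference, these defaults could be collected in train.py along the following lines (a hypothetical sketch; the actual variable names in the script may differ):

```python
# Hypothetical configuration block; names are illustrative,
# not the exact variables used in train.py.
params = {
    "n_input": 28 * 28,       # 784 input units, one per pixel
    "n_hidden": 500,          # hidden layer size
    "n_output": 10,           # one output unit per digit class
    "learning_rate": 0.005,
    "decay": 0.0002,          # learning-rate decay
    "regularization": 0.015,  # weight of the regularization term
    "epochs": 150,
}
```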
The equations of the Neural Network are:
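Based on the architecture described above (one tanh hidden layer, a softmax output and the cross-entropy loss), the forward pass can be written as follows; the regularization term on the weights is omitted for clarity:

```latex
\begin{aligned}
z^{(1)} &= W^{(1)} x + b^{(1)} \\
h &= \tanh\left(z^{(1)}\right) \\
z^{(2)} &= W^{(2)} h + b^{(2)} \\
\hat{y} &= \operatorname{softmax}\left(z^{(2)}\right), \qquad
\hat{y}_k = \frac{e^{z^{(2)}_k}}{\sum_j e^{z^{(2)}_j}} \\
L &= -\sum_k y_k \log \hat{y}_k
\end{aligned}
```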
From this, using the chain rule, we can derive the gradient equations for all parameters (used for Gradient Descent):
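For this architecture, the standard backpropagation results are the following (a sketch; the notation may differ from the one used in the code), where ⊙ denotes the element-wise product:

```latex
\begin{aligned}
\delta^{(2)} &= \frac{\partial L}{\partial z^{(2)}} = \hat{y} - y \\
\frac{\partial L}{\partial W^{(2)}} &= \delta^{(2)} h^{\top}, \qquad
\frac{\partial L}{\partial b^{(2)}} = \delta^{(2)} \\
\delta^{(1)} &= \left(W^{(2)\top} \delta^{(2)}\right) \odot \left(1 - h \odot h\right) \\
\frac{\partial L}{\partial W^{(1)}} &= \delta^{(1)} x^{\top}, \qquad
\frac{\partial L}{\partial b^{(1)}} = \delta^{(1)}
\end{aligned}
```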
Run the following command in a unix terminal:
$ python3 main.py
The test can be done with 3 different trained models: test50epch, test100epch and test150epch. Their error rates on the test set are respectively 50.98%, 14.57% and 13.61%.
Run the following command in a unix terminal:
$ python3 guiNN.py
As the Neural Network was trained on the MNIST dataset, the drawn image needs to be preprocessed to be as similar as possible to MNIST. Therefore we use a binary threshold, an inner bounding box of 20 x 20 pixels and a linear warping to center the digits, along the lines of the sketch below.
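A rough sketch of such a preprocessing with PIL, SciPy and NumPy could look like this (illustrative only; the actual logic lives in guiNN.py and may differ):

```python
import numpy as np
from PIL import Image
from scipy import ndimage

def preprocess(img):
    """Roughly mimic MNIST preprocessing on a drawn image (hypothetical sketch)."""
    arr = np.array(img.convert("L"), dtype=np.float64)
    arr = 255.0 - arr                          # invert: MNIST digits are white on black
    arr = (arr > 128) * 255.0                  # binary threshold

    rows = np.any(arr, axis=1)
    cols = np.any(arr, axis=0)
    if not rows.any():                         # empty canvas
        return np.zeros((28, 28))
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    digit = arr[r0:r1 + 1, c0:c1 + 1]          # crop to the drawn strokes

    # Scale the digit so it fits in a 20 x 20 box, as in MNIST.
    scale = 20.0 / max(digit.shape)
    new_size = (max(1, int(round(digit.shape[1] * scale))),
                max(1, int(round(digit.shape[0] * scale))))
    digit = np.array(Image.fromarray(digit.astype(np.uint8)).resize(new_size))

    # Paste into a 28 x 28 canvas and shift so the center of mass is centered.
    canvas = np.zeros((28, 28))
    top = (28 - digit.shape[0]) // 2
    left = (28 - digit.shape[1]) // 2
    canvas[top:top + digit.shape[0], left:left + digit.shape[1]] = digit
    cy, cx = ndimage.center_of_mass(canvas)
    canvas = ndimage.shift(canvas, (14 - cy, 14 - cx))
    return canvas / 255.0
```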
Defines the class NeuralNetwork. All the computations are done with NumPy arrays and matrices. The backpropagation equations are hard-coded for tanh and softmax with cross-entropy loss. Uses support.py for most of the helper functions, and also NumPy.
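To make the hard-coded backpropagation concrete, here is a minimal sketch of what such a class can look like (names and details are illustrative, not the exact contents of NN.py):

```python
import numpy as np

class NeuralNetwork:
    """Minimal one-hidden-layer network: tanh hidden layer, softmax output,
    cross-entropy loss. Illustrative sketch only."""

    def __init__(self, n_input=784, n_hidden=500, n_output=10):
        # Small random initialization of the weights, zero biases.
        self.W1 = np.random.randn(n_hidden, n_input) * 0.01
        self.b1 = np.zeros((n_hidden, 1))
        self.W2 = np.random.randn(n_output, n_hidden) * 0.01
        self.b2 = np.zeros((n_output, 1))

    def forward(self, x):
        # x is a column vector of shape (784, 1).
        self.x = x
        self.h = np.tanh(self.W1 @ x + self.b1)
        z2 = self.W2 @ self.h + self.b2
        e = np.exp(z2 - z2.max())               # numerically stable softmax
        self.y_hat = e / e.sum()
        return self.y_hat

    def backward(self, y):
        # y is a one-hot column vector of shape (10, 1).
        delta2 = self.y_hat - y                 # softmax + cross-entropy gradient
        dW2 = delta2 @ self.h.T
        db2 = delta2
        delta1 = (self.W2.T @ delta2) * (1.0 - self.h ** 2)  # tanh derivative
        dW1 = delta1 @ self.x.T
        db1 = delta1
        return dW1, db1, dW2, db2
```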
Builds a Neural Network and trains it according to the parameters provided, then saves it in pickle format for later reuse. There is also an option to plot the evolution of the loss, training error and validation error. Depends on NN.py and support.py.
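Saving and reloading a trained network with pickle can be sketched as follows (the file name reuses one of the trained models mentioned above, but the exact paths used by train.py may differ):

```python
import pickle

# Hypothetical helpers; the actual saving code in train.py may differ.
def save_model(network, path="test150epch"):
    with open(path, "wb") as f:
        pickle.dump(network, f)

def load_model(path="test150epch"):
    with open(path, "rb") as f:
        return pickle.load(f)
```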
Tests the Neural Network provided on 20 random images from the test set. Depends on NN.py and support.py.
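An illustrative version of such a test could be the following (assuming x_test contains flattened images and y_test integer labels; main.py may be organized differently):

```python
import numpy as np

# Illustrative only: pick 20 random test images and report how many are
# classified correctly by the given network.
def test_random_images(network, x_test, y_test, n=20):
    idx = np.random.choice(len(x_test), size=n, replace=False)
    correct = 0
    for i in idx:
        prediction = np.argmax(network.forward(x_test[i].reshape(-1, 1)))
        correct += int(prediction == y_test[i])
    print(f"{correct}/{n} correct predictions")
```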
Opens a GUI built with tkinter to draw a digit on a square canvas. The image is then processed and predicted using PIL (image wrapper), SciPy (image transformation) and NumPy (matrix wrapper). Depends on MouseDrawing.py, PIL, SciPy and NumPy.
Defines the canvas and the drawing functions using tkinter (for the GUI). Depends on tkinter.
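A minimal drawable canvas in tkinter, in the spirit of MouseDrawing.py, can be sketched as follows (widget names and sizes are illustrative):

```python
import tkinter as tk

def main():
    root = tk.Tk()
    canvas = tk.Canvas(root, width=280, height=280, bg="white")
    canvas.pack()

    def draw(event):
        # Paint a small black disc wherever the mouse is dragged.
        r = 8
        canvas.create_oval(event.x - r, event.y - r,
                           event.x + r, event.y + r,
                           fill="black", outline="black")

    canvas.bind("<B1-Motion>", draw)
    root.mainloop()

if __name__ == "__main__":
    main()
```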
I discovered how much a simple one-hidden-layer Neural Network tends to overfit on MNIST. Indeed, after training the network, I realized that the good results on the test set were due to the high degree of similarity among the digits in MNIST. However, when drawing digits on a canvas, the classification was not as accurate. Only preprocessing the drawings in a way close to the MNIST preprocessing helped improve the predictions. A Convolutional Neural Network would certainly have been less sensitive to such differences in the images, as it captures visual patterns, which clearly explains why a CNN is so much more accurate.
1. Good explanation of Backpropagation equations and their derivation
2. Alternative explanation of Backpropagation equations
3. Source of inspiration for the Numpy implementation of a Neural Network