How to get a top 2% position on Kaggle’s MNIST Digit Recognizer
Applying CNN
This tutorial shows how to use a convolutional neural network model built and trained with Keras on top of TensorFlow.
In this first part, I'll focus on the machine learning model, its parameters, and the results. In the second part, I'll explain how to deploy this model as an API.
I trained the model on the MNIST dataset provided by Kaggle so that it produces good results in recognizing handwritten digits.
“The MNIST database is a large database of handwritten digits.”
If you are new to TensorFlow and Keras, I recommend starting there: try some tutorials, play around with the code, and then come back here.
If you are entirely new to deep learning, I recommend checking out fast.ai, Cognitive Class AI, Siraj, or Sentdex.
Before we start: here are some hints on how I set up my workstation for this tutorial. Please make sure you have all of this running on your computer before you continue. The installation process might take some time, so make sure it does not clash with your significant other's plans.
Prerequisites for this tutorial:
- TensorFlow (version 1.1.0); I'm using the GPU build
- Python 3.6
- Keras 2 (version 2.1.5)
- Pandas (version 0.22)
- Numpy (version 1.12)
- Scikit-learn (version 0.19)
- Matplotlib (version 2.2.2)
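To confirm that these packages are actually importable on your machine, a small check like the one below may help. It is only a sketch: the function name `installed_versions` is made up for this example, and it assumes each package exposes a `__version__` attribute (scikit-learn is imported under the name `sklearn`).

```python
import importlib

# Import names of the packages listed above (scikit-learn imports as "sklearn").
PACKAGES = ["tensorflow", "keras", "pandas", "numpy", "sklearn", "matplotlib"]


def installed_versions(names=PACKAGES):
    """Return {import_name: version string}, or None for a missing package."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            versions[name] = None
        else:
            # Fall back to "unknown" if the module has no __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `installed_versions()` and printing the resulting dictionary gives a quick overview you can compare against the list above.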
Alright, if you made it to this point, some of this will already be familiar to you, which is really good!
The Convolutional Neural Network Model
A convolutional neural network can have tens or hundreds of layers, each of which learns to detect different features of an image. Filters are applied to each training image at different resolutions, and the output of each convolved image is used as the input to the next layer.
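To make "applying a filter" concrete, here is a minimal NumPy sketch of what a single filter does to a single-channel image. It is an illustration only, not how Keras implements it internally: the function name `convolve2d_valid` and the tiny example image are invented for this example, and there is no padding, stride, or bias. Like convolution layers in deep learning libraries, it slides the filter without flipping it (strictly speaking, a cross-correlation).

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # Multiply the patch under the filter element-wise and sum it up.
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output


# A tiny image whose right half is bright, and a filter that responds to
# left-to-right increases in intensity (a vertical edge).
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])
edge_filter = np.array([[-1, 1],
                        [-1, 1]])

feature_map = convolve2d_valid(image, edge_filter)
print(feature_map)
# [[0. 2. 0.]
#  [0. 2. 0.]]
```

The feature map is non-zero only where the edge sits. In a CNN, the filter values are learned during training instead of being written by hand, and maps like this one become the input to the next layer.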
In our case, each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.
After some testing and research, I arrived at this model:
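A convolutional layer expects images as 2D grids with a channel axis, not flat rows of 784 numbers, so the pixel data has to be reshaped before training. The sketch below shows one way to do that with NumPy. The function name `prepare_images` is hypothetical, the random array only stands in for the pixel columns read from the Kaggle CSV, and scaling the values to the range 0 to 1 is a common preprocessing choice that I am assuming here.

```python
import numpy as np

IMAGE_SIZE = 28  # each digit is 28 x 28 pixels = 784 values


def prepare_images(flat_pixels):
    """Turn rows of 784 integer pixel values into (n, 28, 28, 1) floats in [0, 1]."""
    flat_pixels = np.asarray(flat_pixels, dtype="float32")
    if flat_pixels.ndim != 2 or flat_pixels.shape[1] != IMAGE_SIZE * IMAGE_SIZE:
        raise ValueError("expected an array of shape (n, 784)")
    # The trailing 1 is the single grayscale channel (channels-last layout).
    images = flat_pixels.reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)
    # Pixel values are integers from 0 to 255, so this maps them into [0, 1].
    return images / 255.0


# Stand-in data: 5 random "images" instead of real rows from the CSV file.
rng = np.random.RandomState(0)
fake_pixels = rng.randint(0, 256, size=(5, 784))

images = prepare_images(fake_pixels)
print(images.shape)  # (5, 28, 28, 1)
```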
Let's code this model.
I prefer to keep the code organized, so I put the model in a separate file and call it later from the main program.
You can play with some parameters (filter size, kernel size, padding, optimizer, learning rate, dense size, and others). Let me know your results.
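To show where those parameters plug in, here is a minimal sketch of a model-building function. It is not the architecture that produced the leaderboard result: the layer layout, the default values, the dropout rate, and the name `build_model` are all assumptions chosen for illustration, using only standard Keras layers.

```python
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.optimizers import Adam


def build_model(filters=32, kernel_size=3, padding="same",
                dense_size=128, learning_rate=0.001, dropout=0.25):
    """Build and compile a small CNN for 28x28 grayscale digits (illustrative values)."""
    model = Sequential()
    # Two convolution + pooling stages; the second one doubles the filter count.
    model.add(Conv2D(filters, kernel_size, padding=padding, activation="relu",
                     input_shape=(28, 28, 1)))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Conv2D(filters * 2, kernel_size, padding=padding, activation="relu"))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(dropout))
    # Flatten the feature maps and classify into the 10 digit classes.
    model.add(Flatten())
    model.add(Dense(dense_size, activation="relu"))
    model.add(Dense(10, activation="softmax"))
    # The learning rate is passed positionally: the keyword is `lr` in older
    # Keras 2 releases and `learning_rate` in newer ones.
    model.compile(optimizer=Adam(learning_rate),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model(kernel_size=5, dense_size=256)` followed by `model.summary()` is a quick way to see how a change affects the number of trainable parameters before you spend time on training.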
Okay, now that the model is ready, it is time to code the main part.
This is the program:
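As a rough sketch of the supporting pieces a main program like this needs (and not the original listing), the helpers below one-hot encode the labels, time a training run, and write predictions in the `ImageId,Label` layout that the Kaggle Digit Recognizer competition expects. The function names are hypothetical, and `timed_fit` only assumes that the model object has a Keras-style `fit` method.

```python
import csv
import time

import numpy as np


def to_one_hot(labels, num_classes=10):
    """Convert integer digit labels into one-hot rows, e.g. 3 -> [0,0,0,1,0,...]."""
    return np.eye(num_classes, dtype="float32")[np.asarray(labels, dtype=int)]


def timed_fit(model, x, y, **fit_kwargs):
    """Call model.fit and return (history, elapsed seconds)."""
    start = time.perf_counter()
    history = model.fit(x, y, **fit_kwargs)
    elapsed = time.perf_counter() - start
    return history, elapsed


def write_submission(probabilities, path):
    """Write the most likely digit per image as ImageId,Label rows."""
    predicted = np.argmax(probabilities, axis=1)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ImageId", "Label"])
        # Image ids in the submission file start at 1, not 0.
        for image_id, digit in enumerate(predicted, start=1):
            writer.writerow([image_id, int(digit)])
```

With a compiled Keras model, usage would look roughly like `history, seconds = timed_fit(model, x_train, to_one_hot(y_train), epochs=10, validation_split=0.1)` followed by `write_submission(model.predict(x_test), "submission.csv")`, where `x_train`, `y_train`, and `x_test` are placeholders for your own arrays.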
You might have noticed that I've included graphs, elapsed time, and other visual results. I hope you have enjoyed this first part.
Results
Follow me on Medium or on GitHub; I'll post the second part soon.
References:
https://www.mathworks.com/discovery/deep-learning.html
https://www.kaggle.com/c/digit-recognizer/
https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners
https://www.tensorflow.org/versions/r1.1/get_started/mnist/pros