Trains a multi-layer perceptron (MLP) neural network to perform optical character recognition (OCR).
The training set is automatically generated using a heavily modified version of the captcha-generator node-captcha. Support for the MNIST handwritten digit database has been added recently (see performance section).
The network takes a one-dimensional binary array (default 20 * 20 = 400-bit) as input and outputs a 10-element array of probabilities, which can be converted into a character code. Initial performance measurements show success rates between roughly 93% and 99% in the runs listed below.
After training, the network is saved as a standalone module to ./ocr.js, which can then be used in your project like this (from test.js):
var predict = require('./ocr.js');
// a binary array that we want to predict
var one = [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
];
// the prediction is an array of probabilities
var prediction = predict(one);
// the index with the maximum probability is the best guess
console.log('prediction:', prediction.indexOf(Math.max.apply(null, prediction)));
// will hopefully output 1 if trained with 0-9 :)
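To get a character instead of an index, the best index can be looked up in the glyph string. This is a minimal sketch, assuming the output neurons are ordered like the characters of the `text` setting in config.json (not confirmed here); `toCharacter` is a hypothetical helper and not part of ocr.js.

```javascript
// Hypothetical helper: maps an array of probabilities to a character.
// Assumption: output index i corresponds to glyphs.charAt(i).
function toCharacter(prediction, glyphs) {
  var best = 0;
  for (var i = 1; i < prediction.length; i++) {
    if (prediction[i] > prediction[best]) best = i;
  }
  return glyphs.charAt(best);
}

// made-up probabilities, for illustration only
var sample = [0.01, 0.93, 0.02, 0.0, 0.01, 0.0, 0.01, 0.0, 0.02, 0.0];
console.log(toCharacter(sample, '0123456789')); // 1
```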
All runs below were performed with a MacBook Pro Retina 13" Early 2015 with 8GB RAM.
To test with the MNIST dataset: click on the title above, download the 4 data files and put them in a folder called mnist in the root directory of this repository.
// config.json
{
"mnist": true,
"network": {
"hidden": 160,
"learning_rate": 0.03
}
}
Then run
$ node mnist.js
- Neurons: 400 input, 160 hidden, 10 output
- Learning rate: 0.03
- Training set: 60000 digits
- Testing set: 10000 digits
- Training time: 21 min 53 s 753 ms
- Success rate: 95.16%
// config.json
{
"mnist": false,
"text": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
"fonts": [
"sans-serif",
"serif"
],
"training_set": 2000,
"testing_set": 1000,
"image_size": 16,
"threshold": 400,
"network": {
"hidden": 60,
"learning_rate": 0.1,
"output": 62
}
}
- Neurons: 256 input, 60 hidden, 62 output
- Learning rate: 0.03
- Training set
- Testing set: 62000 characters
- Training time: 8 min 18 s 560 ms
- Success rate: 93.58225806451614%
// config.json
{
"mnist": false,
"text": "abcdefghijklmnopqrstuvwxyz",
"fonts": [
"sans-serif",
"serif"
],
"training_set": 2000,
"testing_set": 1000,
"image_size": 16,
"threshold": 400,
"network": {
"hidden": 40,
"learning_rate": 0.1,
"output": 26
}
}
- Neurons: 256 input, 40 hidden, 26 output
- Learning rate: 0.1
- Training set
- Testing set: 26000 characters
- Training time: 1 min 55 s 414 ms
- Success rate: 93.83846153846153%
// config.json
{
"mnist": false,
"text": "0123456789",
"fonts": [
"sans-serif",
"serif"
],
"training_set": 2000,
"testing_set": 1000,
"image_size": 16,
"threshold": 400,
"network": {
"hidden": 40,
"learning_rate": 0.1
}
}
- Neurons: 256 input, 40 hidden, 10 output
- Learning rate: 0.1
- Training set
- Testing set: 10000 digits
- Training time: 0 min 44 s 363 ms
- Success rate: 99.59%
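The sketch below shows one way such a success rate could be computed. It assumes the rate is the percentage of test samples whose highest-probability output matches the expected label, and that each sample is an object with hypothetical `input` and `label` fields; the repository's own test code may work differently.

```javascript
// Hypothetical evaluation helper, not taken from the repository.
// predict: function returning an array of probabilities
// samples: array of { input: [...], label: expectedIndex }
function successRate(predict, samples) {
  var correct = 0;
  samples.forEach(function (sample) {
    var output = predict(sample.input);
    var guess = output.indexOf(Math.max.apply(null, output));
    if (guess === sample.label) correct++;
  });
  return 100 * correct / samples.length;
}

// stand-in predictor and data, for illustration only
var fakePredict = function (input) {
  return input[0] ? [0.1, 0.9] : [0.8, 0.2];
};
var samples = [
  { input: [1], label: 1 },
  { input: [0], label: 0 },
  { input: [1], label: 0 },
  { input: [0], label: 0 }
];
console.log(successRate(fakePredict, samples) + '%'); // 75%
```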
Tweak the network for your needs by editing the config.json file located in the main folder. Pasted below is the default config file.
// config.json
{
"mnist": false,
"text": "0123456789",
"fonts": [
"sans-serif",
"serif"
],
"training_set": 2000,
"testing_set": 1000,
"image_size": 16,
"threshold": 400,
"network": {
"hidden": 40,
"learning_rate": 0.1
}
}
mnist
- If set to true, the MNIST handwritten digit dataset will be used for training and testing the network. This setting will override the configured set sizes and will ignore the image_size, threshold, fonts and text settings.
text
- A string containing the glyphs with which to train/test the network.
fonts
- An array of fonts to be used when generating images.
training_set
- Number of images to be generated and used as the network training set.
testing_set
- Same as above, but these images are used for testing the network.
image_size
- The size of the square chunk (in pixels) containing a glyph. The resulting network input size is image_size^2.
threshold
- When analyzing the pixels of a glyph, the algorithm reduces each pixel (r, g, b) to (r + g + b), and every pixel whose sum is below threshold is marked as 1 in the resulting binary array used as network input.
network
  hidden
  - The size (number of neurons) of the hidden layer of the network.
  learning_rate
  - The learning rate of the network.
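The threshold rule described above can be sketched as follows. `binarize` is a hypothetical name, and the pixel layout (an array of [r, g, b] triples) is an assumption made for illustration, not the generator's actual data format.

```javascript
// Hypothetical sketch of the thresholding step.
// pixels: array of [r, g, b] triples (assumed layout)
// Returns 1 where r + g + b is below the threshold, otherwise 0.
function binarize(pixels, threshold) {
  return pixels.map(function (pixel) {
    var sum = pixel[0] + pixel[1] + pixel[2];
    return sum < threshold ? 1 : 0;
  });
}

var pixels = [[0, 0, 0], [255, 255, 255], [120, 130, 140], [200, 200, 10]];
console.log(binarize(pixels, 400)); // [ 1, 0, 1, 0 ]
```

With image_size set to 16, a glyph chunk has 16 * 16 = 256 pixels, so the resulting array has 256 entries, matching the 256 input neurons listed in the generated-glyph runs.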