This project implements a deep CNN detector (and recognizer) for digits in natural scenes. I used the Keras framework and the OpenCV library to build the detector. The detector uses a CNN classifier to decide whether each region proposed by the MSER algorithm contains a digit.
- python 2.7
- keras 1.2.2
- opencv 2.4.11
- tensorflow-gpu==1.0.1
- Etc.
A list of all the packages needed to run this project can be found in digit_detector.yml.
I recommend creating a dedicated Anaconda environment for this project, kept separate from your other environments. You can create it by following these steps.
- Create anaconda env with the following command line:
$ conda env create -f digit_detector.yml
- Activate the env
$ source activate digit_detector
- Run the project in this env
The procedure to build the digit detector is as follows:
Download train.tar.gz from https://ufldl.stanford.edu/housenumbers/ and extract the archive.
SVHN provides cropped training samples in MATLAB format. However, they are not suitable for bounding-box detection because each crop includes distracting digits on either side of the digit of interest. So I collected the training samples directly from the full-number images and their annotation file.
- Train samples : (457723, 32, 32, 3)
- Validation samples : (113430, 32, 32, 3)
I designed a convolutional neural network architecture for character detection. This network classifies each region as text or non-text.
The architecture is as follows:
- INPUT: [32x32x1]
- CONV3-32: [32x32x32]
- CONV3-32: [32x32x32]
- POOL2: [16x16x32]
- CONV3-64: [16x16x64]
- CONV3-64: [16x16x64]
- POOL2: [8x8x64]
- FC: [1x1x1024]
- I used dropout in this layer.
- FC: [1x1x2]
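The shapes listed above can be checked with a little arithmetic. The following is a minimal sketch, not the project's model code: it assumes that every CONV3 layer is a 3x3 convolution with "same" padding and stride 1, and that every POOL2 layer is a 2x2 pooling with stride 2, which is what the listed sizes imply. The function name `trace_shapes` is hypothetical.

```python
def trace_shapes(input_shape=(32, 32, 1), n_classes=2):
    """Return (layer_name, output_shape) pairs for the listed layer stack."""
    h, w, c = input_shape
    shapes = [("INPUT", (h, w, c))]
    # Assumption: CONV3 = 3x3 conv, "same" padding, stride 1; POOL2 = 2x2 pool, stride 2.
    layers = [("CONV3", 32), ("CONV3", 32), ("POOL2", None),
              ("CONV3", 64), ("CONV3", 64), ("POOL2", None)]
    for kind, filters in layers:
        if kind == "CONV3":
            c = filters            # "same" padding keeps height and width unchanged
            name = "CONV3-%d" % filters
        else:
            h, w = h // 2, w // 2  # pooling halves each spatial side
            name = "POOL2"
        shapes.append((name, (h, w, c)))
    # The 8x8x64 feature map is flattened before the fully connected layers.
    shapes.append(("FC", (1, 1, 1024)))
    shapes.append(("FC", (1, 1, n_classes)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print("%-8s [%s]" % (name, "x".join(str(d) for d in shape)))
```

Calling `trace_shapes(n_classes=10)` gives the same stack with a 10-way output layer.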
The accuracy of the classifier is as follows:
- Training Accuracy : 97.91%
- Test Accuracy : 96.98%
This convolutional neural network recognizes digits. The architecture is the same as the detector's except for the number of classes.
The architecture is as follows:
- INPUT: [32x32x1]
- CONV3-32: [32x32x32]
- CONV3-32: [32x32x32]
- POOL2: [16x16x32]
- CONV3-64: [16x16x64]
- CONV3-64: [16x16x64]
- POOL2: [8x8x64]
- FC: [1x1x1024]
- I used dropout in this layer.
- FC: [1x1x10]
- The number of classes is 10.
The accuracy of the classifier is as follows:
- Training Accuracy : 95.41%
- Test Accuracy : 94.52%
At run time, the detector operates in two steps.
- The detector finds candidate regions with the MSER algorithm.
- The classifier determines whether each proposed region contains a digit.
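The two steps can be sketched as a small function. This is only an illustration of the control flow, not the project's code: `detect_digits`, `propose_regions`, and `is_digit` are hypothetical names, and the region proposer and classifier are passed in as plain callables so the sketch runs without OpenCV or Keras. In the real project these roles would be filled by the MSER algorithm and the trained CNN.

```python
def detect_digits(image, propose_regions, is_digit):
    """Two-step detection: propose candidate boxes, then keep the ones classified as digits.

    propose_regions(image) -> iterable of (x, y, w, h) boxes  (hypothetical stand-in for MSER)
    is_digit(image, box)   -> bool                            (hypothetical stand-in for the CNN)
    """
    candidates = list(propose_regions(image))                    # step 1: region proposals
    return [box for box in candidates if is_digit(image, box)]   # step 2: classification


if __name__ == "__main__":
    # Toy stand-ins: three fixed boxes and an aspect-ratio rule instead of a CNN.
    image = [[0] * 64 for _ in range(48)]
    fake_proposer = lambda img: [(0, 0, 10, 20), (30, 5, 40, 3), (12, 8, 9, 18)]
    fake_classifier = lambda img, box: 0.3 < box[2] / float(box[3]) < 1.0
    print(detect_digits(image, fake_proposer, fake_classifier))
```

Keeping the proposer and the classifier as separate arguments makes it easy to measure the proposals on their own and then the filtered result.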
- recall value : 0.630
- precision value : 0.045
- f1_score : 0.084
- recall value : 0.513
- precision value : 0.714
- f1_score : 0.597
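The f1_score values above are consistent with the standard definition, the harmonic mean of precision and recall. The short check below is a sketch using that standard formula; the helper name `f1_score` here is just a local function, not the project's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (recall, precision) pairs as reported above.
    for recall, precision in [(0.630, 0.045), (0.513, 0.714)]:
        print("recall=%.3f precision=%.3f f1=%.3f"
              % (recall, precision, f1_score(precision, recall)))
```

Rounded to three decimals, this reproduces 0.084 and 0.597.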