
OCR2Text

A deep learning solution designed during the AI4Good hackathon for the Meza-OCR-Challenge.

The model transcribes images of handwritten notes, composed of digits and a constrained set of punctuation/symbols (comma, dot, and minus sign), into text.
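
For concreteness, here is a minimal sketch of the target vocabulary the task implies (digits plus comma, dot, and minus, with the usual special tokens). The token names and helper below are ours for illustration, not taken from the repository:

```python
# Illustrative vocabulary for the digits + comma/dot/minus task.
# Special-token names and index order are our assumption, not the repo's.
SPECIALS = ["<pad>", "<sos>", "<eos>"]
CHARS = list("0123456789,.-")
VOCAB = SPECIALS + CHARS

char2idx = {c: i for i, c in enumerate(VOCAB)}
idx2char = {i: c for c, i in char2idx.items()}

def encode(text):
    """Map a label string such as '-3,141.59' to token indices,
    wrapped in start/end tokens for the seq2seq decoder."""
    return [char2idx["<sos>"]] + [char2idx[c] for c in text] + [char2idx["<eos>"]]
```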

The model is essentially a seq2seq model stacked on top of a ResNet. The idea behind this neural architecture is simple:

  • First, a CNN (the ResNet) extracts local features that identify individual digits.
  • Then, a seq2seq model combines those local features sequentially for the translation part. At the time we did not yet know about CTC loss, but we had realized that naively mapping the feature maps to an output sequence may not align well. We therefore relied on the attention mechanism of the seq2seq model to recover the order information (see the sketch after this list).
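
As a rough illustration of that second point, the sketch below shows additive (Bahdanau-style) attention over the flattened CNN feature columns, assuming PyTorch. The class and parameter names are ours, and the repository may implement attention differently:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: at each decoding step the decoder
    re-weights the flattened CNN feature positions, which is what lets
    the model recover ordering without an explicit CTC alignment."""
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim)
        self.w_dec = nn.Linear(dec_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_positions, enc_dim) - flattened CNN map
        # hidden:   (batch, dec_dim)                - current decoder state
        scores = self.v(torch.tanh(self.w_enc(features) +
                                   self.w_dec(hidden).unsqueeze(1)))
        weights = torch.softmax(scores, dim=1)     # (batch, positions, 1)
        context = (weights * features).sum(dim=1)  # (batch, enc_dim)
        return context, weights.squeeze(-1)
```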

Details

The ResNet down-samples the input three times, and the final feature map is flattened into a one-dimensional sequence of image features.
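
A minimal sketch of what that looks like, with plain stride-2 convolutions standing in for the ResNet stages (the channel widths and input size are our guesses, not the repository's):

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Three stride-2 stages approximate the ResNet's three rounds of
    down-sampling: e.g. a 1x64x256 input becomes an 8x32 feature map,
    which is flattened into a sequence of 256 positions."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.stages = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        f = self.stages(x)                           # (batch, C, H/8, W/8)
        b, c, h, w = f.shape
        return f.view(b, c, h * w).permute(0, 2, 1)  # (batch, H/8*W/8, C)
```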

The seq2seq model attends over the flattened image feature maps. Beam search can optionally be applied at the inference stage.
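
For the optional beam search, a generic sketch of the inference loop might look like the following. Here `step_fn` stands in for one step of the attention decoder, and all names are illustrative rather than the repository's API:

```python
import torch

def beam_search(step_fn, init_state, sos_id, eos_id, beam_width=5, max_len=32):
    """Generic beam search for the inference stage. `step_fn` takes
    (last_token_id, state) and returns (log_probs, new_state), where
    log_probs is a 1-D tensor over the vocabulary."""
    beams = [([sos_id], 0.0, init_state)]  # (tokens, cumulative log-prob, state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            log_probs, new_state = step_fn(tokens[-1], state)
            top_lp, top_ids = log_probs.topk(beam_width)
            for lp, tid in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((tokens + [tid], score + lp, new_state))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score, state in candidates[:beam_width]:
            # Move hypotheses that emitted <eos> to the finished pool.
            (finished if tokens[-1] == eos_id else beams).append((tokens, score, state))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still count
    return max(finished, key=lambda c: c[1])[0]
```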

Results

The general pipeline has only limited efficacy, perhaps due to the limited amount of data, or to limitations imposed by the choice of neural architecture or hyperparameters.

Update: as of this writing we can achieve 81%.

Data preprocessing and augmentation

We briefly investigated image data preprocessing and augmentation, namely segmentation and grey-scale conversion.

Image augmentation, which here simply means converting images to grey-scale, added a lot of benefit. Segmentation, on the other hand, did not go well.
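
The grey-scale conversion itself is a one-liner; a minimal version using PIL (our choice of library, the repository may use something else) might look like:

```python
from PIL import Image
import numpy as np

def to_greyscale(path):
    """Convert an input scan to a single-channel grey-scale array,
    normalised to [0, 1] - the one preprocessing step that clearly helped."""
    img = Image.open(path).convert("L")  # "L" = 8-bit grey-scale
    return np.asarray(img, dtype=np.float32) / 255.0
```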

Things I have learnt

  • Jupyter notebooks are a great tool for visualization and demonstration.
  • It's important to do sanity checks, especially on the data preprocessing pipeline (see the snippet after this list).
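
For example, a notebook-friendly sanity check can be as small as plotting one preprocessed sample next to its label:

```python
import matplotlib.pyplot as plt

def show_sample(image, label):
    """Eyeball one preprocessed sample with its label - a quick check
    that the pipeline didn't silently transpose or re-scale the image."""
    plt.imshow(image, cmap="gray")
    plt.title(f"label: {label}")
    plt.axis("off")
    plt.show()
```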

Collaborators

Many thanks to Shiquan and Helen.
