Text Recognition App

This project is based on the optical character recognition system from the Full Stack Deep Learning Bootcamp. The course covers the infrastructure, tooling, deployment, and project structure needed for a deep learning production system.

Description of the architecture

The prediction system is composed of two neural networks applied in sequence: a Fully Convolutional Network detects lines of text in the submitted picture, and each cropped line is then passed through a sliding-window CNN + LSTM model that recognizes the characters on that line.
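The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `detect_lines`, `crop`, and `recognize_paragraph` are hypothetical names, not the repository's API, and the row-grouping heuristic merely stands in for the Fully Convolutional Network so that the example runs without trained weights. The line recognizer is passed in as a plain callable for the same reason.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]    # grayscale image as rows of pixel values (0 = background)
LineBox = Tuple[int, int]  # (top_row, bottom_row), bottom exclusive


def detect_lines(image: Image) -> List[LineBox]:
    """Hypothetical stand-in for the line detector.

    Groups consecutive rows that contain any ink. The real system uses a
    Fully Convolutional Network for this step.
    """
    boxes: List[LineBox] = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(pixel > 0 for pixel in row)
        if has_ink and start is None:
            start = y                  # a new line of text begins
        elif not has_ink and start is not None:
            boxes.append((start, y))   # the current line ends
            start = None
    if start is not None:
        boxes.append((start, len(image)))
    return boxes


def crop(image: Image, box: LineBox) -> Image:
    """Cut one detected line out of the full image."""
    top, bottom = box
    return image[top:bottom]


def recognize_paragraph(
    image: Image,
    line_detector: Callable[[Image], List[LineBox]],
    line_recognizer: Callable[[Image], str],
) -> str:
    """Stage 1 finds the lines; stage 2 reads each cropped line."""
    crops = [crop(image, box) for box in line_detector(image)]
    return "\n".join(line_recognizer(line) for line in crops)


# Toy input: two "lines" of ink separated by an empty row.
page = [[0, 0], [1, 0], [1, 1], [0, 0], [0, 1]]
# Placeholder for the CNN + LSTM line recognizer (assumption for illustration).
fake_recognizer = lambda line: f"<line of height {len(line)}>"
result = recognize_paragraph(page, detect_lines, fake_recognizer)
```

Passing both stages in as callables keeps the orchestration independent of how each network is implemented, so either one can be replaced without touching the other.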

Project structure

The following diagram details the separation of concerns across the different modules.

  |
  +-- _data: Stores raw and processed data used to train the models.
  +-- app
  |   +-- api: Deployable containerized web server that provides a REST API for inference.
  |   |   +-- __init__.py
  |   |   +-- app.py
  |   |   +-- Dockerfile
  |   |   +-- tests
  |   |
  |   +-- notebooks: IPython notebooks for dataset exploration.
  |   |
  |   +-- tasks: Convenience bash scripts for common tasks such as model training and testing,
  |   |          container building and running, and running tests.
  |   |
  |   +-- text_recognizer
  |   |   +-- datasets: Provides abstractions to access datasets and their metadata.
  |   |   +-- models: Provides abstractions to train models and use them for inference.
  |   |   +-- networks: Actual neural networks used by models. By separating the network architecture
  |   |   |             from other concerns such as output interpretation, data augmentation techniques,
  |   |   |             model evaluation logic, etc., we facilitate experimentation on different network
  |   |   |             architectures by simply replacing the network used by the model.
  |   |   +-- tests
  |   |   +-- weights: Stores trained model weights.
  |   |   +-- __init__.py
  |   |   +-- character_predictor.py: Provides an API for recognizing a character on a given fixed-size image.
  |   |   +-- line_predictor.py: Provides an API for bounding lines of text on an image.
  |   |   +-- paragraph_predictor.py: Provides an API for recognizing text on a given image.
  |   |   +-- util.py
  |   |
  |   +-- training: Provides convenience scripts for running experiments.
  |   |
  |   +-- wandb: Stores experiment metadata.
  |
  +-- Pipfile
  +-- Pipfile.lock
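
The separation between models and networks noted in the diagram can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyModel`, `mean_intensity_network`, and `peak_intensity_network` are hypothetical names, and the "networks" are trivial functions rather than trained neural networks. The point is only that the model owns output interpretation, so the network can be replaced without changing anything else.

```python
from typing import Callable, List, Sequence, Tuple

# In this sketch a "network" is any callable mapping an input to class scores.
Network = Callable[[Sequence[float]], List[float]]


def mean_intensity_network(x: Sequence[float]) -> List[float]:
    """Toy network: scores two classes from the mean pixel intensity."""
    mean = sum(x) / len(x)
    return [1.0 - mean, mean]


def peak_intensity_network(x: Sequence[float]) -> List[float]:
    """Alternative toy network: scores two classes from the brightest pixel."""
    peak = max(x)
    return [1.0 - peak, peak]


class ToyModel:
    """Hypothetical model wrapper that interprets network output as a label."""

    def __init__(self, network: Network, mapping: Sequence[str]):
        self.network = network
        self.mapping = mapping

    def predict(self, x: Sequence[float]) -> Tuple[str, float]:
        scores = self.network(x)
        # Output interpretation lives in the model, not in the network.
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.mapping[best], scores[best]


pixels = [1.0, 0.0, 0.0, 0.0]
labels = ["a", "b"]

# Same model code, two different networks.
prediction_mean = ToyModel(mean_intensity_network, labels).predict(pixels)
prediction_peak = ToyModel(peak_intensity_network, labels).predict(pixels)
```

Here the two predictions differ only because the network was swapped; the label mapping and the argmax logic are untouched.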

Deployment

To deploy the containerized API on Google Cloud Run:

1. If necessary, install the Google Cloud SDK.
2. Build the Docker image:
       cd text-recognizer/app
       docker build -t text_recognizer_api -f api/Dockerfile .
3. Push the image to Google Container Registry.
4. On Google Cloud Platform, go to Cloud Run and create a new service. Provide the URL of the pushed image as the Container image URL.

sebastian-sosa/text-recognizer
