This report outlines an implementation of a Generative Adversarial Network that attempts to generate realistic-looking images of handwritten text from ASCII text input, along with a text Recognizer model. We present the architecture, training, data, and a discussion of our model's qualitative and quantitative results on different datasets. The IAM Handwriting Database is processed for training and validation, while the model is tested on a set of randomly generated and handwritten Shakespeare text. We prepare baseline models for the Recognizer and Generator to gauge our primary model's performance, and use the Fréchet Inception Distance and Character Error Rate metrics to evaluate different sub-networks within the GAN. The final result of the project is a Generator that does not perform as expected and a successful Recognizer. Potential ethical considerations for this project are also addressed.
High-level overview of the ScribeSmith model
The easiest way to set up this project locally is to use conda. Verify that conda is installed properly and has been added to PATH by running conda --version.
To set up the conda environment, run:
conda env create -f environment.yml
conda activate scribesmith
As stated above, our training data comes from the IAM Handwriting Database, available here. To ensure the data loads without error:
- Download and unzip data/ascii.tgz, then move only lines.txt into the data/ folder
- Download and unzip data/lines.tgz into data/lines/
The end result should look something like this:
./
├── data/
│ ├── chars/
│ │ ├── 02/
│ │ │ ⋮
│ │ └── 72/
+ │ ├── lines/
+ │ │ ├── a01/
+ │ │ │ ⋮
+ │ │ └── r06/
+ │ └── lines.txt
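Before opening the notebooks, it may help to confirm that the two entries added by the steps above are in place. The following is a minimal sketch, not part of the project: the function name `missing_data_entries` and the `EXPECTED` tuple are hypothetical, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Entries the download steps above are expected to produce,
# relative to the repository root (assumed layout, taken from the tree).
EXPECTED = ("data/lines.txt", "data/lines")


def missing_data_entries(root="."):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_data_entries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty list means both data/lines.txt and data/lines/ were found; the check does not inspect the contents of data/lines/.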
To ensure the Jupyter Notebook files work properly, various subdirectories under src/ must be created according to the file structure below (only directories shown):
src/
+ ├── main_model/
+ │ ├── model_snapshots/
+ │ ├── model_training_information/
+ │ └── model_training_information_rawlist/
+ ├── recognizers/
└── shakespeare_demo_handwritten/
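The directories can be created by hand, or with a short script such as the sketch below. It assumes that the lines marked with "+" in the tree are the ones that need to be created and that shakespeare_demo_handwritten/ already exists in the repository; the names `REQUIRED_DIRS` and `create_src_dirs` are hypothetical and not part of the project.

```python
from pathlib import Path

# Directories marked with "+" in the tree above, relative to src/
# (assumption: unmarked directories already ship with the repository).
REQUIRED_DIRS = (
    "main_model/model_snapshots",
    "main_model/model_training_information",
    "main_model/model_training_information_rawlist",
    "recognizers",
)


def create_src_dirs(src="src"):
    """Create the required subdirectories under src and return their paths."""
    created = []
    for rel in REQUIRED_DIRS:
        path = Path(src) / rel
        # parents=True also creates main_model/; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_src_dirs():
        print("ready:", path)
```

Running the script a second time leaves existing directories and their contents untouched.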
The group agrees that there were only slight differences in individual contribution proportions.
- Aniruddh Aragola (Aniruddh00001)
- Nabeth Ghazi (nabethg)
- Ran (Andy) Gong (AG2048)
- Zerui (Kevin) Wang (Togohogo1)
This project was part of the APS360 course offered by the University of Toronto Faculty of Applied Science and Engineering.
The report's LaTeX template is derived from the assignment guideline.