All data are provided by the University of Michigan.
-
Data Extraction using Regex: extracts dates from messy medical records stored in 'dates.txt'.
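
A minimal sketch of the idea, not the actual solution code. The two patterns below (numeric `mm/dd/yy(yy)` and `Month dd, yyyy`) are assumed formats chosen for illustration, since the real variety of formats in 'dates.txt' is not listed here. The function name `extract_date` and the rule that two-digit years mean 19xx are also assumptions.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Assumed format 1: 03/25/93, 6-15-2001
NUMERIC = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4}|\d{2})\b")

# Assumed format 2: Mar 20, 2009 / March 20 2009 / Mar. 20, 2009
NAMED = re.compile(
    r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"
    r"\s+(\d{1,2}),?\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_date(text):
    """Return (year, month, day) for the first date found, or None."""
    m = NUMERIC.search(text)
    if m:
        month, day, year = (int(g) for g in m.groups())
        if year < 100:
            year += 1900  # assumption: two-digit years are 19xx
        return (year, month, day)
    m = NAMED.search(text)
    if m:
        month = MONTHS.index(m.group(1).lower()) + 1
        return (int(m.group(3)), month, int(m.group(2)))
    return None


if __name__ == "__main__":
    for line in ["03/25/93 Total time of visit",
                 "Pt seen 6-15-2001 for follow-up",
                 "March 20, 2009 admitted to clinic"]:
        print(extract_date(line))
```

Returning a sortable `(year, month, day)` tuple makes it straightforward to order records chronologically once every line has been parsed.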
-
Spelling Recommender
- First, examines the linguistic characteristics of the novel Moby Dick ('moby.txt') using NLTK.
- Then, develops spelling recommenders based on three similarity measures: Jaccard distance on trigrams, Jaccard distance on 4-grams, and edit distance.
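
The three measures can be sketched in plain Python as follows. This is an illustrative sketch, not the project's code: it assumes the n-grams are character n-grams of each word, uses a tiny hypothetical vocabulary in place of a real word list, and assumes candidates are limited to words sharing the first letter of the misspelling. NLTK provides equivalents (`nltk.ngrams`, `nltk.jaccard_distance`, `nltk.edit_distance`); the hand-written versions here just keep the example free of dependencies.

```python
def char_ngrams(word, n):
    """Set of character n-grams of a word, e.g. 'cat', 2 -> {'ca', 'at'}."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def edit_distance(s, t):
    """Levenshtein distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def recommend(word, vocabulary, distance):
    """Return the vocabulary word closest to `word` under `distance`."""
    # Assumption: only consider words that start with the same letter.
    candidates = [w for w in vocabulary if w[0] == word[0]] or list(vocabulary)
    return min(candidates, key=lambda w: distance(word, w))


def trigram_distance(a, b):
    return jaccard_distance(char_ngrams(a, 3), char_ngrams(b, 3))


def fourgram_distance(a, b):
    return jaccard_distance(char_ngrams(a, 4), char_ngrams(b, 4))


if __name__ == "__main__":
    vocab = ["correspond", "cormorant", "incident", "indecent", "validate"]
    for dist in (trigram_distance, fourgram_distance, edit_distance):
        print(dist.__name__, recommend("cormulent", vocab, dist))
```

Passing the distance function as an argument keeps a single `recommend` routine for all three recommenders, so only the similarity measure changes between them.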