gloryodeyemi/COMP_8730_Assignment1

Spell Correction using Minimum Edit Distance (MED)

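This experiment uses the minimum edit distance (MED) algorithm, also known as Levenshtein distance, to correct misspelled words from the Birkbeck corpus. Each misspelled word is compared against the words in the WordNet dictionary, and the closest dictionary words are returned as candidate corrections. The average success at k is then calculated for k = 1, 5 and 10, where success at k for a misspelled word is 1 if its correct spelling appears among the top k candidates and 0 otherwise.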
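A minimal sketch of the approach described above. It assumes unit costs for insertion, deletion and substitution (the implementation in the utils folder may weight substitutions differently, e.g. a cost of 2 as in some textbook formulations) and assumes that ties between equally distant candidates are broken alphabetically. The function names and the toy dictionary are hypothetical, not taken from the repository.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def levenshtein(source: str, target: str) -> int:
    """Minimum edit distance with unit-cost insertion, deletion and substitution."""
    # prev holds the DP row for the source prefix processed so far;
    # it starts as the row for the empty prefix (j insertions to reach target[:j]).
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i] + [0] * len(target)  # curr[0]: delete all i chars of source[:i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr[j] = min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def top_k_candidates(word: str, dictionary: Iterable[str], k: int) -> List[str]:
    """Rank dictionary words by distance to `word` (ties broken alphabetically)."""
    ranked = sorted(dictionary, key=lambda cand: (levenshtein(word, cand), cand))
    return ranked[:k]


def average_success_at_k(
    pairs: Sequence[Tuple[str, str]],
    dictionary: Sequence[str],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Average success at k over (misspelling, correct) pairs, for each k in ks."""
    max_k = max(ks)
    hits = {k: 0 for k in ks}
    for wrong, right in pairs:
        # Rank once with the largest k, then check each shorter prefix
        ranked = top_k_candidates(wrong, dictionary, max_k)
        for k in ks:
            if right in ranked[:k]:
                hits[k] += 1
    return {k: hits[k] / len(pairs) for k in ks}


toy_dictionary = ["cat", "car", "cart", "dog", "dot"]
toy_pairs = [("caat", "cat"), ("dgo", "dog")]
print(average_success_at_k(toy_pairs, toy_dictionary))  # {1: 0.5, 5: 1.0, 10: 1.0}
```

On the toy data, "caat" is one edit away from both "cart" and "cat", and the alphabetical tie-break ranks "cart" first, so the correct word is missed at k = 1 but found at k = 5. Ties of this kind show why looking beyond the single closest candidate can change the measured success.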

Keywords: Spell correction, Levenshtein distance, Corpus, Dictionary, Natural Language Processing.

The Data

Two files from Roger Mitton's Birkbeck spelling error corpus, SHEFFIELDDAT.643 and FAWTHROP1DAT.643, were used for this experiment. Together they contain 1,193 misspelled words along with the correct spelling of each.
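This README does not describe the layout of the corpus files. The sketch below assumes that the files (or a converted copy of them) follow a simple layout: a line starting with `$` gives the correct word, and the lines after it list misspellings of that word until the next `$` line. If SHEFFIELDDAT.643 and FAWTHROP1DAT.643 use a different layout, the parsing step would need to change. `parse_dollar_format` is a hypothetical helper, not a function from the utils folder.

```python
from typing import List, Optional, Tuple


def parse_dollar_format(text: str) -> List[Tuple[str, str]]:
    """Turn '$correct' headers followed by misspellings into (misspelling, correct) pairs.

    Assumed layout (hypothetical): '$word' starts a group, following lines are misspellings.
    """
    pairs: List[Tuple[str, str]] = []
    correct: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("$"):
            correct = line[1:]  # a new correct word starts a new group
        elif correct is not None:
            pairs.append((line, correct))  # misspelling belongs to the current group
        # lines before the first '$' header are ignored
    return pairs


sample = "$abiding\nabideing\nabidding\n$absence\nabsense\n"
print(parse_dollar_format(sample))
# [('abideing', 'abiding'), ('abidding', 'abiding'), ('absense', 'absence')]
```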

The WordNet dictionary contains 147,306 words.

Requirements

The modules and libraries used in this project are listed in the requirements.txt file. To install them, run:

pip install -r requirements.txt

Structure

  • Data: contains the Birkbeck corpus files used for this project.

  • images: contains the bar graph showing the average success at k.

  • utils: contains the essential functions for this project.

  • Assignment_#1.ipynb and Assignment_#1.py: a Python notebook and a Python script that use the functions in the utils folder to generate the results.

Contact

Glory Odeyemi is currently pursuing a Master's degree in Computer Science (Artificial Intelligence specialization) at the University of Windsor, Windsor, ON, Canada. You can connect with her on LinkedIn.

References

  1. WordNet
  2. Birkbeck spelling error corpus
  3. Parallelization
  4. PyTrec-Eval-Terrier