
A basic NLTK demo, using the collected works of H. P. Lovecraft as a corpus


urschrei/lovecraft


Classifying and ranking text using NLTK and The Nameless Horror

This is a small demo of basic NLTK functionality (tokenizing, classifying, frequency counting), using The Collected Works of H. P. Lovecraft as a corpus. The code ought to be fairly self-explanatory, but note:

  • On its first run, the script writes a file, results.pickle, to your current working directory, because classification is quite slow. This lets you tune the tag set used for frequency counting without waiting for re-classification each time.
  • There's a Jupyter notebook for interactive exploration.
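
The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: only the file name results.pickle comes from this README, while slow_classify, load_or_classify and count_by_tags are hypothetical names, and the toy tagger merely stands in for the slow NLTK classification step. It uses only the standard library.

```python
import pickle
from collections import Counter
from pathlib import Path

CACHE = Path("results.pickle")  # file name taken from this README


def slow_classify(words):
    """Hypothetical stand-in for the slow classification step.

    Capitalised words are tagged "NNP", everything else "NN".
    """
    return [(w, "NNP" if w[:1].isupper() else "NN") for w in words]


def load_or_classify(words, cache=CACHE):
    """Return cached (word, tag) pairs if the cache exists, else classify and cache."""
    if cache.exists():
        with cache.open("rb") as fh:
            return pickle.load(fh)
    tagged = slow_classify(words)
    with cache.open("wb") as fh:
        pickle.dump(tagged, fh)
    return tagged


def count_by_tags(tagged, wanted_tags):
    """Count words whose tag is in wanted_tags (the tunable tag set)."""
    return Counter(word for word, tag in tagged if tag in wanted_tags)
```

With the tagged results cached, changing wanted_tags and calling count_by_tags again is cheap, since only the counting is repeated. Deleting results.pickle forces re-classification on the next run.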

Requirements

  • Requests
  • BeautifulSoup4
  • NLTK
  • Matplotlib >= 1.5.x

And for the Notebook:

  • Pandas
  • Jupyter

License

MIT, copyright Stephan Hügel 2013

Fhtagn!
