Classifying and ranking text using NLTK and The Nameless Horror

This is a small demo of basic NLTK functionality (tokenizing, classifying, frequency counting), using The Collected Works of H.P. Lovecraft as a corpus. The code ought to be fairly self-explanatory, but note the following:

On its first run, the script writes a file, results.pickle, to your current working directory, because classification is quite slow. This lets you tune the tag set used for frequency counting without waiting for re-classification each time.
There's a Jupyter notebook (lovecraft.ipynb) for interactive exploration.
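
The caching idea can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this repository: `slow_classify`, `load_or_classify` and `count_by_tags` are hypothetical names, and the toy tagging rule merely stands in for the real (slow) NLTK classification step. Only the file name results.pickle comes from the description above.

```python
import pickle
from collections import Counter
from pathlib import Path

CACHE = Path("results.pickle")  # file name taken from the note above


def slow_classify(words):
    """Hypothetical stand-in for the slow classification step."""
    # Toy rule (an assumption for illustration): capitalised words are
    # tagged "NNP", everything else "NN".
    return [(w, "NNP" if w[:1].isupper() else "NN") for w in words]


def load_or_classify(words, cache=CACHE):
    """Return cached (word, tag) pairs if the cache file exists;
    otherwise classify the words and write the result to the cache."""
    if cache.exists():
        with cache.open("rb") as f:
            return pickle.load(f)
    tagged = slow_classify(words)
    with cache.open("wb") as f:
        pickle.dump(tagged, f)
    return tagged


def count_by_tags(tagged, tags):
    """Count (lower-cased) words whose tag is in the chosen tag set."""
    return Counter(word.lower() for word, tag in tagged if tag in tags)
```

With a layout like this, changing the tag set passed to `count_by_tags` only re-runs the cheap counting step; deleting the cache file forces a fresh classification.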

Requirements

Requests
BeautifulSoup4
NLTK
Matplotlib >= 1.5.x

And for the Notebook:

Pandas
Jupyter

urschrei/lovecraft

Folders and files

Latest commit

History

Repository files navigation

Classifying and ranking text using NLTK and The Nameless Horror

Requirements

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages