This is an example project for the Good Research Code Handbook. It's a reinterpretation of the Zipf's law project from Research Software Engineering in Python. It reuses and modifies some of the code from the original project, which was licensed under a CC-BY license. For this reason, this repo is under a CC-BY 4.0 license.
Make a copy of this repo (e.g. with git clone), cd into the root folder of the repo, and run:
pip install -e .
The project is organized into folders:
- zipf: contains the main module code that runs the analysis
- scripts: contains scripts that glue the module code together
- tests: contains tests of the module code
- data: contains the data for the analysis
- results: will contain the output of the analysis
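Zipf's law predicts that a word's frequency in a corpus is roughly inversely proportional to its frequency rank, so rank times count stays roughly constant. The code below is not the zipf module's actual code, which this README doesn't show. It is a minimal, hypothetical sketch of a rank-frequency computation. It assumes a simple regex tokenizer, and the helper names word_counts and rank_frequency are illustrative.

```python
from __future__ import annotations

import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase word tokens (assumed tokenizer: runs of letters/apostrophes)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def rank_frequency(counts: Counter) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) tuples, most frequent word first (rank 1).

    Ties keep the order in which words were first encountered.
    """
    return [
        (rank, word, n)
        for rank, (word, n) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for rank, word, n in rank_frequency(word_counts(sample)):
        # Under Zipf's law, rank * count is roughly constant for large corpora;
        # a toy sentence like this one is far too small to show it.
        print(rank, word, n, rank * n)
```

On a real book from the data folder, plotting count against rank on log-log axes should give something close to a straight line if the text follows Zipf's law.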
cd into the scripts folder and run run_analysis.py via:
python run_analysis.py --in_folder ../data --out_folder ../results
You can then load up visualize_results.ipynb in Jupyter to visualize the results.
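The run_analysis.py command takes an input folder and an output folder. The code below sketches how a script might accept those two flags with argparse. It is only an illustration: the helpers parse_args, plan_outputs and main, the *.txt glob, and the .csv output naming are assumptions. None of them come from the actual run_analysis.py.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two folder flags used in the README command."""
    parser = argparse.ArgumentParser(
        description="Hypothetical sketch: analyze every .txt file in a folder."
    )
    parser.add_argument("--in_folder", type=Path, required=True,
                        help="folder containing the input .txt files")
    parser.add_argument("--out_folder", type=Path, required=True,
                        help="folder the results are written to")
    return parser.parse_args(argv)


def plan_outputs(in_folder, out_folder):
    """Pair each input .txt file with a (hypothetical) .csv result path."""
    return [(path, out_folder / f"{path.stem}.csv")
            for path in sorted(in_folder.glob("*.txt"))]


def main(argv=None):
    args = parse_args(argv)
    args.out_folder.mkdir(parents=True, exist_ok=True)
    for src, dst in plan_outputs(args.in_folder, args.out_folder):
        # A real script would call into the zipf module here; this only prints.
        print(f"{src} -> {dst}")
```

A script like this would call main() under an `if __name__ == "__main__":` guard. Keeping the argument parsing in a function that accepts argv makes it easy to test without touching sys.argv.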
cd into the tests folder and run pytest.
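By default, pytest collects files named test_*.py or *_test.py and runs the functions in them whose names start with test. The example below shows what a test file might look like. The file name and test are hypothetical, and count_words is a stand-in defined inline so the example runs on its own. A real test in the tests folder would import the function under test from the zipf module.

```python
# test_counts.py (hypothetical file name)
from collections import Counter


def count_words(text):
    """Stand-in for a zipf module function; a real test would import it instead."""
    return Counter(text.lower().split())


def test_count_words_is_case_insensitive():
    counts = count_words("The the THE cat")
    assert counts["the"] == 3
    assert counts["cat"] == 1
```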
I've pre-populated the data folder with these books from Project Gutenberg:
- Dracula → data/dracula.txt
- Frankenstein → data/frankenstein.txt
- Jane Eyre → data/jane_eyre.txt
You can add more documents to the folder as you wish.