Skip to content

Textual data (and scrapers) produced by the U.S.A. presidency

Notifications You must be signed in to change notification settings

carriechen/presidents

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Presidential data & analysis

Installation

These instructions assume you already have Python 3 installed.

The scrapers rely on the requests, BeautifulSoup4, and python-dateutil libraries (among others), with the Jupyter notebooks requiring many more.

You'll need to install a few non-Python dependencies on your system:

  • On macOS with Homebrew:

    brew install cairo py3cairo jpeg jpeg-turbo node
    
  • On CentOS 7:

    yum -y groupinstall development
    yum -y install cairo-devel cairo-tools
    yum -y install libjpeg-turbo libjpeg-turbo-devel
    yum -y install nodejs nodejs-devel
    yum -y install https://downloads.sourceforge.net/project/mscorefonts2/rpms/msttcore-fonts-installer-2.6-1.noarch.rpm
    fc-cache /usr/share/fonts/msttcore
    

Once you have Node.js & npm installed:

npm install -g vega vega-lite

Python dependencies and environment

It's recommended to sandbox everything into a virtualenv, which makes installation more predictable / reliable:

sudo pip install -U virtualenv
virtualenv ~/presidents-venv
source ~/presidents-venv/bin/activate

Now that you've activated the virtualenv, get the code:

git clone https://github.com/chbrown/presidents ~/presidents
cd ~/presidents

To install the required Python libraries:

pip install -r requirements.txt

To install the language model for spaCy:

python -m spacy download en_core_web_md

To prepare the data for reading from notebooks:

make data/tapp/all.local-cache.json

Get the LIWC 2007 dictionary (see liwc-python for details) and move it to:

/usr/local/data/liwc_2007.dic

To start a Jupyter notebook server:

export PYTHONPATH=~/presidents
jupyter notebook notebooks/

License

Copyright © 2017–2018 Christopher Brown. MIT Licensed.

About

Textual data (and scrapers) produced by the U.S.A. presidency

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 98.5%
  • Python 1.4%
  • Makefile 0.1%