Of Record is a system for storing and analysing what our elected officials say, to help understand why they might say it.
Requirements:
For basic scraping, you need to install Perl and Mojolicious.
Install Perl on:
- Windows: https://learn.perl.org/installing/windows.html
- Mac OS X: https://learn.perl.org/installing/osx.html
Install Mojolicious: https://mojolicious.org
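Once both are installed, a quick check from the shell can confirm that Perl is on the PATH and that it can load Mojolicious. This is only a sketch: the function names have_module and check_prereqs are made up for illustration and are not part of Of Record.

```shell
# Hypothetical helpers (not part of Of Record) for checking the prerequisites.

# Succeeds if the named Perl module can be loaded by the perl on PATH.
have_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Prints one status line per prerequisite; returns non-zero if any is missing.
check_prereqs() {
    status=0
    if command -v perl >/dev/null 2>&1; then
        echo "perl: found"
    else
        echo "perl: missing"
        status=1
    fi
    if have_module Mojolicious; then
        echo "Mojolicious: found"
    else
        echo "Mojolicious: missing"
        status=1
    fi
    return "$status"
}
```

Source the snippet and call check_prereqs; a non-zero exit status means at least one requirement still needs installing.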
To fetch the scraped debate XML from TheyWorkForYou, run the following command:
rsync -az --progress --exclude '.svn' --exclude 'tmp/' --relative data.theyworkforyou.com::parldata/scrapedxml/debates/debates* .
To run the website, you'll need to install Elasticsearch. Then run the following scripts in order:
- initialise-es-db.pl to set the mappings.
- people-parse.pl to parse the 'people.json' file from TheyWorkForYou.
- twfy-parse.pl to parse the files under ./scrapedxml/debates.
- people-add-twitter-profiles.pl to parse the 'twitter.xml' file from TheyWorkForYou.
- people-add-wikipedia-profiles.pl to parse the 'wikipedia-commons.xml' file from TheyWorkForYou.
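The scripts above can be chained in a small shell wrapper that stops at the first failure. This is a sketch, not something shipped with Of Record: it assumes the scripts sit in the current directory, that Elasticsearch is already running, and that each script exits non-zero on error. The run_pipeline name and the PERL override are made up for illustration.

```shell
# Hypothetical wrapper (not part of Of Record): runs the indexing scripts
# in the order listed above and stops at the first one that fails.
# Set PERL to override the interpreter, e.g. PERL=echo for a dry run.
run_pipeline() {
    for script in \
        initialise-es-db.pl \
        people-parse.pl \
        twfy-parse.pl \
        people-add-twitter-profiles.pl \
        people-add-wikipedia-profiles.pl
    do
        "${PERL:-perl}" "$script" || {
            echo "failed: $script" >&2
            return 1
        }
    done
}
```

Source the snippet and call run_pipeline from the directory that holds the scripts. Running it as PERL=echo run_pipeline only prints the script names, which is a cheap way to check the order before touching the index.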
Running Stanford's NLP library in Perl seems a bit of a pain.
Install Inline::Java from source, with the command:
perl Makefile.PL J2SDK=/System/Library/Frameworks/JavaVM.framework/Versions/Current
At least that path works on my Mac; on other systems, point J2SDK at your own JDK directory.
Then install Lingua::StanfordCoreNLP.
Other interesting modules: