robhammond/ofrecord

Of Record is a system for storing and analysing what our elected officials say, to help understand why they might say it.

Requirements:

For basic scraping, you need to install Perl and Mojolicious.

Install Perl on:

Install Mojolicious: https://mojolicious.org

Get the Hansard files

To download them from TheyWorkForYou, run the following command:

rsync -az --progress --exclude '.svn' --exclude 'tmp/' --relative data.theyworkforyou.com::parldata/scrapedxml/debates/debates* .
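As a quick sanity check after the download, something like the sketch below could count the files that arrived. It assumes that --relative leaves the files under ./scrapedxml/debates (the same path the parsing step reads from); the function name count_debate_files is made up for this example.

```shell
# Count the debate files that rsync placed under a directory.
# Assumption: the default directory is ./scrapedxml/debates, as implied by
# the --relative flag in the rsync command above.
count_debate_files() {
    dir="${1:-./scrapedxml/debates}"
    if [ ! -d "$dir" ]; then
        # Nothing downloaded yet (or wrong working directory).
        echo 0
        return 1
    fi
    # Match the same 'debates*' prefix used in the rsync source path.
    find "$dir" -maxdepth 1 -type f -name 'debates*' | wc -l | tr -d ' '
}
```

Running count_debate_files with no argument from the directory where rsync was run should print a non-zero number if the download worked.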

Running the website

To run the website you'll need to install Elasticsearch.
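Before running any of the scripts, it may help to confirm that Elasticsearch is actually answering. The sketch below assumes a local node on Elasticsearch's default HTTP port, 9200. This README does not say which host or port the scripts expect, so treat the URL as a guess. es_is_up is a hypothetical helper name.

```shell
# Return success if an Elasticsearch node answers over HTTP.
# Assumption: a local node on the default port 9200; pass another URL
# as the first argument if your setup differs.
es_is_up() {
    url="${1:-http://localhost:9200}"
    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # --max-time: give up after 5 seconds.
    curl -fsS --max-time 5 "$url" >/dev/null 2>&1
}
```

For example, `es_is_up && echo "Elasticsearch is reachable"` prints the message only when the node responds.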

Then run initialise-es-db.pl to set the mappings.

Then run people-parse.pl to parse the 'people.json' file from TheyWorkForYou.

Then run twfy-parse.pl to parse the files under ./scrapedxml/debates.

Then run people-add-twitter-profiles.pl to parse the 'twitter.xml' file from TheyWorkForYou.

Then run people-add-wikipedia-profiles.pl to parse the 'wikipedia-commons.xml' file from TheyWorkForYou.
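The steps above could be wrapped in a small shell function that runs the scripts in the listed order and stops at the first failure. This is only a sketch. It assumes each script sits in the current directory and takes no arguments, which this README does not confirm. The function name run_import_steps and the PERL_BIN override are made up for this example.

```shell
# Run the import scripts in the order the README lists them.
# Assumption: each script is in the current directory and needs no arguments.
run_import_steps() {
    # PERL_BIN is a hypothetical override, handy for dry runs (PERL_BIN=echo).
    perl_bin="${PERL_BIN:-perl}"
    for script in \
        initialise-es-db.pl \
        people-parse.pl \
        twfy-parse.pl \
        people-add-twitter-profiles.pl \
        people-add-wikipedia-profiles.pl
    do
        echo "==> $script"
        # Stop at the first failing step so later steps do not run on bad data.
        "$perl_bin" "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

Calling `PERL_BIN=echo run_import_steps` prints the script names without executing them, which is a cheap way to check the order first.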

More advanced stuff

Running Stanford's NLP library in Perl seems a bit of a pain.

Install Inline::Java from source, with the command:

perl Makefile.PL J2SDK=/System/Library/Frameworks/JavaVM.framework/Versions/Current

At least that path works on my Mac.
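On other machines the JDK path will differ. One possible way to find it, sketched below, is to prefer JAVA_HOME when it is set and otherwise fall back to macOS's /usr/libexec/java_home tool. find_j2sdk is a hypothetical helper, and whether the resulting path satisfies Inline::Java on your system is untested here.

```shell
# Print a candidate JDK directory to pass as J2SDK.
# Assumption: JAVA_HOME, when set, points at a full JDK.
find_j2sdk() {
    if [ -n "${JAVA_HOME:-}" ]; then
        printf '%s\n' "$JAVA_HOME"
    elif [ -x /usr/libexec/java_home ]; then
        # macOS ships this helper; it prints the default JDK's home directory.
        /usr/libexec/java_home
    else
        echo "no JDK found; set JAVA_HOME" >&2
        return 1
    fi
}
```

The install command would then become `perl Makefile.PL J2SDK="$(find_j2sdk)"`.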

Then install Lingua::StanfordCoreNLP.

Other interesting modules:
