README

Of Record is a system for storing and analysing what our elected officials say, to help understand why they might say it.

Requirements:

For basic scraping, you need to install Perl and Mojolicious.

Install Perl on:

Windows: https://learn.perl.org/installing/windows.html
Mac OSX: https://learn.perl.org/installing/osx.html

Install Mojolicious: https://mojolicious.org
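Before going further, it can help to confirm that both requirements are actually visible from your shell. The following is a minimal sketch, not part of this repository: the function names `have_cmd`, `have_perl_module` and `check_requirements` are made up for illustration, and the module check simply asks Perl to load Mojolicious.

```shell
#!/bin/sh
# Hypothetical helper functions (not shipped with Of Record).

# Succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if Perl can load the named module, e.g. Mojolicious.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Reports the state of each requirement and stops at the first one missing.
check_requirements() {
    have_cmd perl || { echo "perl: missing"; return 1; }
    echo "perl: ok"
    have_perl_module Mojolicious || { echo "Mojolicious: missing"; return 1; }
    echo "Mojolicious: ok"
}
```

After sourcing the file, running `check_requirements` prints one line per requirement and returns a non-zero status if either is missing.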

Get the Hansard files

From TheyWorkForYou, run the following command:

rsync -az --progress --exclude '.svn' --exclude 'tmp/' --relative data.theyworkforyou.com::parldata/scrapedxml/debates/debates* .

Running the website

To run the website, you'll need to install Elasticsearch.

Then run initialise-es-db.pl to set the mappings.

Then run people-parse.pl to parse the 'people.json' file from TheyWorkForYou.

Then run twfy-parse.pl to parse the files under ./scrapedxml/debates.

Then run people-add-twitter-profile.pl to parse the 'twitter.xml' file from TheyWorkForYou.

Then run people-add-wikipedia-profile.pl to parse the 'wikipedia-commons.xml' file from TheyWorkForYou.
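Since the steps above have to run in a fixed order, a small wrapper that stops at the first failure can save some retyping. This is only a sketch under assumptions: `run_steps` is a made-up name, the script is not part of the repository, and it assumes each step exits non-zero when it fails.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Of Record).
# Usage: run_steps RUNNER SCRIPT [SCRIPT...]
# Runs each script with RUNNER, in the order given, stopping at the first failure.
run_steps() {
    runner="$1"
    shift
    for step in "$@"; do
        echo "==> $step"
        "$runner" "$step" || { echo "failed: $step" >&2; return 1; }
    done
}

# When the file is executed with arguments, pass them straight through.
if [ "$#" -gt 0 ]; then
    run_steps "$@"
fi
```

Saved as, say, run-steps.sh, it could be called as `sh run-steps.sh perl initialise-es-db.pl people-parse.pl twfy-parse.pl`, with the two profile scripts appended in the same order as listed above.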

More advanced stuff

Running Stanford's NLP library in Perl seems a bit of a pain.

Install Inline::Java from source, with the command:

perl Makefile.PL J2SDK=/System/Library/Frameworks/JavaVM.framework/Versions/Current

At least that path works on my Mac.
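On other machines the JDK will likely live somewhere else. One way to avoid hard-coding the path is sketched below. `find_jdk` is a made-up helper name; it prefers the JAVA_HOME environment variable and otherwise falls back to the /usr/libexec/java_home helper that macOS provides. Whether the path it prints suits Inline::Java on your system is something to verify yourself.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Of Record).
# Prints a JDK directory suitable for passing as J2SDK=...
find_jdk() {
    if [ -n "${JAVA_HOME:-}" ]; then
        # Respect an explicitly configured JDK first.
        printf '%s\n' "$JAVA_HOME"
    elif [ -x /usr/libexec/java_home ]; then
        # macOS helper that prints the default JDK's home directory.
        /usr/libexec/java_home
    else
        echo "no JDK found; set JAVA_HOME" >&2
        return 1
    fi
}
```

With that function sourced, the build step would become `perl Makefile.PL J2SDK="$(find_jdk)"`.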

Then install Lingua::StanfordCoreNLP

Other interesting modules:
