In this assignment, the author explores potential bias in English Wikipedia articles about politicians. Specifically, the author analyzes how the coverage of politicians on Wikipedia, and the quality of the articles about them, differ between countries. Article quality estimates are collected using ORES, a machine learning API provided by Wikimedia.
You will need Python 3.x and Jupyter Notebook installed to reproduce this project by running hcds-a2-biasn.ipynb. To install Python 3, see download and beginner's guide. To install Jupyter Notebook, follow installation.
Additionally, you will need the following packages:
The dataset can be found here.
Note: Please read through the documentation for this repository, then download and unzip it. You will need page_data.csv in the 'data' directory. Alternatively, a copy of this dataset is included in this GitHub repo and can be found here.
page_data.csv contains the following columns:
- 'country', country that relates to the article
- 'page', title of the article
- 'rev_id', Revision Id, the id to identify the article
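As a rough illustration of the column layout above, the following sketch reads page_data.csv with Python's standard csv module and converts 'rev_id' to an integer. The helper name `load_page_data` and the sample rows are hypothetical placeholders, not taken from the notebook or the real dataset.

```python
import csv
import io


def load_page_data(fileobj):
    """Read page_data.csv rows into dicts, converting rev_id to int."""
    reader = csv.DictReader(fileobj)
    rows = []
    for row in reader:
        rows.append({
            "country": row["country"],
            "page": row["page"],
            "rev_id": int(row["rev_id"]),
        })
    return rows


# Placeholder rows for illustration only; NOT real data from page_data.csv.
sample = io.StringIO(
    "page,country,rev_id\n"
    "Example Politician A,Exampleland,1001\n"
    "Example Politician B,Exampleland,1002\n"
)
pages = load_page_data(sample)
print(pages[0]["page"], pages[0]["rev_id"])
```

With the real file, the same function could be called on a file handle opened with `open("data/page_data.csv", newline="", encoding="utf-8")`, assuming the file sits in the 'data' directory as described above.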
The dataset can be found on the Population Reference Bureau website.
Note: Look for the 'Microsoft Excel' icon in the upper right and download the data as a CSV file.
Population Mid-2015.csv contains the following columns:
- 'Location', name of country
- 'Location Type', type of location, which are all listed as country
- 'TimeFrame', the time when the data were collected
- 'Data Type', data type of population, which are all listed as number
- 'Data', population
- 'Footnotes', applicable footnotes, all blank in this dataset
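The columns above can be reduced to a simple country-to-population lookup. The sketch below is one possible way to do that with the standard csv module; the helper name `load_population` and the sample rows are hypothetical, and it assumes the 'Data' values may contain thousands separators, which should be checked against the downloaded file.

```python
import csv
import io


def load_population(fileobj):
    """Map each 'Location' to its population from the 'Data' column."""
    reader = csv.DictReader(fileobj)
    population = {}
    for row in reader:
        # Assumption: 'Data' may be formatted like "1,234,000"; strip commas.
        raw = row["Data"].replace(",", "").strip()
        population[row["Location"]] = int(raw)
    return population


# Placeholder rows for illustration only; NOT real PRB figures.
sample = io.StringIO(
    "Location,Location Type,TimeFrame,Data Type,Data,Footnotes\n"
    'Exampleland,Country,Mid-2015,Number,"1,234,000",\n'
    "Samplestan,Country,Mid-2015,Number,567000,\n"
)
population = load_population(sample)
print(population["Exampleland"])
```

If the downloaded CSV has extra title lines before the header row, they would need to be skipped before passing the file to this function.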
In this project, we will use a Wikimedia API called ORES ("Objective Revision Evaluation Service") (see documentation). ORES estimates the quality of an article at a particular point in time and assigns a series of probabilities that the article belongs to each of 6 quality categories. The categories are, from best to worst:
- FA - Featured article
- GA - Good article
- B - B-class article
- C - C-class article
- Start - Start-class article
- Stub - Stub-class article
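To make the idea of "a series of probabilities" concrete, the sketch below picks the most likely quality category from a dictionary of per-class probabilities. The function name `most_likely_class` and the example probabilities are hypothetical; how the probabilities are extracted from an actual ORES response is not shown here and should be taken from the ORES documentation.

```python
# The six quality categories listed above, ordered from best to worst.
QUALITY_CLASSES = ["FA", "GA", "B", "C", "Start", "Stub"]


def most_likely_class(probabilities):
    """Return the quality class with the highest probability.

    Ties are resolved in favor of the higher-quality class, because max()
    returns the first maximal item and QUALITY_CLASSES is ordered best-first.
    """
    missing = [c for c in QUALITY_CLASSES if c not in probabilities]
    if missing:
        raise ValueError(f"missing probabilities for: {missing}")
    return max(QUALITY_CLASSES, key=lambda c: probabilities[c])


# Made-up probabilities for illustration only; NOT a real ORES score.
example = {"FA": 0.02, "GA": 0.05, "B": 0.08, "C": 0.15, "Start": 0.30, "Stub": 0.40}
predicted = most_likely_class(example)
print(predicted)
```

The tie-breaking rule is a design choice for this sketch only; the prediction reported by ORES itself may be determined differently.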
- The Wikipedia article data, along with the code used to generate that data, are released under the CC-BY-SA 4.0 license.
- See About PRB for the license and copyright information for the population data from the Population Reference Bureau.
- By using the Wikimedia ORES API, you agree to Wikimedia's Terms of Use and Privacy Policy.
This project is licensed under the MIT License; see the LICENSE.md file for details.