The Rich Get Richer:
Disparate Impact of Semi-Supervised Learning

Preprocessing script for the dataset used in the implicit sub-populations experiments
(demographic groups: race and gender)

The following script preprocesses the Jigsaw dataset and produces train/test dataset files that include demographic group information.

Download the jigsaw dataset: identity_individual_annotations.csv from

python preprocecss_jiasaw_toxicity_gender_and_race_balanced.py
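As a rough illustration of what group-balanced preprocessing can look like, the sketch below downsamples every demographic group to the size of the smallest one and then splits each group into train and test portions. It is not the script from this repository: the function `balance_and_split`, the field names `text`, `toxic`, and `group`, the 80/20 split, and the toy rows are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balance_and_split(rows, group_key="group", test_fraction=0.2, seed=0):
    """Downsample each group to the smallest group's size, then split per group.

    Hypothetical helper: `rows` is a list of dicts, and `group_key` names the
    (assumed) field that holds the demographic group of each example.
    """
    rng = random.Random(seed)

    # Bucket the examples by demographic group.
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    # Every group is reduced to the size of the smallest group.
    n_per_group = min(len(members) for members in by_group.values())
    n_test = int(round(n_per_group * test_fraction))

    train, test = [], []
    for group in sorted(by_group):
        # rng.sample draws without replacement in random order.
        sample = rng.sample(by_group[group], n_per_group)
        test.extend(sample[:n_test])
        train.extend(sample[n_test:])
    return train, test


# Toy rows standing in for annotated comments (not real Jigsaw data).
rows = (
    [{"text": f"a{i}", "toxic": i % 2, "group": "female"} for i in range(50)]
    + [{"text": f"b{i}", "toxic": i % 2, "group": "male"} for i in range(30)]
    + [{"text": f"c{i}", "toxic": i % 2, "group": "black"} for i in range(20)]
    + [{"text": f"d{i}", "toxic": i % 2, "group": "white"} for i in range(40)]
)
train, test = balance_and_split(rows)
```

Splitting inside each group, rather than after pooling, keeps the group proportions identical in the train and test files, so per-group metrics are computed on equally sized subsets.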

For the semi-supervised learning methods, please follow the official implementations of MixMatch, MixText, and UDA.
