AnonymizedGithub/Disparate-SSL

 
 

The Rich Get Richer: Disparate Impact of Semi-Supervised Learning

Preprocessing script for the dataset used in the implicit sub-population experiments
(Demographic groups: race and gender)

The following steps preprocess the Jigsaw dataset and produce train/test dataset files that include demographic group information.

Step-1:

Download the Jigsaw dataset file identity_individual_annotations.csv from:

https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data

Step-2:

python preprocecss_jiasaw_toxicity_gender_and_race_balanced.py
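
The script itself is not reproduced in this README. As a rough illustration only, the sketch below shows one way a group-balanced train/test split could be built. It is not the repository's implementation: the function name `balanced_split`, the `gender` and `toxic` field names, and the toy rows are all hypothetical and are not taken from the Jigsaw files.

```python
import random
from collections import defaultdict


def balanced_split(rows, group_key, per_group, test_fraction=0.2, seed=0):
    """Sample the same number of rows from every group, then split each
    group's sample into train and test parts.

    Hypothetical helper for illustration; not the repository's code.
    """
    rng = random.Random(seed)

    # Bucket rows by their demographic group value.
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    train, test = [], []
    for group in sorted(by_group):
        members = by_group[group]
        if len(members) < per_group:
            raise ValueError(f"group {group!r} has only {len(members)} rows")
        # Draw an equally sized random sample from each group.
        sample = rng.sample(members, per_group)
        n_test = int(round(per_group * test_fraction))
        test.extend(sample[:n_test])
        train.extend(sample[n_test:])
    return train, test


# Toy data standing in for annotated comments (assumed schema).
rows = [
    {"id": i, "gender": "female" if i % 3 else "male", "toxic": i % 2}
    for i in range(60)
]
train, test = balanced_split(rows, "gender", per_group=10, test_fraction=0.2)
```

Sampling the same number of rows per group keeps group sizes equal in both splits, which is one reading of "balanced" in the script name. The actual script may balance on different criteria.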

Implementation of SSL methods

Please follow the official implementations of MixMatch [1], MixText [2], and UDA [3].

[1] https://github.com/google-research/mixmatch

[2] https://github.com/GT-SALT/MixText

[3] https://github.com/google-research/uda
