
Unintended Bias in toxicity classification

Udacity Machine Learning Engineer Capstone Project

Project from the Kaggle competition: Jigsaw unintended bias in toxicity classification

Project Overview

Natural Language Processing is a complex field that is hypothesised to belong to the set of AI-complete problems, meaning that the difficulty of these computational problems is equivalent to that of solving the central artificial intelligence problem: making computers as intelligent as people. With over 90% of all data ever generated having been produced in the last two years, and with a large proportion of it being human-generated unstructured text, there is an ever-increasing need to advance the field of Natural Language Processing.

A recent UK Government proposal to regulate social media companies over harmful content, including "substantial" fines and the power to block services that do not follow the rules, is one example of the regulatory need to better manage the content that users generate.

Other initiatives, such as Riot Games' work on predicting and reforming toxic player behaviour during games, are further examples of the effort to understand user-generated content and moderate toxic material.

However, as highlighted by the Kaggle competition Jigsaw unintended bias in toxicity classification, existing models suffer from unintended bias: they may predict a high likelihood of toxicity for content containing certain words (e.g. "gay") even when those comments are not actually toxic (such as "I am a gay woman"), leaving machine-only classification models still sub-standard.

Having tools that can flag toxic content without suffering from unintended bias is of paramount importance to preserving the Internet's fairness and freedom of speech.

Project Report

Download the Project-Report.pdf

Acquiring the data

Download the data from https://www.kaggle.com/c/12500/download-all, unzip it and place the files in the /input folder.
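
The snippet below is a minimal sketch of the unpacking step, assuming the competition archive has already been downloaded to the repository root; the archive filename used here is an assumption and should be adjusted to match the file you actually downloaded.

 # Minimal sketch: unpack the downloaded Kaggle archive into the /input folder.
 # NOTE: the archive filename is an assumption -- rename it to match your download.
 import zipfile
 from pathlib import Path

 ARCHIVE = Path("jigsaw-unintended-bias-in-toxicity-classification.zip")  # assumed name
 INPUT_DIR = Path("input")

 INPUT_DIR.mkdir(exist_ok=True)
 with zipfile.ZipFile(ARCHIVE) as zf:
     zf.extractall(INPUT_DIR)  # extracts the competition CSV files

 print("Extracted", len(list(INPUT_DIR.iterdir())), "files into", INPUT_DIR)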

Python package requirements

 torch
 keras
 sklearn
 numpy
 pandas
 nltk
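
As a quick sanity check (an illustrative sketch, not part of the original project), the following verifies that the required packages are importable; note that sklearn is published on PyPI as scikit-learn.

 # Sanity check: confirm every required package can be imported.
 import importlib

 for pkg in ("torch", "keras", "sklearn", "numpy", "pandas", "nltk"):
     importlib.import_module(pkg)
     print(pkg, "OK")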

Python entry file

 /notebooks/Main.py
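
The file is an exported notebook script and can be run with a standard Python interpreter, for example with python notebooks/Main.py from the repository root; depending on how the relative path to the /input folder is resolved, the working directory may need adjusting.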
