If you are looking for examples of how ML can fail despite all its incredible potential, you have come to the right place. Beyond the wonderful success stories of applied ML, here is a list of failed projects that we can learn a lot from.
- Classical Machine Learning
- Computer Vision
- Forecasting
- Natural Language Processing
- Recommendation Systems
## Classical Machine Learning

Title | Description |
---|---|
Amazon AI Recruitment System | AI-powered automated recruitment system cancelled after evidence of discrimination against female candidates |
Genderify - Gender identification tool | AI-powered tool designed to identify gender based on fields like name and email address was shut down due to built-in biases and inaccuracies |
Leakage and the Reproducibility Crisis in ML-based Science | A team at Princeton University found 20 reviews across 17 scientific fields that together identified significant errors (e.g., data leakage, no train-test split) in 329 papers using ML-based methods |
COVID-19 Diagnosis and Triage Models | Hundreds of predictive models were developed to diagnose or triage COVID-19 patients faster, but ultimately none of them were fit for clinical use and some were potentially harmful |
COMPAS Recidivism Algorithm | An analysis of the recidivism risk scores assigned in Florida found evidence of racial bias |
Pennsylvania Child Welfare Screening Tool | The predictive algorithm (which helps social workers decide which families to investigate for child abuse and neglect) flagged a disproportionate number of Black children for 'mandatory' neglect investigations |
Oregon Child Welfare Screening Tool | A month after the Pennsylvania report, Oregon also stopped using its AI-based child welfare screening algorithm, a tool similar to Pennsylvania's |
U.S. Healthcare System Health Risk Prediction | A widely used algorithm to predict healthcare needs exhibited racial bias: at a given risk score, Black patients were considerably sicker than white patients |
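The Princeton entry names data leakage and a missing train-test split as common errors. One frequent form of leakage is fitting preprocessing statistics on the whole dataset before splitting, so the test set quietly influences the model. The sketch below is a minimal, stdlib-only illustration on hypothetical toy data; `chronological_split`, `fit_standardizer`, and `transform` are illustrative helpers, not functions from any library or from the cited study.

```python
from statistics import mean, pstdev

def chronological_split(rows, test_fraction=0.25):
    """Hold out the last `test_fraction` of rows (assumed time-ordered) as the test set."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def fit_standardizer(values):
    """Learn the mean and standard deviation from the values passed in."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against constant features
    return mu, sigma

def transform(values, params):
    """Standardize values with previously fitted parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

# Hypothetical, time-ordered feature values.
values = [float(v) for v in range(20)]
train, test = chronological_split(values)

# Leaky: statistics computed on ALL rows, so the test split shapes preprocessing.
leaky_params = fit_standardizer(values)

# Correct: fit on the training split only, then reuse the same parameters on test.
clean_params = fit_standardizer(train)
train_scaled = transform(train, clean_params)
test_scaled = transform(test, clean_params)

print(leaky_params[0], clean_params[0])  # 9.5 7.0
```

The held-out rows are the most recent ones, which also avoids using future data to predict the past. In a real project the same rule applies to any fitted step (imputation, feature selection, encoding): fit on the training split, then apply to the test split.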
## Computer Vision

Title | Description |
---|---|
Inverness Automated Football Camera System | AI camera football-tracking technology for live streaming repeatedly confused a linesman’s bald head for the ball itself |
Amazon Rekognition for US Congressmen | Amazon's facial recognition technology (Rekognition) falsely matched 28 congresspeople with mugshots of criminals, while also revealing racial bias in the algorithm |
Amazon Rekognition for law enforcement | Amazon's facial recognition technology (Rekognition) misidentified women as men, particularly those with darker skin |
Zhejiang traffic facial recognition system | Traffic camera system (designed to capture traffic offenses) mistook a face in an advertisement on the side of a bus for a jaywalker |
Kneron tricking facial recognition terminals | The team at Kneron used high-quality 3-D masks to deceive AliPay and WeChat payment systems to make purchases |
Twitter smart cropping tool | Twitter's auto-crop tool for photo previews displayed evident signs of racial bias |
Depixelator tool | Algorithm (based on StyleGAN) designed to generate depixelated faces showed signs of racial bias, with image output skewed towards the white demographic |
Google Photos tagging | The automatic photo tagging capability in Google Photos mistakenly labelled Black people as gorillas |
GenderShades evaluation of gender classification products | GenderShades' research revealed that Microsoft and IBM’s face-analysis services for identifying gender of people in photos frequently erred when analyzing images of women with dark skin |
New Jersey Police Facial Recognition | A false facial recognition match by New Jersey police landed an innocent Black man (Nijeer Parks) in jail even though he was 30 miles away from the crime |
Tesla's dilemma between a horse cart and a truck | Tesla's visualization system mistook a horse-drawn carriage for a truck with a man walking behind it |
Google's AI for Diabetic Retinopathy Detection | The retina scanning tool fared much worse in real-life settings than in controlled experiments, with issues such as rejected scans (from poor scan image quality) and delays from intermittent internet connectivity when uploading images to the cloud for processing |
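Several entries above (Rekognition, GenderShades) surfaced bias only when error rates were broken down by demographic subgroup rather than reported as one aggregate number. The sketch below shows that kind of disaggregated evaluation in plain Python; the records and group names are hypothetical and only loosely mirror the skin-type by gender intersections used in audits such as GenderShades.

```python
from collections import defaultdict

def error_rate_by_group(records):
    """Return {group: error rate} from (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical classifier outputs for two subgroups of four images each.
records = (
    [("lighter_male", "male", "male")] * 4
    + [("darker_female", "female", "female")] * 2
    + [("darker_female", "female", "male")] * 2
)

overall = sum(truth != pred for _, truth, pred in records) / len(records)
rates = error_rate_by_group(records)

print(overall)  # 0.25
print(rates)    # {'lighter_male': 0.0, 'darker_female': 0.5}
```

A 25% overall error rate looks uniform, but the per-group view shows every error falling on one subgroup.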
## Forecasting

Title | Description |
---|---|
Google Flu Trends | Flu prevalence prediction model based on Google searches produced inaccurate over-estimates |
Zillow iBuying algorithms | Significant losses in Zillow's home-flipping business due to inaccurate (overestimated) prices from property valuation models |
Tyndaris Robot Hedge Fund | AI-powered automated trading system controlled by a supercomputer named K1 resulted in big investment losses, culminating in a lawsuit |
Sentient Investment AI Hedge Fund | The once high-flying AI-powered fund at Sentient Investment Management failed to make money and was liquidated in less than 2 years |
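Google Flu Trends and Zillow's valuation models both erred systematically in one direction (over-estimation). A simple way to catch this is to track the mean signed error (forecast bias) and the share of over-estimates alongside accuracy metrics. The helper names and numbers below are hypothetical, chosen only to illustrate the calculation.

```python
def forecast_bias(forecasts, actuals):
    """Mean signed error; a positive value means the model over-estimates on average."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(forecasts)

def share_overestimated(forecasts, actuals):
    """Fraction of periods where the forecast exceeded the actual value."""
    return sum(f > a for f, a in zip(forecasts, actuals)) / len(forecasts)

# Hypothetical weekly forecasts versus observed values.
forecasts = [12, 15, 20, 18]
actuals = [10, 11, 14, 15]

print(forecast_bias(forecasts, actuals))        # 3.75
print(share_overestimated(forecasts, actuals))  # 1.0
```

Mean absolute error alone would not reveal the direction of the errors; a bias that stays positive across periods points to a systematic problem rather than noise.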
## Natural Language Processing

Title | Description |
---|---|
Microsoft Tay Chatbot | Chatbot that posted inflammatory and offensive tweets through its Twitter account |
Nabla Chatbot | Experimental chatbot (for medical advice) using a cloud-hosted instance of GPT-3 advised a mock patient to commit suicide |
Facebook Negotiation Chatbots | The AI system was shut down after the chatbots stopped using English in their negotiations and started using a language that they created by themselves |
OpenAI GPT-3 Chatbot Samantha | A GPT-3 chatbot fine-tuned by indie game developer Jason Rohrer to emulate his dead fiancée was shut down by OpenAI after Rohrer refused its request to insert an automated monitoring tool, amid concerns that the chatbot could be racist or overtly sexual |
Amazon Alexa plays porn instead of song | Amazon's voice-activated digital assistant unleashed a torrent of raunchy language after a toddler asked it to play a children’s song. |
## Recommendation Systems

Title | Description |
---|---|
IBM's Watson Health | IBM’s Watson allegedly provided numerous unsafe and incorrect recommendations for treating cancer patients |