Credit Risk Analysis

Overview of the Analysis

The purpose of this analysis is to apply machine learning to the problem of credit risk. Credit risk is an inherently unbalanced classification problem because good loans far outnumber risky loans, so it calls for techniques designed to train and evaluate models on unbalanced classes. In this challenge, I used the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling. Using the credit card dataset from LendingClub, I oversampled the data with the RandomOverSampler and SMOTE algorithms and undersampled it with the ClusterCentroids algorithm. I then applied a combined over- and undersampling approach using the SMOTEENN algorithm. Finally, I compared two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
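To make the idea behind naive random oversampling concrete, here is a minimal standard-library sketch. It is not imbalanced-learn's implementation of RandomOverSampler; the `random_oversample` helper and the toy loan data are hypothetical and exist only to show how duplicating minority-class rows balances the classes.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen minority rows until every class matches the largest one.

    Hypothetical helper for illustration; not the imbalanced-learn API.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows that belong to this class in the original data.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Draw with replacement until this class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up): 8 low-risk loans and 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

resampled_rows, resampled_labels = random_oversample(rows, labels)
print(Counter(resampled_labels))
```

Because the added rows are exact copies, oversampling of this kind adds no new information about high-risk loans; it only changes how heavily the model weighs them during training.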

Results

Naive Random Oversampling

- The balanced accuracy score is 66.3%. The precision of high-risk loans is 1%, and the recall is 64%.

SMOTE Oversampling

- The balanced accuracy score is 64.6%. The precision of high-risk loans is again 1%, while the recall this time is 63%.

ClusterCentroids

- Using ClusterCentroids undersampling, the balanced accuracy score is 51%. The precision of high-risk loans is still 1%, and the recall is 59%.

SMOTEENN

- For this combined over- and undersampling method, the balanced accuracy score is 62.4%. The precision of high-risk loans is 1%, while the recall is 70%.

BalancedRandomForestClassifier

- Using the BalancedRandomForestClassifier ensemble method, the balanced accuracy score is 78.7%. The precision of high-risk loans improves slightly to 4%, and the recall of high-risk loans is 67%.

EasyEnsemble AdaBoost Classifier

- Using the EasyEnsemble AdaBoost Classifier model, the balanced accuracy score is 92.5%. The precision of high-risk loans is 7%, and the recall of high-risk loans is 91%.
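The three metrics reported for each model above can all be derived from the counts in a binary confusion matrix. The sketch below shows the arithmetic, treating high-risk as the positive class. The `imbalanced_metrics` helper and the counts passed to it are hypothetical and are not taken from the notebooks.

```python
def imbalanced_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and balanced accuracy from confusion-matrix counts.

    tp/fn: high-risk loans predicted correctly / missed.
    fp/tn: low-risk loans flagged wrongly / predicted correctly.
    """
    recall = tp / (tp + fn)        # share of high-risk loans that were caught
    specificity = tn / (tn + fp)   # share of low-risk loans that were cleared
    precision = tp / (tp + fp)     # share of high-risk flags that were correct
    # For two classes, balanced accuracy is the mean of the per-class recalls.
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts chosen only to illustrate the calculation.
metrics = imbalanced_metrics(tp=60, fp=4000, fn=40, tn=12000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, recall and balanced accuracy can look reasonable while precision stays close to zero, because even a moderate false-positive rate on the large low-risk class produces far more false alarms than there are high-risk loans.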

Summary

None of these models predicted high-risk loans precisely: the precision for high-risk loans never exceeded 7%. The EasyEnsemble AdaBoost Classifier model was the best at detecting high-risk credit, but it also incorrectly flagged 979 loans as high-risk. Therefore, I do not recommend any of these models for predicting credit risk.
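One way to see why precision stays low even when recall is high is to express precision in terms of class prevalence. The sketch below is an illustration only: the `expected_precision` helper, the recall of 0.9, the false-positive rate of 0.06, and the prevalence values are all assumed for the example and are not measured from the LendingClub data.

```python
def expected_precision(prevalence, recall, false_positive_rate):
    """Precision implied by a classifier's recall and false-positive rate
    at a given share of positive (high-risk) cases in the data."""
    true_positives = recall * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical classifier, evaluated at different shares of high-risk loans.
for prevalence in (0.005, 0.05, 0.5):
    precision = expected_precision(prevalence, recall=0.9, false_positive_rate=0.06)
    print(f"prevalence={prevalence:.3f} -> precision={precision:.3f}")
```

Under these assumptions, the same classifier that is highly precise on balanced data becomes imprecise when high-risk loans are rare, which is consistent with the low precision values reported above.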
