Supervised learning: build and evaluate machine learning algorithms to predict credit risk

minut9/Credit_Risk_Analysis

Credit_Risk_Analysis

Purpose

The purpose of this analysis is to use machine learning to assess the credit risk of loan applicants in a LendingClub data set. The project is classified as “supervised learning” because the data includes labeled outcomes. Because the risk classes in the data set are heavily imbalanced, resampling techniques and ensemble classifiers were applied to balance them and improve the accuracy of the predictions.

Tools/Machine Learning Algorithms

  • RandomOverSampler
  • SMOTE
  • ClusterCentroids
  • SMOTEENN
  • BalancedRandomForestClassifier
  • EasyEnsembleClassifier

Results

The original dataset covered over 100,000 loan applicants from Q1 2019. Loan status was used to label each application as "high" or "low" risk: applicants whose loan status was "current" were labeled "low risk", and the rest were labeled "high risk". Cleaning the data reduced the dataset to 68,470 applicants, nearly all of whom (99%) were classified as "low risk".

init balance of target values
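The labeling and class-balance check described above can be sketched in plain Python. The status strings below are a small hypothetical sample, and treating "Current" as the only low-risk status is an assumption taken from the description above, not a copy of the project's notebook.

```python
from collections import Counter

# Hypothetical sample of loan_status values (the real data set has far more rows).
statuses = ["Current", "Current", "Late (31-120 days)", "Current", "In Grace Period"]

def to_risk(status):
    # Assumption: "Current" is the only status treated as low risk.
    return "low_risk" if status == "Current" else "high_risk"

labels = [to_risk(s) for s in statuses]
balance = Counter(labels)

# Print the count and share of each class to see how unbalanced the target is.
for label, count in balance.items():
    print(f"{label}: {count} ({count / len(labels):.0%})")
```

On the real data the same check would show the roughly 99% to 1% split reported above.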

OverSampling

RandomOverSampler Model

The RandomOverSampler model produced a balanced accuracy score of 64%.
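As a rough illustration of what naive random oversampling does, the sketch below duplicates randomly chosen minority rows until every class is as large as the biggest one. It is a plain-Python approximation of the idea, not the imbalanced-learn implementation; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to fill the gap up to the largest class size.
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res

# Toy data: four low-risk rows and one high-risk row.
X = [[1.0], [2.0], [3.0], [4.0], [9.0]]
y = ["low_risk"] * 4 + ["high_risk"]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Because the new rows are exact copies, the model sees no new information about the minority class, which is one reason the technique is called "naive".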

Naive Random Oversampling

Naive Risk

  • The high risk precision was 1% with a recall of 66%, which gave an F1 score of 2%.
  • The low risk precision was 100% with a recall of 62%.
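The relationship between these figures can be checked with a few lines of arithmetic. The sketch below uses the standard definitions (F1 as the harmonic mean of precision and recall, balanced accuracy as the mean of the per-class recalls) and the rounded percentages reported above, so small rounding differences are expected for other models.

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def balanced_accuracy(recalls):
    # Balanced accuracy is the mean of the per-class recall values.
    return sum(recalls) / len(recalls)

# Rounded values reported for the RandomOverSampler model.
high_f1 = f1_score(0.01, 0.66)
bal_acc = balanced_accuracy([0.66, 0.62])

print(f"High risk F1: {high_f1:.0%}")
print(f"Balanced accuracy: {bal_acc:.0%}")
```

A very low precision pulls the F1 score down even when recall is high, which is why the high risk F1 stays in the low single digits for most models in this analysis.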

Naive Oversampling report

SMOTE (Synthetic Minority Oversampling Technique)

The SMOTE algorithm had a balanced accuracy score of 65.8%, somewhat better than the RandomOverSampler model's 64%.
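Unlike random oversampling, SMOTE creates new minority points rather than copying existing ones. A minimal sketch of the core interpolation step, assuming the neighbor has already been chosen from the sample's nearest minority-class neighbors (the helper name `smote_point` and the values are hypothetical):

```python
import random

def smote_point(sample, neighbor, rng):
    """Create one synthetic point on the line segment between a minority sample and a neighbor."""
    gap = rng.random()  # random fraction in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(1)
sample = [2.0, 10.0]     # a minority-class row
neighbor = [4.0, 20.0]   # one of its nearest minority-class neighbors
synthetic = smote_point(sample, neighbor, rng)
print(synthetic)
```

The synthetic row always lies between the two original rows, feature by feature.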

Smote

  • The high risk precision was again only 1%, and the recall dropped slightly to 62%.
  • The low risk precision was still 100%, and the recall improved to 69%.

Smote risk

Smote report

UnderSampling

Cluster Centroids algorithm

This algorithm's balanced accuracy score was 54.4%, lower than the scores of the oversampling models.
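Cluster-centroid undersampling replaces the majority class with a smaller set of cluster centers. The sketch below uses a tiny hand-written k-means with a naive, evenly spaced initialization; it only illustrates the idea and is not how imbalanced-learn implements ClusterCentroids. All names and data are hypothetical.

```python
def kmeans_centroids(points, k, iterations=10):
    """Tiny k-means: return k centroids that stand in for the majority class."""
    step = max(1, len(points) // k)
    centroids = [list(p) for p in points[::step][:k]]  # naive init: evenly spaced rows
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest centroid (squared Euclidean distance).
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

majority = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [8.0, 8.0], [8.2, 7.8], [7.8, 8.2]]
minority = [[4.0, 4.0], [5.0, 5.0]]

# Shrink the majority class to as many centroids as there are minority rows.
reduced_majority = kmeans_centroids(majority, k=len(minority))
print(reduced_majority)
```

Because the majority class is compressed into a handful of averaged points, a lot of its detail is discarded, which may help explain the lower score for this approach.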

CC Balanced

  • The high risk precision was 1% and the recall was 69%. The F1 score was 1%.
  • The low risk precision was 100%, with a recall of 40%, which is low compared to the oversampling models.

CC risk

CC report

Combination Sampling

SMOTEENN

SMOTEENN (Synthetic Minority Oversampling Technique + Edited Nearest Neighbors) had a balanced accuracy score of 64.8%.
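In the combined approach, the data is first oversampled with SMOTE and then cleaned with Edited Nearest Neighbors, which drops rows that look out of place among their neighbors. The sketch below shows a simplified majority-vote version of that cleaning step; the library's default selection rule may differ, and the function name and toy data are hypothetical.

```python
from collections import Counter

def edited_nearest_neighbours(X, y, k=3):
    """Drop every row whose class disagrees with the majority of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared distance and label of every other row.
        others = [(sum((a - b) ** 2 for a, b in zip(xi, xj)), yj)
                  for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        nearest = sorted(others, key=lambda t: t[0])[:k]
        majority_label = Counter(label for _, label in nearest).most_common(1)[0][0]
        if majority_label == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y

# The last row is labeled "high" but sits in the middle of the "low" cluster.
X = [[1.0], [1.1], [1.2], [1.3], [5.0], [5.1], [5.2], [1.15]]
y = ["low", "low", "low", "low", "high", "high", "high", "high"]
X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X), "->", len(X_clean))
```

Only the row surrounded by the other class is removed; the two well-separated clusters are left intact.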

SMOTEENN balance

  • The high risk precision was 1% and the recall was 72%, which brought the F1 score to 2%.
  • The low risk precision was still 100%, with a recall of 57%.

SMOTEENN risk

SMOTEENN report

Ensemble Classifiers to Predict Credit Risk

BalancedRandomForestClassifier

This algorithm brought the balanced accuracy score to 78.8%.
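A balanced random forest trains each tree on a sample in which the classes are equally represented. The sketch below shows one way to draw such a sample; it is a simplified assumption about the sampling step, not the library's exact procedure, and `balanced_bootstrap` is a hypothetical helper.

```python
import random
from collections import Counter

def balanced_bootstrap(y, rng):
    """Return row indices for one tree: an equal-sized random draw from each class."""
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        # Draw with replacement, limited to the size of the smallest class.
        sample.extend(rng.choices(rows, k=n_min))
    return sample

y = ["low_risk"] * 20 + ["high_risk"] * 3
rng = random.Random(1)
indices = balanced_bootstrap(y, rng)
print(Counter(y[i] for i in indices))
```

Each tree sees a different balanced draw, so across many trees most of the majority class is still used.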

Balanced Random Forest Classifier

  • The high risk precision increased to 3% with a recall of 70%, giving an F1 score of 6%.
  • The low risk precision was still 100%, with a high recall of 87%.

Forest Classifier risk

Forest Classifier report

EasyEnsembleClassifier Model

This algorithm had the best balanced accuracy score, 93.1%.
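The easy-ensemble idea is to train several learners, each on all minority rows plus a different random subset of the majority class, and then combine their votes. In the sketch below a toy one-dimensional threshold learner stands in for the AdaBoost learners, and high risk is assumed to be labeled 1 with larger feature values; every name and value is hypothetical.

```python
import random

def fit_stump(X, y):
    """Toy 1-D learner: a threshold halfway between the two class means."""
    high = [x for x, lab in zip(X, y) if lab == 1]
    low = [x for x, lab in zip(X, y) if lab == 0]
    return (sum(high) / len(high) + sum(low) / len(low)) / 2

def easy_ensemble(X, y, n_estimators=5, seed=1):
    """Fit one learner per balanced subset: all minority rows plus a random majority subset."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == 1]
    majority = [x for x, lab in zip(X, y) if lab == 0]
    thresholds = []
    for _ in range(n_estimators):
        subset = rng.sample(majority, len(minority))  # under-sample without replacement
        labels = [1] * len(minority) + [0] * len(subset)
        thresholds.append(fit_stump(minority + subset, labels))
    return thresholds

def predict(thresholds, x):
    # Majority vote across the learners.
    votes = sum(x > t for t in thresholds)
    return 1 if votes > len(thresholds) / 2 else 0

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0, 22.0]
y = [0] * 8 + [1] * 2
model = easy_ensemble(X, y)
print(predict(model, 21.0), predict(model, 3.0))
```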

Easy Ensemble AdaBoost Classifier

  • The high risk precision increased to 9% and the recall increased to 92%, with the highest F1 score of 16%.
  • The low risk precision was still 100%, and the recall jumped to 94%.

Easy Ensemble risk

Easy Ensemble report

Summary

After reviewing the results, the EasyEnsembleClassifier model clearly performed best, with a balanced accuracy score of 93.1% and a precision of 9% when predicting high risk applicants. Its recall was also the highest for both high risk applicants (92%) and low risk applicants (94%). Of the six models tested, it is the best choice for assessing credit risk when lending to applicants.
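For a side-by-side view, the balanced accuracy scores reported in the Results section can be ranked with a short script. The numbers are copied from this README; only the dictionary layout is new.

```python
# Balanced accuracy scores as reported in the Results section.
scores = {
    "RandomOverSampler": 0.64,
    "SMOTE": 0.658,
    "ClusterCentroids": 0.544,
    "SMOTEENN": 0.648,
    "BalancedRandomForestClassifier": 0.788,
    "EasyEnsembleClassifier": 0.931,
}

# Sort from best to worst and print one model per line.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.1%}")

best_model = ranking[0][0]
```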