
Build and evaluate several machine learning algorithms to predict credit risk.


DSupps/Credit_Risk_Analysis


Credit_Risk_Analysis

Credit Risk Analysis Challenge Overview:

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans.
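To see why the imbalance matters, consider a toy example with hypothetical counts (990 low-risk and 10 high-risk loans, not taken from LoanStats_2019Q1.csv). A model that labels every loan as low risk reaches 99% plain accuracy, yet its balanced accuracy, the mean of the per-class recalls, is only 50%. The sketch below uses only the Python standard library, and the `recall` helper is illustrative.

```python
# Hypothetical counts for illustration only: 990 low-risk and 10 high-risk loans.
y_true = ["low_risk"] * 990 + ["high_risk"] * 10
# A naive "model" that predicts every loan is low risk.
y_pred = ["low_risk"] * len(y_true)


def recall(label, y_true, y_pred):
    """Share of rows whose true class is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = (recall("low_risk", y_true, y_pred) + recall("high_risk", y_true, y_pred)) / 2
print(f"accuracy={accuracy:.2f} balanced_accuracy={balanced_accuracy:.2f}")
# accuracy=0.99 balanced_accuracy=0.50
```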

Using a credit dataset and Python, several machine learning models are trained to evaluate and predict credit risk.

Techniques used to predict credit risk:

  • Oversample the data using the RandomOverSampler and SMOTE algorithms.
  • Undersample the data using the ClusterCentroids algorithm.
  • Combine over- and undersampling using the SMOTEENN algorithm.
  • Compare two new machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
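As a rough illustration of the first technique, the NumPy sketch below re-implements the idea behind random oversampling: rows of each smaller class are duplicated at random, with replacement, until every class is as large as the biggest one. It is a simplified stand-in, not imbalanced-learn's `RandomOverSampler` (which the library applies through its `fit_resample(X, y)` method); the helper name `random_oversample` and the toy data are hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate random rows of each smaller class
    (sampling with replacement) until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Extra indices drawn with replacement; zero extras for the largest class.
        extra = rng.choice(idx, size=target - count, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy data (hypothetical): 8 low-risk rows and 2 high-risk rows.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array(["low_risk"] * 8 + ["high_risk"] * 2)
X_res, y_res = random_oversample(X, y)
print(np.unique(y_res, return_counts=True))
```

Resampling of this kind is normally applied only to the training split, so the test set keeps the real class distribution.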

Once these models have been completed, their performance will be evaluated and a written recommendation will be made on whether they should be used to predict credit risk

Deliverables:

  • Deliverable 1: Use Resampling Models to Predict Credit Risk
  • Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
  • Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
  • Deliverable 4: A Written Report on the Credit Risk Analysis

Resources:

Data Source: LoanStats_2019Q1.csv

Software: Jupyter Notebook 6.1.4, Python 3.8.5

Credit Risk Analysis Challenge Results:

Deliverable 1 Results: Use Resampling Models to Predict Credit Risk

Oversampling RandomOverSampler Model:

[Screenshot: randomoversampler_balanced_accuracy]

[Screenshot: randomoversampler_confusion_matrix]

[Screenshot: randomoversampler_classification_report]

  • The balanced accuracy score for the RandomOverSampler model is 63%.
  • For the high-risk class, precision is 1% and the F1 score is 2%, too low to conclude that the model classifies high-risk loans reliably.
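The relationship between the 1% precision and the 2% F1 score can be checked with a little arithmetic. For the high-risk class, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean, 2PR / (P + R). Because F1 is always below twice the smaller of precision and recall, a 1% precision caps F1 just under 2% whatever the recall. The confusion-matrix counts below are hypothetical, chosen only to reproduce a 1% precision; they are not the model's actual output, and `class_metrics` is an illustrative helper.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical high-risk counts: 70 true positives, 6,930 false positives
# (low-risk loans flagged as high risk) and 30 false negatives.
precision, recall, f1 = class_metrics(tp=70, fp=6930, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.01 recall=0.70 f1=0.02
```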

SMOTE Oversampling Model:

[Screenshot: smote_balanced_accuracy]

[Screenshot: randomoversampler_confusion_matrix]

[Screenshot: randomoversampler_classification_report]

  • The balanced accuracy score of the SMOTE model is slightly higher than that of the RandomOverSampler model.
  • Precision for the high-risk class is very low at 1%: most loans the model flags as high risk are false positives, so its high-risk predictions are unreliable.
  • The F1 score is 2%, which is also very low.
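SMOTE differs from random oversampling in that it creates new minority rows instead of copying existing ones: each synthetic row lies on the line segment between a minority row and one of its k nearest minority-class neighbours. The NumPy sketch below shows only that interpolation step. It is a simplified illustration, not imbalanced-learn's `SMOTE` class; the function name `smote_sketch`, the parameter values and the toy points are assumptions for the example.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=1):
    """Hypothetical helper: create n_new synthetic minority rows by
    interpolating between a random minority row and one of its k nearest
    minority-class neighbours. Needs at least two minority rows."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among the minority rows.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)           # starting rows
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # chosen neighbours
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.round(2))
```

Every synthetic point falls inside the region spanned by existing minority rows, which is why SMOTE adds variety without placing points far from real high-risk examples.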

Undersampling ClusterCentroids Model:

[Screenshot: undersampling_balanced_accuracy]

[Screenshot: undersampling_confusion_matrix]

[Screenshot: undersampling_classification_report]
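ClusterCentroids undersamples the majority class by clustering it with k-means and replacing the majority rows with the cluster centroids, one centroid per row to keep. The NumPy sketch below approximates this with a plain Lloyd's k-means and as many clusters as there are minority rows. It is a hypothetical illustration (function names, iteration count and seeds are assumptions), not imbalanced-learn's implementation, which by default uses scikit-learn's `KMeans`.

```python
import numpy as np


def kmeans_centroids(X, n_clusters, n_iter=20, seed=1):
    """Plain Lloyd's k-means; returns the final centroids."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen distinct rows (copied as floats).
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def cluster_centroids_undersample(X, y, majority, minority):
    """Hypothetical helper: replace the majority rows with k-means centroids,
    one centroid per minority row, and keep the minority rows unchanged."""
    X_min = X[y == minority]
    centroids = kmeans_centroids(X[y == majority], n_clusters=len(X_min))
    X_res = np.vstack([centroids, X_min])
    y_res = np.array([majority] * len(centroids) + [minority] * len(X_min))
    return X_res, y_res


# Toy data (hypothetical): 40 low-risk rows and 5 high-risk rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(40, 2)), rng.normal(4, 1, size=(5, 2))])
y = np.array(["low_risk"] * 40 + ["high_risk"] * 5)
X_res, y_res = cluster_centroids_undersample(X, y, "low_risk", "high_risk")
print(X_res.shape)
# (10, 2)
```

Because the majority class shrinks to a handful of centroids, much of its information is discarded, a known trade-off of undersampling.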

Deliverable 2 Results: Use the SMOTEENN Algorithm to Predict Credit Risk

Combination Sampling SMOTEENN Model:

[Screenshot: combo_balanced_accuracy]

[Screenshot: combo_confusion_matrix]

[Screenshot: combo_classification_report]
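SMOTEENN chains two steps: SMOTE first oversamples the minority class, then Edited Nearest Neighbours (ENN) removes rows whose neighbourhood disagrees with their own label, which tends to clean up noisy points near the class boundary. The NumPy sketch below shows only the ENN cleaning step, using the strict rule that a row is kept only when all of its k nearest neighbours share its class. The function name `enn_clean`, `k=3` and the toy points are illustrative assumptions, not imbalanced-learn's `SMOTEENN` implementation.

```python
import numpy as np


def enn_clean(X, y, k=3):
    """Edited Nearest Neighbours sketch: drop every row whose k nearest
    neighbours (excluding itself) are not all of the row's own class."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    keep = (y[neighbours] == y[:, None]).all(axis=1)
    return X[keep], y[keep]


# Toy data: a 3x3 grid of low-risk rows, one stray high-risk row inside it,
# and a separate cluster of four high-risk rows.
grid = [[i, j] for i in range(3) for j in range(3)]
X = np.array(grid + [[0.5, 0.5], [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
y = np.array(["low_risk"] * 9 + ["high_risk"] * 5)
X_clean, y_clean = enn_clean(X, y)
print(len(X), "->", len(X_clean), "rows")
# 14 -> 9 rows
```

Under this strict rule the stray high-risk row is removed, and so are the four low-risk rows that have it among their nearest neighbours, while the separate high-risk cluster survives intact.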

Deliverable 3 Results: Use Ensemble Classifiers to Predict Credit Risk

Balanced Random Forest Classifier Model:

[Screenshot: balanced_random_forest_classification]
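A balanced random forest trains each decision tree on a bootstrap sample that has been randomly undersampled so that the classes are equally represented. The NumPy sketch below generates only those per-tree index sets; fitting the trees is omitted. The function name `balanced_bootstrap_indices`, the seed and the toy labels are assumptions for illustration, and this is not imbalanced-learn's `BalancedRandomForestClassifier` code.

```python
import numpy as np


def balanced_bootstrap_indices(y, n_estimators, seed=1):
    """Hypothetical helper: for each tree, draw an ordinary bootstrap sample,
    then undersample every class in it down to the smallest class count."""
    rng = np.random.default_rng(seed)
    n = len(y)
    samples = []
    for _ in range(n_estimators):
        boot = rng.integers(0, n, size=n)  # bootstrap: n draws with replacement
        classes, counts = np.unique(y[boot], return_counts=True)
        m = counts.min()
        # Pick m bootstrap positions per class, without reusing a position.
        balanced = np.concatenate(
            [rng.choice(boot[y[boot] == c], size=m, replace=False) for c in classes]
        )
        samples.append(balanced)
    return samples


# Toy labels (hypothetical): 80 low-risk and 20 high-risk loans.
y = np.array(["low_risk"] * 80 + ["high_risk"] * 20)
for idx in balanced_bootstrap_indices(y, n_estimators=3):
    print(np.unique(y[idx], return_counts=True)[1])
```

Each index array would be used to fit one tree, and the trees' predictions are then combined as in an ordinary random forest, so no single tree is dominated by low-risk loans.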

Easy Ensemble AdaBoost Classifier Model:

[Screenshot: Easy_Ensemble_AdaBoost_Classifier]

Credit Risk Analysis Challenge Summary:

Overview of the analysis: Explain the purpose of this analysis.

Results: Using bulleted lists, describe the balanced accuracy scores and the precision and recall scores of all six machine learning models. Use screenshots of your outputs to support your results.

Summary: Summarize the results of the machine learning models, and include a recommendation on the model to use, if any. If you do not recommend any of the models, justify your reasoning.