Machine Learning Driven Credit Risk Analysis

Utilizing Machine Learning to Analyze and Assess Credit Risk

Goals • Dataset • Tools Used • Results • Summary

Goals

Credit risk is an inherently imbalanced classification problem because good loans far outnumber risky loans. We therefore need techniques designed for training and evaluating models on imbalanced classes. We use the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling.

We oversample the data using the RandomOverSampler and SMOTE algorithms and undersample it using the ClusterCentroids algorithm. We then apply a combined over- and undersampling approach using the SMOTEENN algorithm. Next, we compare two ensemble machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk. The purpose is to evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.
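To make the oversampling idea concrete, here is a minimal plain-Python sketch of what random oversampling and SMOTE-style interpolation do conceptually. It is not the imbalanced-learn implementation (in that library the samplers are applied through `fit_resample(X, y)`); the helper names `random_oversample` and `smote_like` and the toy data are assumptions made up for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """Add synthetic minority rows by interpolating toward a nearby minority neighbor."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    rows = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(rows)
        # k nearest minority neighbors of `base` (squared Euclidean distance), excluding itself
        others = [r for r in rows if r is not base]
        neighbors = sorted(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )[:k]
        neighbor = rng.choice(neighbors)
        t = rng.random()  # position along the segment from base to neighbor
        X_out.append(tuple(a + t * (b - a) for a, b in zip(base, neighbor)))
        y_out.append(minority)
    return X_out, y_out


# Toy data (made up): 6 "low_risk" rows and 2 "high_risk" rows, two features each.
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0),
     (3.0, 3.0), (3.4, 2.6)]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y, minority="high_risk", k=1)

print(Counter(y_ros))  # both classes now have the same count
print(Counter(y_sm))   # same counts, but the added rows are new interpolated points
```

Random oversampling only repeats existing minority rows, while the SMOTE-style variant creates new points between existing ones. Undersampling works in the opposite direction by shrinking the majority class. In all cases, resampling should be applied to the training split only, so that the test set keeps the original class balance.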

Dataset

Using the credit card credit dataset from LendingClub, a peer-to-peer lending services company, we process a CSV file to produce the training and testing data.

Loan Stats Q1 2019: CSV file containing 115,677 rows of borrower data

Tools Used

- Python: Programming language used to build and evaluate the credit risk models
- imbalanced-learn: Python library containing tools for classification with imbalanced classes
- scikit-learn: Python library with classification, regression and clustering algorithms
- NumPy: Open source Python library used for numerical and scientific computing
- SciPy: Open source Python library used for scientific and technical computing

Results

- Random Oversampler

- SMOTE Oversampler

- ClusterCentroids Undersampler

- SMOTEENN Combination (Over and Under) Sampling

- Balanced Random Forest Classifier

- Easy Ensemble Classifier

Summary

After reviewing all of the models, we recommend the Easy Ensemble Classifier for the credit risk analysis. It yielded the highest balanced accuracy, at 92%. Given the sensitive nature of assessing risk for lenders, we want the model that yields the most accurate results.
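For readers unfamiliar with the metric, balanced accuracy is the mean of per-class recall, which is why it is preferred over plain accuracy on imbalanced data. The sketch below computes both by hand on made-up labels (not results from this project); scikit-learn provides the same metric as `sklearn.metrics.balanced_accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Made-up labels: 95 low-risk loans and 5 high-risk loans.
y_true = ["low_risk"] * 95 + ["high_risk"] * 5
# A useless model that labels every loan as low risk.
y_all_low = ["low_risk"] * 100

print(accuracy(y_true, y_all_low))           # 0.95
print(balanced_accuracy(y_true, y_all_low))  # 0.5
```

The always-low-risk model looks strong on plain accuracy but scores no better than chance on balanced accuracy, because it never identifies a high-risk loan.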

rivas-j/Credit_Risk_Analysis-Machine-Learning
