DSupps / Credit_Risk_Analysis Public

Build and evaluate several machine learning algorithms to predict credit risk.

Credit_Risk_Analysis

Credit Risk Analysis Challenge Overview:

Credit risk is an inherently unbalanced classification problem, as good loans easily outnumber risky loans.
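To illustrate why this imbalance matters, here is a minimal sketch in plain Python. The class counts are hypothetical, not taken from LoanStats_2019Q1.csv. A model that labels every loan as low risk scores very high on plain accuracy but only 0.5 on balanced accuracy, which is the mean of the per-class recall.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall obtained on each class."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical data: 995 good loans and 5 risky loans.
y_true = ["low_risk"] * 995 + ["high_risk"] * 5
# A trivial model that never flags a loan as risky.
y_pred = ["low_risk"] * 1000

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain:.3f}, balanced accuracy={balanced:.3f}")
```

Because the trivial model never finds a risky loan, its recall on that class is zero, and balanced accuracy exposes this where plain accuracy hides it.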

Using a credit card credit dataset and Python, this project applies several machine learning models to evaluate and predict credit risk.

Techniques used to predict credit risk:

Oversample the data using the RandomOverSampler and SMOTE algorithms.
Undersample the data using the ClusterCentroids algorithm.
Combine over- and undersampling using the SMOTEENN algorithm.
Compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
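As a rough illustration of the first technique, the sketch below shows the idea behind random oversampling using only the Python standard library. It is not the RandomOverSampler implementation; the function name `random_oversample` and the toy data are hypothetical. Minority-class rows are duplicated at random until every class is as large as the largest one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class (illustrative sketch only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_res.append(rng.choice(rows))  # sample with replacement
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
resampled_counts = Counter(y_res)
print(resampled_counts)
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.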

Once these models have been completed, their performance will be evaluated and a written recommendation will be made on whether they should be used to predict credit risk.

Deliverables:

Deliverable 1: Use Resampling Models to Predict Credit Risk
Deliverable 2: Use the SMOTEENN Algorithm to Predict Credit Risk
Deliverable 3: Use Ensemble Classifiers to Predict Credit Risk
Deliverable 4: A Written Report on the Credit Risk Analysis

Resources:

Data Sources: LoanStats_2019Q1.csv

Software: Jupyter Notebook 6.1.4, Python 3.8.5

Credit Risk Analysis Challenge Results:

Deliverable 1 Results: Use Resampling Models to Predict Credit Risk

Oversampling RandomOverSampler Model:

The accuracy score for the RandomOverSampler model is 63%.
The precision for the high-risk class is 1% and the F1 score is 2%, both too low for the model to classify high-risk loans reliably.
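The link between a 1% precision and a 2% F1 score can be checked with a few lines of arithmetic. F1 is the harmonic mean of precision and recall, so very low precision caps it regardless of recall. In the sketch below, the precision is the value reported above, while the recall values are hypothetical and chosen only to show the trend.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.01  # high-risk precision reported for the model

# Hypothetical recall values, used only to illustrate the ceiling on F1.
for recall in (0.25, 0.5, 0.75, 1.0):
    print(f"recall={recall:.2f} -> F1={f1_score(precision, recall):.4f}")

# Even with perfect recall, F1 stays just under 2%.
best_case = f1_score(precision, 1.0)
```

With precision fixed at 1%, the best possible F1 is 0.02 / 1.01, or about 0.0198, so improving recall alone cannot make the model useful for flagging high-risk loans.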

SMOTE Oversampling Model:

The accuracy score of the SMOTE model is slightly higher than that of the RandomOverSampler model.
The precision for the high-risk class is very low at 1%, indicating a large number of false positives and therefore an unreliable classification.
The F1 score is 2%, which is also very low.

Undersampling ClusterCentroids Model:

Deliverable 2 Results: Use the SMOTEENN Algorithm to Predict Credit Risk

Combination Sampling SMOTEENN Model:

Deliverable 3 Results: Use Ensemble Classifiers to Predict Credit Risk

Balanced Random Forest Classifier Model:

Easy Ensemble AdaBoost Classifier Model:

Credit Risk Analysis Challenge Summary:

Overview of the analysis: Explain the purpose of this analysis.

Results: Using bulleted lists, describe the balanced accuracy scores and the precision and recall scores of all six machine learning models. Use screenshots of your outputs to support your results.

Summary: Summarize the results of the machine learning models, and include a recommendation on the model to use, if any. If you do not recommend any of the models, justify your reasoning.