Bank_Customer_Categorization

Historical data were gathered from bank customers to determine whether a customer is a good or bad credit risk for a home equity loan. Bad risk customers are more likely to default on the loan.

Packages: pandas, numpy, sklearn, matplotlib, seaborn.

Preprocess

First, I converted the target variable type to an object.
I detected the missing values.

I filled the missing values with the average for numerical variables and the most frequent value for categorical variables.
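The imputation step can be sketched roughly as below. This is a minimal illustration on a toy frame, not the notebook's code: the column names `loan` and `reason` are assumptions, and treating "average" as the mean of numerical columns and "most frequent" as the mode of categorical columns is my reading of the step.

```python
import numpy as np
import pandas as pd

# Toy data; column names are illustrative assumptions, not read from hmelq.csv.
df = pd.DataFrame({
    "loan": [1100.0, np.nan, 1500.0, 1700.0],
    "reason": ["HomeImp", None, "DebtCon", "DebtCon"],
})

# Detect missing values: count of NaN per column.
missing_counts = df.isna().sum()

# Numerical columns: fill with the column mean.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Categorical columns: fill with the most frequent value (mode).
cat_cols = df.select_dtypes(exclude="number").columns
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```

Mean imputation is sensitive to outliers, so the median may be worth comparing on skewed columns.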

EDA

I looked at the relationship between the categorical variables and the target variable.
I grouped the data by target value and compared the means of the numerical variables. Then I separated out the variables whose means differed between the groups.
A Shapiro-Wilk test was performed to check the normality assumption (none of the variables were normally distributed).
Then I binned these variables into groups and converted them to categorical variables. This let me test variables that we know affect the target variable.
When binning, I paid attention to keeping the data evenly distributed across groups and to preserving the meaning of the variable.
I ran a chi-square test on the variables I had separated, and all of them turned out to be related to the target variable.
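The normality check, binning, and chi-square test described above might look something like the following sketch. It runs on synthetic data, uses `scipy` (which is not in the package list above, so that is an assumption), and the quartile binning with `pd.qcut` and the names `debtinc`, `debtinc_group`, and `bad` are illustrative choices rather than the notebook's actual ones.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, shapiro

# Synthetic, right-skewed feature and a target that depends on it.
rng = np.random.default_rng(0)
n = 1000
debtinc = rng.exponential(scale=20.0, size=n)
bad = (rng.uniform(size=n) < np.clip(debtinc / 80, 0, 1)).astype(int)
df = pd.DataFrame({"debtinc": debtinc, "bad": bad})

# Shapiro-Wilk: a small p-value rejects the normality assumption.
shapiro_stat, shapiro_p = shapiro(df["debtinc"])

# Bin the numerical variable into four roughly equal-sized groups.
df["debtinc_group"] = pd.qcut(
    df["debtinc"], q=4, labels=["low", "mid", "high", "very_high"]
)

# Chi-square test of independence between the binned variable and the target.
table = pd.crosstab(df["debtinc_group"], df["bad"])
chi2, chi2_p, dof, expected = chi2_contingency(table)
```

Quantile bins keep the group sizes balanced, which helps the expected cell counts stay large enough for the chi-square test to be reliable.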

If the number of delayed monthly payments is greater than 5, the customer is placed in the bad risk group. When the number of delayed monthly payments is exactly 5, the property value of customers identified as good risk differs from that of the others.

Model Building

First, I transformed the categorical variables into dummy variables. I also split the data into train and test sets with a test size of 20%.
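A minimal sketch of the dummy encoding and the 80/20 split, assuming `pandas.get_dummies` and scikit-learn's `train_test_split`. The toy columns `job`, `loan`, and `bad` are placeholders, and both `random_state=42` and `stratify=y` are my assumptions; the README does not say whether the split was stratified.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "job": ["Office", "Sales", "Other", "Office", "Sales",
            "Other", "Office", "Sales", "Other", "Office"],
    "loan": [1100, 1300, 1500, 1700, 1800, 2000, 2100, 2300, 2400, 2500],
    "bad": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
})

# One indicator column per category level.
X = pd.get_dummies(df.drop(columns="bad"), columns=["job"])
y = df["bad"]

# Hold out 20% of the rows for testing; stratify keeps the class ratio similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```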

I tried three different models and evaluated them using Accuracy and F1 score.

Only debtinc_score model

I achieved 87% accuracy with a logistic regression model built on the debt-to-income ratio alone.
So if we know only the debt-to-income ratio of a new customer, we can classify that customer as a good risk or a bad risk with 87% accuracy.
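A single-feature logistic regression of this kind could be set up as follows. The data here is synthetic, so the accuracy it produces says nothing about the 87% reported above, and feeding the raw ratio (rather than a binned `debtinc_score`) is an assumption made to keep the sketch short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic debt-to-income ratio and a noisy target that depends on it.
rng = np.random.default_rng(0)
n = 1000
debtinc = rng.uniform(0, 60, size=n)
bad = (debtinc + rng.normal(0, 5, size=n) > 45).astype(int)

# scikit-learn expects a 2-D feature matrix, even for one feature.
X = debtinc.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, bad, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```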

The debtinc_score variable stands out in the feature importance chart.

Model performance

When examining model performance, I focused on the F1 score for class 1 (bad risk), because we want to find the bad risk customers.
The XGB model outperformed the other approaches.
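To illustrate why the F1 score for the bad risk class is a useful complement to accuracy, here is a small hand-made example using scikit-learn's metrics. The labels are invented for illustration and are not model output from the notebook.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented labels: 8 good risk (0) and 2 bad risk (1) customers.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A classifier that finds only one of the two bad risk customers.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)        # 9 of 10 correct
f1_bad = f1_score(y_true, y_pred, pos_label=1)   # precision 1.0, recall 0.5
```

Accuracy is 0.9 here while the bad risk F1 is only about 0.67, which shows how accuracy alone can hide missed bad risk customers when the classes are imbalanced.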
