Personal loans are a lucrative revenue stream for banks. The typical interest rate of a two year loan in the United Kingdom is around 10%. This might not sound like a lot, but in September 2022 alone UK consumers borrowed around £1.5 billion, which would mean approximately £300 million in interest generated by banks over two years!
The Goal is to clean and store the data the bank collected as part of a recent marketing campaign, which aimed to get customers to take out a personal loan. They plan to conduct more marketing campaigns going forward so would like you to set up a MongoDB database to store this campaign's data, in a way that would allow data from future campaigns to be easily imported.
The result of the campaign is a csv file called "bank_marketing.csv"
, which needs to be cleaned, reformated, and splited, in order to save separate files based on the tables you will create.
I will write the code that the bank can execute to create the tables and populate with the data from the csv files. Also I will write tests for the functions to connect to the MongoDB database Its important to mention that this version connects to a local MongoDB enviroment. Its possible to connect to a server or even to the cloud aka Atlas using, for example, a enviroment variable to hide the password
The database must have three tables as the following schemas shows:
column | data type | description |
---|---|---|
id |
serial |
Client ID - primary key |
age |
integer |
Client's age in years |
job |
text |
Client's type of job |
marital |
text |
Client's marital status |
education |
text |
Client's level of education |
credit_default |
boolean |
Whether the client's credit is in default |
housing |
boolean |
Whether the client has an existing housing loan (mortgage) |
loan |
boolean |
Whether the client has an existing personal loan |
column | data type | description |
---|---|---|
campaign_id |
serial |
Campaign ID - primary key |
client_id |
serial |
Client ID - references id in the client table |
number_contacts |
integer |
Number of contact attempts to the client in the current campaign |
contact_duration |
integer |
Last contact duration in seconds |
pdays |
integer |
Number of days since contact in previous campaign (999 = not previously contacted) |
previous_campaign_contacts |
integer |
Number of contact attempts to the client in the previous campaign |
previous_outcome |
boolean |
Outcome of the previous campaign |
campaign_outcome |
boolean |
Outcome of the current campaign |
last_contact_date |
date |
Last date the client was contacted |
column | data type | description |
---|---|---|
client_id |
serial |
Client ID - references id in the client table |
emp_var_rate |
float |
Employment variation rate (quarterly indicator) |
cons_price_idx |
float |
Consumer price index (monthly indicator) |
euribor_three_months |
float |
Euro Interbank Offered Rate (euribor) three month rate (daily indicator) |
number_employed |
float |
Number of employees (quarterly indicator) |