This project is a data analytics study of demographic data for customers of a mail-order sales campaign in Germany. The study applies data cleaning, preprocessing, and transformation, followed by unsupervised and supervised learning. Unsupervised learning is used for customer segmentation, and supervised modeling is used to predict how likely an individual is to become a customer of the mail-order campaign.
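The segmentation step can be illustrated with a minimal sketch. It is not the notebook's actual code: the choice of `StandardScaler`, `PCA`, and `KMeans`, the helper names `fit_segments` and `cluster_shares`, and the synthetic data are all assumptions made for illustration. The idea is to fit clusters on the general population and then compare how customers distribute across those clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_segments(population, n_components=2, n_clusters=3, seed=0):
    """Fit scaling, PCA, and k-means on the general population (hypothetical helper)."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
    )
    model.fit(population)
    return model


def cluster_shares(model, data):
    """Return the fraction of rows in `data` assigned to each cluster."""
    labels = model.predict(data)
    n_clusters = model[-1].n_clusters
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


# Synthetic stand-ins for the cleaned population and customer tables (assumption).
rng = np.random.default_rng(0)
population = rng.normal(size=(300, 5))
customers = rng.normal(loc=1.0, size=(100, 5))

model = fit_segments(population)
pop_share = cluster_shares(model, population)
cust_share = cluster_shares(model, customers)

# Positive values mark clusters that are over-represented among customers.
print(cust_share - pop_share)
```

Clusters with a clearly positive difference would be candidate target segments; the number of components and clusters here is arbitrary and would need tuning on the real data.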
Data Files:
./data/Udacity_AZDIAS_052018.csv
./data/Udacity_CUSTOMERS_052018.csv
./data/Udacity_MAILOUT_052018_TRAIN.csv
./data/Udacity_MAILOUT_052018_TEST.csv
./data/AZDIAS_Attributes_Info.csv
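As a rough sketch, the files above could be loaded with pandas as follows. The helpers `load_table` and `missing_share` are hypothetical, and the delimiter is an assumption: pass a different `sep` if the files are not comma-separated.

```python
from pathlib import Path

import pandas as pd

# Paths copied from the file list above.
DATA_FILES = {
    "azdias": "./data/Udacity_AZDIAS_052018.csv",
    "customers": "./data/Udacity_CUSTOMERS_052018.csv",
    "mailout_train": "./data/Udacity_MAILOUT_052018_TRAIN.csv",
    "mailout_test": "./data/Udacity_MAILOUT_052018_TEST.csv",
    "attributes": "./data/AZDIAS_Attributes_Info.csv",
}


def load_table(path, sep=",", nrows=None):
    """Read one CSV file into a DataFrame (hypothetical helper).

    `sep` is an assumption; adjust it to the delimiter the files actually use.
    `nrows` allows loading a small sample before reading a full file.
    """
    return pd.read_csv(Path(path), sep=sep, nrows=nrows, low_memory=False)


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)
```

For example, `load_table(DATA_FILES["customers"], nrows=1000)` would read a small sample, and `missing_share` on the result gives a first look at which columns may need cleaning.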
In addition to the data files, the project workspace includes five files:
- Arvato_Project_workbook.ipynb is the Jupyter notebook that documents the project steps.
- DIAS Information Levels - Attributes 2017.xlsx contains attribute descriptions for the four main data files.
- DIAS Attributes - Values 2017.xlsx explains the values of each attribute, which can help with the data preprocessing steps.
- Project_final_report.pdf is the final project report.
- README.md provides instructions on the project.
Python libraries used:
numpy
pandas
matplotlib
seaborn
scikit-learn
joblib
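To show how scikit-learn and joblib might fit the supervised step, here is a minimal sketch on synthetic data. The classifier (`GradientBoostingClassifier`), the ROC AUC metric, the file name `response_model.joblib`, and the generated data are all assumptions for illustration, not the model or metric used in the notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the mailout training data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
logits = 1.5 * X[:, 0] - X[:, 1] - 2.0
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Stratify so the rare positive class appears in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)

# Score individuals by their predicted probability of responding.
val_scores = clf.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, val_scores)
print(f"validation ROC AUC: {auc:.3f}")

# Persist the fitted model and load it back with joblib.
model_path = os.path.join(tempfile.mkdtemp(), "response_model.joblib")
joblib.dump(clf, model_path)
restored = joblib.load(model_path)
```

Ranking by predicted probability rather than by hard class labels suits imbalanced response data, where very few individuals become customers.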
- Udacity+Arvato: Identify Customer Segments
- I ranked 42nd (public score: 0.80153; first-place score: 0.81063)
I would like to thank Udacity for this project, and Arvato for providing the dataset.