
Clustering Linda Z:

This is a data analytics clustering project built with Python libraries. It explores the characteristics of different customer clusters and provides insights for marketing strategies.

Data Source

The data source is a CSV file that contains information about 2000 customers, such as their age, gender, annual income, spending score, and segment. The data was obtained from Kaggle: Mall Customer Segmentation Data.
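As a minimal sketch of loading and cleaning such a file with pandas: the inline sample, its column names, and the `load_customers` helper are assumptions made for illustration, not the notebook's actual code, and dropping incomplete rows is only one possible way to handle missing values.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV; the column names are assumptions.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
3,Female,20,16,6
4,Female,23,,77
"""


def load_customers(source):
    """Read the customer CSV, then drop duplicate rows and rows with missing values."""
    df = pd.read_csv(source)
    df = df.drop_duplicates()  # remove exact duplicate records
    df = df.dropna()           # simplest option: discard incomplete rows
    return df.reset_index(drop=True)


customers = load_customers(io.StringIO(SAMPLE_CSV))
print(customers.shape)       # rows and columns left after cleaning
print(customers.describe())  # descriptive statistics for the numeric columns
```

With the real data, a file path can be passed to `load_customers` in place of the `io.StringIO` object.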

Data Analysis

The data analysis process consists of the following steps:

  1. Data cleaning: check for missing values, outliers, and duplicates, and handle them accordingly.
  2. Data exploration: perform descriptive statistics and visualizations to understand the distribution and relationship of the variables.
  3. Data preprocessing: scale the numerical variables and encode the categorical variables for clustering.
  4. Clustering: apply the K-Means clustering algorithm, choose the optimal number of clusters using the elbow method and silhouette score, and assign each customer to a cluster.
  5. Cluster interpretation: analyze the characteristics of each cluster and provide insights for marketing strategies.
  6. Prediction: train classifiers such as KNN, Random Forest, and Decision Tree (J48).
  7. Analysis: evaluate the classifiers with a confusion matrix, accuracy score, recall, precision, and RMSE (the square root of the mean squared error).
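The clustering, prediction, and analysis steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the notebook's code: the data is synthetic (generated with `make_blobs`), the range of candidate cluster counts is arbitrary, the classifier is trained to predict the cluster labels, and only KNN is shown out of the classifiers listed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_squared_error, precision_score,
                             recall_score, silhouette_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric customer features (assumption).
X_raw, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Preprocessing: scale features so each contributes equally to distances.
X = StandardScaler().fit_transform(X_raw)

# Clustering: record inertia (for an elbow plot) and silhouette score per k.
inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# Pick the k with the highest silhouette score and assign cluster labels.
best_k = max(silhouettes, key=silhouettes.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)

# Prediction: train a KNN classifier to predict the cluster of held-out points.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42, stratify=labels)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Analysis: the metrics listed above. RMSE treats class labels as numbers,
# so it is reported only because the project lists it.
cm = confusion_matrix(y_test, y_pred, labels=list(range(best_k)))
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro", zero_division=0)
recall = recall_score(y_test, y_pred, average="macro", zero_division=0)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("chosen k:", best_k)
print("confusion matrix:\n", cm)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall, "RMSE:", rmse)
```

Plotting `inertias` against k gives the elbow curve. Swapping `KNeighborsClassifier` for `RandomForestClassifier` or `DecisionTreeClassifier` leaves the rest of the sketch unchanged.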

Libraries Used

The project uses the following Python libraries:

  • pandas: for data manipulation and analysis
  • numpy: for numerical computation
  • matplotlib: for data visualization
  • seaborn: for data visualization
  • sklearn: for data preprocessing and clustering

How to Run

To run the project, you need to have Python 3 and the above-mentioned libraries installed. You can use any Python IDE or notebook environment, such as Jupyter Notebook, to open and run the Clustering_Linda_Z.ipynb file. Alternatively, you can clone or download the GitHub repository and run the file from your local machine.
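Before opening the notebook, a quick check like the one below can confirm that the libraries are importable. It is a small sketch using only the standard library; the `missing_packages` helper is a hypothetical name, and the list holds the import names of the libraries mentioned in this README.

```python
import importlib.util

# Import names of the libraries listed in this README.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Note that `sklearn` is the import name; the package is installed as `scikit-learn`.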
