This project focuses on analyzing the data from a superstore. It consists of four main sections:
- Data Warehouse Preparation:
  - Preprocess and clean the data.
  - Design the data warehouse structure with Fact and Dimension tables.
  - Create keys and relationships between tables.
  - Import the tables into Power BI and verify that the relationships are correct.
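
The warehouse steps above can be sketched with pandas. This is a minimal illustration, not the project's actual schema: the column names, the sample rows, and the `build_dimension` helper are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical flat export of the superstore data (columns are assumed).
orders = pd.DataFrame({
    "order_id": ["O-1", "O-1", "O-2", "O-3"],
    "customer": ["Ann", "Ann", "Bob", "Ann"],
    "segment":  ["Consumer", "Consumer", "Corporate", "Consumer"],
    "product":  ["Chair", "Pen", "Chair", "Desk"],
    "category": ["Furniture", "Office Supplies", "Furniture", "Furniture"],
    "sales":    [120.0, 3.5, 110.0, 300.0],
    "discount": [0.0, 0.2, 0.1, 0.0],
})


def build_dimension(df, columns, key_name):
    """Return the distinct rows of `columns` with an integer surrogate key."""
    dim = df[columns].drop_duplicates().reset_index(drop=True)
    dim.insert(0, key_name, dim.index + 1)  # keys start at 1
    return dim


dim_customer = build_dimension(orders, ["customer", "segment"], "customer_key")
dim_product = build_dimension(orders, ["product", "category"], "product_key")

# The fact table keeps the measures plus a foreign key into each dimension.
fact_sales = (
    orders
    .merge(dim_customer, on=["customer", "segment"], how="left")
    .merge(dim_product, on=["product", "category"], how="left")
    [["order_id", "customer_key", "product_key", "sales", "discount"]]
)

# Basic integrity check: every fact row resolved to a dimension row.
assert fact_sales[["customer_key", "product_key"]].notna().all().all()
```

Each table could then be written to CSV and loaded into Power BI, where the `*_key` columns define the one-to-many relationships between dimensions and the fact table.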
- Statistical Analysis:
  - Investigate the impact of discounts on sales.
  - Divide the data into discount and non-discount groups.
  - Analyze the distribution of sold items in each group.
  - Perform statistical tests to determine whether there is a significant difference between the groups.
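
One way to carry out the group comparison above is a Mann-Whitney U test, which does not assume normally distributed quantities. The sketch below is an assumption about how such a test could be run, not a record of the test the project used; the data are synthetic and `compare_groups` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the superstore data): one row per sold line item.
rng = np.random.default_rng(0)
discount = rng.choice([0.0, 0.1, 0.2], size=200)
quantity = rng.poisson(lam=np.where(discount > 0, 5.0, 3.0)) + 1


def compare_groups(quantity, discount, alpha=0.05):
    """Split items by discount and test whether the quantity distributions differ."""
    quantity = np.asarray(quantity)
    discount = np.asarray(discount)
    discounted = quantity[discount > 0]
    full_price = quantity[discount == 0]
    # Two-sided test: H0 says neither group tends to have larger quantities.
    u_stat, p_value = stats.mannwhitneyu(
        discounted, full_price, alternative="two-sided"
    )
    return {
        "n_discounted": int(discounted.size),
        "n_full_price": int(full_price.size),
        "median_discounted": float(np.median(discounted)),
        "median_full_price": float(np.median(full_price)),
        "u_statistic": float(u_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


result = compare_groups(quantity, discount)
print(result)
```

If the quantities turn out to be roughly normal, Welch's t-test (`stats.ttest_ind(..., equal_var=False)`) would be a reasonable alternative.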
- Machine Learning:
  - Task 1: Profit Estimation
    - Preprocess the data and select relevant features.
    - Train a model to predict the profit of a sold product.
    - Ensure the model generalizes and does not overfit.
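
A common way to check that a profit model generalizes is to compare training and validation scores under cross-validation. The following is a minimal scikit-learn sketch under stated assumptions: the features, the synthetic target, and the choice of a ridge regression are illustrative and are not taken from the project.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the superstore data).
rng = np.random.default_rng(42)
n = 300
sales = rng.uniform(10, 500, n)
discount = rng.choice([0.0, 0.1, 0.2, 0.3], n)
quantity = rng.integers(1, 10, n)
# Assumed relationship: discounts eat into the margin on each sale.
profit = 0.3 * sales - 0.8 * sales * discount + rng.normal(0, 5, n)

# Feature matrix, including a sales-by-discount interaction term.
X = np.column_stack([sales, discount, quantity, sales * discount])

# Scaling lives inside the pipeline so each fold is scaled on its own training part.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

scores = cross_validate(
    model, X, profit, cv=5, scoring="r2", return_train_score=True
)
train_r2 = scores["train_score"].mean()
valid_r2 = scores["test_score"].mean()
gap = train_r2 - valid_r2

print(f"train R2={train_r2:.3f}  validation R2={valid_r2:.3f}  gap={gap:.3f}")
```

A large positive gap between the training and validation scores would suggest overfitting; stronger regularization (a larger `alpha`) or fewer features are typical remedies.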
  - Task 2: Shipping Mode Prediction
    - Preprocess the data and select appropriate features.
    - Train a model to predict the shipping mode for each order.
    - Evaluate the model's performance.
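
The shipping mode task is a multi-class classification problem, so it could be evaluated with a stratified hold-out split, a macro-averaged F1 score, and a confusion matrix. The sketch below is only an illustration: the class names, the `days_to_ship` and `order_value` features, and the random forest are assumptions rather than the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the superstore data); class names are assumed.
rng = np.random.default_rng(7)
modes = np.array(["Same Day", "First Class", "Second Class", "Standard Class"])
y = rng.choice(modes, size=400, p=[0.05, 0.15, 0.2, 0.6])

# Hypothetical features: days between order and shipment, and order value.
typical_days = {"Same Day": 0, "First Class": 2, "Second Class": 3, "Standard Class": 5}
days_to_ship = np.array([typical_days[m] for m in y]) + rng.integers(0, 2, 400)
order_value = rng.uniform(10, 500, 400)
X = np.column_stack([days_to_ship, order_value])

# Stratify so the rare classes appear in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" compensates for the imbalanced class frequencies.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Macro F1 weights every class equally, so rare modes are not drowned out.
macro_f1 = f1_score(y_test, y_pred, average="macro")
cm = confusion_matrix(y_test, y_pred, labels=modes)

print(f"macro F1 = {macro_f1:.3f}")
print(cm)  # rows: true mode, columns: predicted mode, in the order of `modes`
```

Accuracy alone can be misleading when one shipping mode dominates, which is why the sketch reports macro F1 alongside the confusion matrix.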
- Dashboard Creation:
  - Import the preprocessed data into Power BI.
  - Design and create a dashboard with meaningful visualizations.
  - Provide recommendations for improving the business based on the analysis.
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.
- Special thanks to Power BI for data visualization and analysis capabilities.
- The scikit-learn library was used for machine learning tasks.
- Thanks to Sina Asghari, ArsalanMoravvej, Sina Saberi, and Mohammad Norasteh for their contributions and efforts in developing this project.