Classic Cars Pte Ltd is a retailer of toys specializing in scale car models. Currently, they are using an OLTP system in their day-to-day operations and recently, the CEO has decided to setup a data warehouse for analysis purpose, and he wants you to design a star/snowflake schema, setup the OLTP database and create a data warehouse that can generate meaningful insights in the areas of Sales, Staff/Offices, Seasons of Sales and Orders/Customer.
It consists of 8 tables:
- Offices
- Employees
- Orders
- Order Details
- Payments
- Products
- Product Lines
- Customers
-
Download the zipped files.
-
Click on Connect and right click on Databases to create a new schema called
CarSalesOLTP
to represent the OLTP database. -
Right click on Databases again to create another new schema called
CarSalesDW
to represent the data warehouse. -
Double click on
CarSalesOLTP
to make it the working database and run the SQL statements by clicking on Execute. -
Double click on
CarSalesDW
to make it the working database and run the SQL statements by clicking on Execute.
-
Download the zipped files.
-
Click on Clusters > Create New Cluster. Give it a meaningful name and select the default choice as the Databricks Runtime Version. Wait for about 3 minutes for the cluster to connect with the server.
-
Click on Workspace and right click again to create a new folder. Attach the folder to the previously created cluster.
Explanation for CarSalesDW.erd
:
The initial design was that of a star schema, however after encountering errors related to duplicated records in the customer and orders dimension tables, these two tables are identified as the ones that need to be primary dimension tables that have secondary dimension tables called CustomerPayment and OrderLine to insert data such as cheque payment and order line details respectively.
The participation constraints between the CustomerDIM to CustomerPaymentDIM is optional as not every customer owes the company payment. However, the participation constraints between the OrderDIM to OrderLineDIM is compulsory as every order will belong to a certain order line. The remaining 4 dimension tables are called Office, Employee, Product and Time and exist only as primary dimension tables.
Although the time taken to execute the SQL queries will be longer due to the use of more JOIN clauses, a snowflake schema is still more suitable when building a data warehouse as it prevent duplication errors in the long run.
-
Order of SQL for both OLTP database and Data Warehouse:
Create > Insert > Select
. -
For
Insert
to OLTP database, insert in the order:Customers.csv > Assignment_INSERT_data_partial_OLTP.sql > Products.json
. -
For
Insert
to Data Warehouse, insert in the order:Data Warehouse Create > Time Create > Data to Dimension Table > Data to Fact Table
. -
This project has no association to Classic Cars Pte Ltd in real life.