
An ETL project that builds a data pipeline for crowdfunding project data using Python, Pandas, and a PostgreSQL database. An ERD and a table schema are also created to support it.


ericyang91/Crowdfunding_ETL


Crowdfunding ETL


Purpose:

The purpose of this project is to extract, transform, and load (hence, ETL) data related to crowdfunding using both Python and SQL. Python and Pandas were used to extract and transform the raw data, and PostgreSQL and pgAdmin were used to load the clean data so that it is ready for analysis.

Extract:

A quick snapshot of the crowdfunding.xlsx data:

rawdata2

A quick snapshot of the contacts.xlsx data:

rawdata1
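A minimal sketch of the extract step with Pandas, assuming both workbooks sit next to the script and their data starts on the first sheet with a header row (real spreadsheets may need `sheet_name` or `header` arguments). `extract` and `snapshot` are hypothetical helper names, and reading `.xlsx` files requires the `openpyxl` package to be installed.

```python
import pandas as pd


def extract(crowdfunding_path: str, contacts_path: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Read the two raw workbooks into DataFrames (first sheet of each)."""
    crowdfunding_df = pd.read_excel(crowdfunding_path)
    contacts_df = pd.read_excel(contacts_path)
    return crowdfunding_df, contacts_df


def snapshot(df: pd.DataFrame) -> dict:
    """Summarize a raw DataFrame: row count, column names, and missing values per column."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),
    }
```

For example, `crowdfunding_df, contacts_df = extract("crowdfunding.xlsx", "contacts.xlsx")` followed by `snapshot(crowdfunding_df)` gives a quick text counterpart to the screenshots above.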


Transform:


Python and Pandas were used to transform the raw data into clean data. The Python code for this work can be found here.

A quick snapshot of the contacts_clean.csv data:

clean1
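One way the contacts could be cleaned, assuming (this is not confirmed by the project) that each row of contacts.xlsx holds a single JSON-like string with `contact_id`, `name`, and `email` keys, and that the clean file splits the name into `first_name` and `last_name`. The column name `contact_info` and the helper `clean_contacts` are hypothetical.

```python
import json

import pandas as pd


def clean_contacts(raw: pd.DataFrame, column: str = "contact_info") -> pd.DataFrame:
    """Parse one JSON-like string per row into separate contact columns.

    Assumes (hypothetically) that each cell looks like
    '{"contact_id": 1, "name": "First Last", "email": "..."}'.
    """
    # json.loads turns each string into a dict; a list of dicts becomes a DataFrame
    records = [json.loads(cell) for cell in raw[column]]
    contacts = pd.DataFrame(records)

    # Split "First Last" on the first space; assumes every name contains a space
    names = contacts["name"].str.split(" ", n=1, expand=True)
    contacts["first_name"] = names[0]
    contacts["last_name"] = names[1]

    contacts["contact_id"] = contacts["contact_id"].astype(int)
    return contacts[["contact_id", "first_name", "last_name", "email"]]
```

The result could then be written out with `clean_contacts(contacts_df).to_csv("contacts_clean.csv", index=False)`.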

A quick snapshot of the category.csv data:

clean2

A quick snapshot of the subcategory.csv data:

clean3
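The category and subcategory tables can be derived from a combined label column. This sketch assumes the raw crowdfunding data has a column named `category & sub-category` holding values such as `food/food trucks`, and that IDs take the form `cat1`, `cat2` and `subcat1`, `subcat2`, and so on. Both the column name and the ID format are assumptions, and `build_lookup` and `split_categories` are hypothetical helpers.

```python
import pandas as pd


def build_lookup(values: pd.Series, id_prefix: str, name: str) -> pd.DataFrame:
    """Create a lookup table with one row per unique value and a prefixed string ID."""
    # Sorting makes the ID assignment deterministic between runs
    unique_values = pd.Series(values.unique()).sort_values(ignore_index=True)
    ids = [f"{id_prefix}{i}" for i in range(1, len(unique_values) + 1)]
    return pd.DataFrame({f"{name}_id": ids, name: unique_values})


def split_categories(crowdfunding: pd.DataFrame, column: str = "category & sub-category"):
    """Split 'category/subcategory' strings and build the two lookup tables."""
    df = crowdfunding.copy()
    # n=1 splits on the first "/" only, giving exactly two new columns
    df[["category", "subcategory"]] = df[column].str.split("/", n=1, expand=True)
    category_df = build_lookup(df["category"], "cat", "category")
    subcategory_df = build_lookup(df["subcategory"], "subcat", "subcategory")
    return df, category_df, subcategory_df
```

Sorting the unique values keeps the IDs stable across runs; numbering them by order of first appearance would be an equally valid choice.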

A quick snapshot of the campaign.csv data:

clean4
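A possible shape for the campaign step, assuming the raw columns are named `blurb`, `launched_at`, and `deadline` (with Unix timestamps in seconds), that `goal` and `pledged` should be stored as floats, and that the raw frame already carries separate `category` and `subcategory` text columns. The `RENAMES` mapping and `build_campaign` are hypothetical; the lookup tables are expected to have `category_id`/`category` and `subcategory_id`/`subcategory` columns.

```python
import pandas as pd

# Hypothetical raw-to-clean column names; adjust to the real workbook headers
RENAMES = {"blurb": "description", "launched_at": "launch_date", "deadline": "end_date"}


def build_campaign(raw: pd.DataFrame, category_df: pd.DataFrame,
                   subcategory_df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, fix types, and swap category names for their IDs."""
    campaign = raw.rename(columns=RENAMES)

    # Store monetary columns as floats
    campaign[["goal", "pledged"]] = campaign[["goal", "pledged"]].astype(float)

    # Convert Unix timestamps (seconds) to calendar dates
    for col in ("launch_date", "end_date"):
        campaign[col] = pd.to_datetime(campaign[col], unit="s").dt.date

    # Replace text labels with foreign-key IDs from the lookup tables
    campaign = campaign.merge(category_df, on="category", how="left")
    campaign = campaign.merge(subcategory_df, on="subcategory", how="left")
    return campaign.drop(columns=["category", "subcategory"])
```

A left merge keeps every campaign row even if a label is missing from a lookup table, which shows up as a null ID that is easy to spot before loading.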


Load:


QuickDBD was used to model the data as an Entity Relationship Diagram (ERD). The table schema for the ERD can be found here.

erd
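The schema behind the ERD could look roughly like the following. The project's own schema file is the authoritative version; this sketch trims `campaign` to a few representative columns, and all column names and types are assumptions. It runs the DDL through Python's built-in `sqlite3` module purely so the example is self-contained; the `CREATE TABLE` statements use standard SQL that PostgreSQL also accepts.

```python
import sqlite3

# Hypothetical schema mirroring the four cleaned CSV files (column names are assumptions).
# Parent tables are created first because campaign references them.
SCHEMA = """
CREATE TABLE category (
    category_id    VARCHAR(10) PRIMARY KEY,
    category       VARCHAR(50) NOT NULL
);
CREATE TABLE subcategory (
    subcategory_id VARCHAR(10) PRIMARY KEY,
    subcategory    VARCHAR(50) NOT NULL
);
CREATE TABLE contacts (
    contact_id     INTEGER PRIMARY KEY,
    first_name     VARCHAR(50) NOT NULL,
    last_name      VARCHAR(50) NOT NULL,
    email          VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE campaign (
    cf_id          INTEGER PRIMARY KEY,
    contact_id     INTEGER NOT NULL REFERENCES contacts (contact_id),
    company_name   VARCHAR(100) NOT NULL,
    description    TEXT,
    goal           NUMERIC(12, 2) NOT NULL,
    pledged        NUMERIC(12, 2) NOT NULL,
    outcome        VARCHAR(20) NOT NULL,
    launch_date    DATE NOT NULL,
    end_date       DATE NOT NULL,
    category_id    VARCHAR(10) NOT NULL REFERENCES category (category_id),
    subcategory_id VARCHAR(10) NOT NULL REFERENCES subcategory (subcategory_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables; SQLite only enforces foreign keys once this pragma is on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Each one-to-many line in the ERD corresponds to one `REFERENCES` clause on `campaign`.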


Note:
One-to-one relationship: a straight line with a short, perpendicular bar at each end.
One-to-many relationship: a straight line with a short, perpendicular bar at the "one" end and three short branches (a "crow's foot") at the "many" end.
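The same cardinality rules can be checked directly on the cleaned data before loading it. `cardinality` is a hypothetical helper, not part of the project: the parent table must hold each key once (its primary key), every child key must point at an existing parent row, and a key that repeats on the child side marks a one-to-many relationship. A child key that never repeats is reported as one-to-one, which is a simplification (strictly, one-to-zero-or-one).

```python
import pandas as pd


def cardinality(parent: pd.DataFrame, child: pd.DataFrame, key: str) -> str:
    """Classify the parent-to-child relationship on `key` as one-to-one or one-to-many."""
    # The "one" side must be unique, just like a primary key
    if not parent[key].is_unique:
        raise ValueError(f"{key} is not unique in the parent table")

    # Every child key must exist in the parent (foreign-key integrity)
    orphans = set(child[key]) - set(parent[key])
    if orphans:
        raise ValueError(f"child rows reference missing {key} values: {sorted(orphans)}")

    # Repeated child keys are the "crow's foot" (many) end of the line
    return "one-to-one" if child[key].is_unique else "one-to-many"
```

For instance, checking the campaign data against the contacts data on `contact_id` shows which of the two notations above the ERD line between them should use.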


PostgreSQL and pgAdmin were used to create a database for the project, and SQL queries created four tables, one for each cleaned CSV file. The CSV data were then imported into these tables through pgAdmin, leaving them ready for use:

pgadmin
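The import itself was done through pgAdmin. As a scripted alternative (not the method used in this project), Pandas' `DataFrame.to_sql` can append each cleaned DataFrame to its table, loading the parent tables before `campaign` so that its foreign keys always find a match. The sketch uses an in-memory `sqlite3` connection so it runs anywhere; against PostgreSQL, `to_sql` would instead need an SQLAlchemy engine. `load_tables` and `LOAD_ORDER` are hypothetical names.

```python
import sqlite3

import pandas as pd

# Parent tables first, so campaign's foreign keys always reference existing rows
LOAD_ORDER = ["contacts", "category", "subcategory", "campaign"]


def load_tables(conn: sqlite3.Connection, frames: dict[str, pd.DataFrame]) -> dict[str, int]:
    """Append each DataFrame to the table of the same name and return final row counts."""
    counts = {}
    for table in LOAD_ORDER:
        # if_exists="append" keeps a pre-created schema (keys, types) intact;
        # if the table is missing, to_sql creates it from the DataFrame's dtypes
        frames[table].to_sql(table, conn, if_exists="append", index=False)
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    conn.commit()
    return counts
```

With the CSV files from the Transform step, `frames` would map `"contacts"` to `pd.read_csv("contacts_clean.csv")`, `"category"` to `pd.read_csv("category.csv")`, and likewise for `subcategory.csv` and `campaign.csv`.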


Below are the screenshots of the completed tables:

pg1


pg2


pg3


pg4
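A quick query can confirm that the loaded tables join up as the ERD describes. The sketch builds a tiny in-memory SQLite database with made-up rows only so it is runnable; `CHECK_QUERY` and `orphan_count` are illustrative names, and the join itself is plain SQL that should also run in pgAdmin's query tool, assuming the real column names match.

```python
import sqlite3

# Join each campaign to its contact, category, and subcategory to confirm the keys line up
CHECK_QUERY = """
SELECT ca.cf_id,
       co.first_name || ' ' || co.last_name AS contact,
       cat.category,
       sub.subcategory
FROM campaign AS ca
JOIN contacts    AS co  ON co.contact_id      = ca.contact_id
JOIN category    AS cat ON cat.category_id    = ca.category_id
JOIN subcategory AS sub ON sub.subcategory_id = ca.subcategory_id
ORDER BY ca.cf_id
"""


def orphan_count(conn: sqlite3.Connection, child: str, parent: str, key: str) -> int:
    """Count child rows whose `key` has no matching parent row (0 means the link is intact).

    Table and column names are interpolated directly, so only pass trusted identifiers.
    """
    sql = (
        f"SELECT COUNT(*) FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON p.{key} = c.{key} "
        f"WHERE p.{key} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]
```

The inner joins in `CHECK_QUERY` silently drop rows with broken keys, so `orphan_count` complements it by counting exactly those rows.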


Languages and Libraries:

PostgreSQL, pgAdmin, Python, Pandas, Visual Studio Code, ERD
