ZhengyuOfficial/Road2DE

Road2DE

Data Engineering Projects

This repository documents my process of learning data engineering (DE). I found the original project while searching for DE material, and I want to rebuild all of it by myself from scratch. I have also made minor changes, for example where commands no longer work because of updates.

The original GitHub repository is [https://github.com/san089/Udacity-Data-Engineering-Projects/tree/master]

Project 1: ETL Pipeline Using Python & Postgres

In this project, we apply data modeling with Postgres and build an ETL pipeline using Python. A startup wants to analyze the data it has been collecting on songs and user activity in its new music streaming app. The data is currently collected in JSON format, and the analytics team is particularly interested in understanding which songs users are listening to.

Link: [https://github.com/ZhengyuOfficial/The-Road-To-Data-Engineering./tree/main/ETL]
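
The extract, transform, and load steps described for Project 1 can be sketched roughly as follows. This is a minimal illustration, not the code in the linked folder: the table name `songplays`, its columns, and the JSON field names (`userId`, `song`, `artist`, `ts`) are assumptions made for the example. The `%s` placeholders follow the parameter style used by psycopg2.

```python
import json
from typing import Iterable, Iterator

# Hypothetical table and column names, chosen for illustration only.
INSERT_SONGPLAY = (
    "INSERT INTO songplays (user_id, song, artist, played_at_ms) "
    "VALUES (%s, %s, %s, %s)"
)


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line of a log file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def transform(events: Iterable[dict]) -> Iterator[tuple]:
    """Keep only events that reference a song and project the needed fields."""
    for event in events:
        if event.get("song"):
            yield (event["userId"], event["song"], event["artist"], event["ts"])


def load(cursor, rows: Iterable[tuple]) -> int:
    """Insert rows through any DB-API cursor and return how many were sent."""
    count = 0
    for row in rows:
        cursor.execute(INSERT_SONGPLAY, row)
        count += 1
    return count
```

With psycopg2, `load` would receive a cursor obtained from `conn.cursor()`, followed by `conn.commit()` once all rows are inserted.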

Project 2: Data Warehouse

In this project, we apply the data warehouse architectures we learned and build a data warehouse on the AWS cloud. We build an ETL pipeline that extracts data stored in JSON format in S3 buckets, transforms it, and loads it into a warehouse hosted on Amazon Redshift.

Link: [https://github.com/ZhengyuOfficial/Road2DE/tree/main/WareHouse]
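
Loading JSON files from S3 into Redshift is usually done with the Redshift `COPY` command. The sketch below only builds such a statement as a string; it is an illustration under assumptions, not the code in the linked folder. The helper name `build_copy_sql`, the table name, the bucket path, and the role ARN are all hypothetical.

```python
def build_copy_sql(table: str, s3_path: str, iam_role_arn: str, region: str,
                   json_option: str = "auto") -> str:
    """Build a Redshift COPY statement that loads JSON files from S3.

    The values are expected to come from trusted configuration, because
    they are interpolated directly into the SQL text.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION '{region}';"
    )


# Hypothetical values, for illustration only.
example_sql = build_copy_sql(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
    region="us-west-2",
)
```

The resulting statement would then be executed against the cluster through a Postgres-compatible driver, after which `INSERT ... SELECT` statements can move data from the staging tables into the final tables.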

Project 3: Data Lake (Pending)

In this project, we will build a data lake on the AWS cloud using Spark and an AWS EMR cluster. The data lake will serve as a Single Source of Truth for the analytics platform. We will write Spark jobs to perform ELT operations that pick up data from the landing zone on S3, transform it, and store it in the processed zone on S3.
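
Since this project is still pending, the following is only a sketch of what one such Spark job might look like. The column names (`song_id`, `title`, `artist_id`, `year`), the function names, and the `<bucket>` paths are assumptions for illustration. The job takes the `SparkSession` as an argument so that the transformation logic does not depend on how the session is created.

```python
def run_elt(spark, landing_path: str, processed_path: str) -> None:
    """Read raw JSON from the landing zone and write Parquet to the processed zone."""
    raw = spark.read.json(landing_path)
    # Hypothetical columns: keep one row per song.
    songs = raw.select("song_id", "title", "artist_id", "year").dropDuplicates(["song_id"])
    songs.write.mode("overwrite").partitionBy("year").parquet(processed_path)


def main() -> None:
    # Imported lazily so run_elt can be read and tested without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("road2de-data-lake").getOrCreate()
    # Placeholder paths; replace <bucket> with a real bucket name.
    run_elt(spark, "s3a://<bucket>/landing/song_data/", "s3a://<bucket>/processed/songs/")
    spark.stop()
```

On an EMR cluster, a script like this would typically be submitted with `spark-submit` and end with a call to `main()`.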

Project 4: Data Pipelines with Airflow (Pending)

In this project, we will orchestrate our data pipeline workflow using Apache Airflow, an open-source Apache project. We will schedule our ETL jobs in Airflow, create project-related custom plugins and operators, and automate the pipeline execution.
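
An Airflow pipeline is declared as a directed acyclic graph (DAG) of tasks, and each task runs only after its upstream tasks have finished. The sketch below is not Airflow code. It uses Python's standard `graphlib` module (Python 3.9+) as a stand-in to illustrate that ordering rule, and all task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Maps each task to the set of upstream tasks that must finish first.
# Task names are hypothetical.
DEPENDENCIES: Dict[str, Set[str]] = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_fact_table": {"stage_events", "stage_songs"},
    "run_quality_checks": {"load_fact_table"},
}


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 dependencies: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the order used."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed
```

In Airflow itself, the same relationships would be expressed between operators inside a DAG definition, and the scheduler would decide when each task runs.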

Project 5: API Data to Postgres (Pending)

In this project, we build an ETL pipeline that fetches data from the Yelp API and inserts it into a Postgres database. This project is a very basic example of fetching real-time data from a public API.
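
As this project is still pending, the following is only a rough sketch of the fetch and transform steps, using the Yelp Fusion business search endpoint and the standard library. The helper names, the `businesses` table, and the selected fields are assumptions for illustration; the API key must be supplied by the caller.

```python
import json
import urllib.parse
import urllib.request

YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"

# Hypothetical target table; ON CONFLICT assumes business_id is unique.
INSERT_BUSINESS = (
    "INSERT INTO businesses (business_id, name, rating, review_count) "
    "VALUES (%s, %s, %s, %s) ON CONFLICT (business_id) DO NOTHING"
)


def build_request(api_key: str, location: str, limit: int = 20) -> urllib.request.Request:
    """Build an authenticated search request without sending it."""
    query = urllib.parse.urlencode({"location": location, "limit": limit})
    return urllib.request.Request(
        f"{YELP_SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def to_rows(payload: dict) -> list:
    """Flatten the search response into tuples matching INSERT_BUSINESS."""
    return [
        (b["id"], b["name"], b.get("rating"), b.get("review_count"))
        for b in payload.get("businesses", [])
    ]


def fetch_rows(api_key: str, location: str) -> list:
    """Call the API and return rows ready to insert."""
    with urllib.request.urlopen(build_request(api_key, location)) as response:
        return to_rows(json.load(response))
```

Each returned row can then be passed to a Postgres cursor together with `INSERT_BUSINESS`.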
