JirapatJean/beginner_de_project

 
 

beginner-de-project

This repository serves as my learning material for developing Data Engineering (DE) skills, focusing on Airflow, Docker, AWS cloud computing, and data pipeline best practices.

Credit goes to @josephmachado for initially setting up this project. You can find his repository here.

Status

Ongoing

Struggle

Encountered issues with Airflow connectivity after running make infra-up.

What I Have Done and Learned

  1. Setting up WSL and Ubuntu on Windows:

    • Utilized Microsoft's guide for setting up WSL.
    • Installed Docker Desktop for Windows following the instructions provided here.
  2. Setting up AWS Account and AWS CLI:

  3. AWS SSO Configuration:

    • Configured AWS SSO as a separate profile instead of using the default one.
    • Encountered issues with Terraform due to this setup.
    • After setting up SSO, change the profile in main.tf to the name of your SSO profile.
    • Also commented out the sso_session = name line and the [sso-session name] section in ~/.aws/config.
    • Refer to this issue for more details.
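
As a rough illustration of the profile change described in item 3, the AWS provider block in main.tf might look something like the sketch below. The profile name my-sso and the region are placeholders chosen for this example, not values taken from this project; substitute the profile name that appears in your own ~/.aws/config.

```hcl
# Sketch only: "my-sso" and "us-east-1" are hypothetical placeholders.
provider "aws" {
  region = "us-east-1"

  # Must match the [profile my-sso] section name in ~/.aws/config,
  # i.e. the named SSO profile rather than "default".
  profile = "my-sso"
}
```

Pointing the provider at a named profile keeps the default profile untouched, which is the setup described above.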
  4. EMR Default Role Setup:

    • Encountered a missing default role issue with EMR.
    • Resolved it by running aws emr create-default-roles in AWS CloudShell, opened from the AWS console.
  5. S3 ACL Error Resolution:

    • Faced S3 ACL errors during deployment.
    • To resolve the issue, first enabled block public access for the AWS account.
    • Then configured the ACL and public access block settings for the newly created S3 bucket in main.tf.
    • Followed the steps outlined here to resolve the issue.
    • Configuring the S3 bucket ACL public access block in Terraform resolved this error:
      Error: error creating S3 bucket ACL for s3-bucket-name-1234: AccessDenied: Access Denied
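
For reference, one common way to arrange these resources in Terraform is sketched below. It is not necessarily the exact configuration used in this repository: the resource label example is hypothetical, the bucket name is simply the one from the error message, and the private ACL is an assumption. It relies on the fact that newly created buckets have ACLs disabled by default, so the ACL resource needs ownership controls in place before it is applied.

```hcl
# Sketch under assumptions: resource labels and the "private" ACL are illustrative.
resource "aws_s3_bucket" "example" {
  bucket = "s3-bucket-name-1234"
}

# New buckets default to BucketOwnerEnforced, which disables ACLs.
# BucketOwnerPreferred re-enables ACLs for this bucket.
resource "aws_s3_bucket_ownership_controls" "example" {
  bucket = aws_s3_bucket.example.id

  rule {
    object_ownership = "BucketOwnerPreferred"
  }
}

# Bucket-level public access block. Relaxing these flags is only needed
# when the bucket uses a public ACL; keep them true otherwise.
resource "aws_s3_bucket_public_access_block" "example" {
  bucket = aws_s3_bucket.example.id

  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

# Apply the ACL only after the two resources above exist,
# otherwise the request can be rejected with AccessDenied.
resource "aws_s3_bucket_acl" "example" {
  depends_on = [
    aws_s3_bucket_ownership_controls.example,
    aws_s3_bucket_public_access_block.example,
  ]

  bucket = aws_s3_bucket.example.id
  acl    = "private"
}
```

The depends_on list is the key design choice here: without it, Terraform may try to set the ACL before ownership controls are applied.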
      
