This repository serves as my learning material for developing Data Engineering (DE) skills, focusing on Airflow, Docker, AWS cloud computing, and data pipeline best practices.
Credit goes to @josephmachado for initially setting up this project. You can find his repository here.
Ongoing
Encountered issues with Airflow connectivity after running make infra-up.
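While that connectivity issue is open, one way to narrow it down is to query the webserver's /health endpoint, which Airflow 2.x serves as JSON with a status per component (metadatabase, scheduler, and so on). The Python sketch below assumes the webserver is reachable at localhost:8080 (for example through an SSH tunnel); that URL and the helper names are hypothetical, not part of this project.

```python
import json
import urllib.request

# Assumption: the Airflow 2.x webserver is exposed on localhost:8080
# (e.g. via an SSH tunnel). Adjust to wherever make infra-up exposes it.
HEALTH_URL = "http://localhost:8080/health"


def unhealthy_components(health: dict) -> list:
    """Return the sorted names of components not reporting 'healthy'.

    Expects the shape returned by Airflow 2.x's /health endpoint, e.g.
    {"metadatabase": {"status": "healthy"}, "scheduler": {"status": "unhealthy", ...}}.
    Components that report no status at all are skipped.
    """
    return sorted(
        name
        for name, info in health.items()
        if isinstance(info, dict)
        and info.get("status") is not None
        and info["status"] != "healthy"
    )


def check_airflow(url: str = HEALTH_URL, timeout: float = 5.0) -> list:
    """Fetch the health endpoint and return the unhealthy components.

    Raises an OSError (for example urllib.error.URLError) when the
    webserver cannot be reached at all.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return unhealthy_components(json.load(resp))
```

An OSError from check_airflow() points at a networking or port-forwarding problem, while a non-empty list means the webserver answers but a component behind it (such as the scheduler) does not.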
- Setting up WSL and Ubuntu on Windows:
  - Followed Microsoft's guide for setting up WSL.
  - Installed Docker Desktop for Windows following the instructions provided here.
- Setting up AWS Account and AWS CLI:
  - Followed the steps outlined in the AWS documentation's "Getting Started" guide, focusing on module three.
- AWS SSO configuration:
  - Configured AWS SSO as a separate profile instead of using the default one.
  - This setup caused issues with Terraform.
  - After setting up SSO, it's necessary to change the profile in main.tf to your SSO profile name.
  - Also commented out the sso_session = name line and the [sso-session name] section in ~/.aws/config.
  - Refer to this issue for more details.
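Commenting out sso_session and the [sso-session name] section effectively moves the profile back to AWS's older (legacy) SSO layout, in which sso_start_url and sso_region sit directly in the profile section. If those two keys were only stored in the sso-session section, they have to be copied over, otherwise the profile no longer says where to sign in. A minimal Python sketch of that conversion, assuming the profile is written as [profile NAME] and its session section holds both keys; the function name is hypothetical, and unlike the manual fix it deletes the lines instead of commenting them out.

```python
import configparser
import io


def to_legacy_sso_profile(config_text: str, profile: str) -> str:
    """Rewrite one profile of an AWS config file to the legacy SSO layout.

    The sso_session reference is replaced by copies of sso_start_url and
    sso_region taken from the matching [sso-session NAME] section, and that
    section is removed. Note that configparser does not keep comments.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    section = f"profile {profile}"
    session_name = parser.get(section, "sso_session")
    session_section = f"sso-session {session_name}"

    # Copy the keys the legacy format expects directly onto the profile.
    for key in ("sso_start_url", "sso_region"):
        parser.set(section, key, parser.get(session_section, key))

    # Drop the new-style reference and the session section itself.
    parser.remove_option(section, "sso_session")
    parser.remove_section(session_section)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Writing the result back over ~/.aws/config is left out on purpose; keep a backup of the original file before replacing it.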
- EMR default role setup:
  - Encountered a missing default role error with EMR.
  - Resolved it by running aws emr create-default-roles in AWS CloudShell from the AWS console.
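To confirm the fix took effect, the roles can be looked up in IAM. The Python sketch below assumes boto3 with credentials for the same account, and assumes the roles to check are EMR_DefaultRole and EMR_EC2_DefaultRole, the classic names created by aws emr create-default-roles; adjust the list if your setup expects different names. The IAM client is passed in so the function can be exercised with a stub, and the function name is hypothetical.

```python
# Assumption: the default role names created by `aws emr create-default-roles`
# in common CLI versions. Adjust to what your Terraform config references.
DEFAULT_EMR_ROLES = ("EMR_DefaultRole", "EMR_EC2_DefaultRole")


def missing_roles(iam_client, role_names=DEFAULT_EMR_ROLES) -> list:
    """Return the role names that IAM reports as not existing.

    iam_client is expected to be a boto3 IAM client (boto3.client("iam")),
    whose get_role call raises iam_client.exceptions.NoSuchEntityException
    for unknown roles. Other errors (e.g. missing permissions) propagate.
    """
    missing = []
    for name in role_names:
        try:
            iam_client.get_role(RoleName=name)
        except iam_client.exceptions.NoSuchEntityException:
            missing.append(name)
    return missing
```

With boto3 installed, missing_roles(boto3.Session(profile_name="your-sso-profile").client("iam")) should return an empty list once the command has run; the profile name there is a placeholder.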
- S3 ACL error resolution:
  - Faced S3 ACL errors during deployment.
  - To resolve the issue, first enabled block public access for the AWS account.
  - Then enabled ACLs on the newly created S3 bucket in main.tf.
  - Followed the steps outlined here to resolve the issue.
  - Allowed ACLs in the S3 bucket's public access block in Terraform, which resolved this error:
    Error: error creating S3 bucket ACL for s3-bucket-name-1234: AccessDenied: Access Denied
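For context: since April 2023, new S3 buckets are created with Block Public Access turned on and ACLs disabled (Object Ownership set to BucketOwnerEnforced), so a configuration that still applies a public bucket ACL is rejected until both settings are relaxed. S3 applies the most restrictive combination of the account-level and bucket-level Block Public Access settings, so an account-level block can still deny the call. In Terraform these two settings are usually managed with the aws_s3_bucket_ownership_controls and aws_s3_bucket_public_access_block resources, with the aws_s3_bucket_acl resource depending on both. The Python sketch below makes the equivalent boto3 calls, swapped in here only to show what those settings are; it assumes the ACL being applied is a public one, and the function name is hypothetical.

```python
def enable_bucket_acls(s3_client, bucket: str) -> None:
    """Relax the two bucket settings that block a public bucket ACL.

    s3_client is expected to be a boto3 S3 client (boto3.client("s3")).
    Account-level Block Public Access settings are not touched here.
    """
    # BucketOwnerPreferred, unlike the default BucketOwnerEnforced,
    # leaves ACLs enabled on the bucket.
    s3_client.put_bucket_ownership_controls(
        Bucket=bucket,
        OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerPreferred"}]},
    )
    # Stop blocking/ignoring public ACLs on this bucket while still
    # blocking public bucket policies. Loosen only what is really needed.
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": False,
            "IgnorePublicAcls": False,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```

With credentials configured, enable_bucket_acls(boto3.client("s3"), "s3-bucket-name-1234") would apply both settings to the bucket named in the error; only after that would a PutBucketAcl call be expected to succeed.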