Automation framework to catalog AWS data sources using Glue
Updated May 24, 2024 · Python
This repository provides a pipeline for managing, processing, and analyzing YouTube video data with AWS services. It handles both structured video statistics and trending-video metrics.
Unveiling job market trends with Scrapy and AWS
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Streams, Lambda, S3, Glue, Athena, and CloudFormation
Working with Glue Data Catalog and Running the Glue Crawler On Demand
This repository contains a data pipeline that extracts, transforms, and loads data from an AWS S3 bucket into an AWS Redshift table using AWS Glue. The raw data lands in S3 unmodified, and AWS Glue then extracts it from the bucket.
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
Handwritten notes on Coursera's Practical Data Science specialization.
Smart City Realtime Data Engineering Project