BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, TensorFlow, Mathematics
Updated Aug 6, 2021 - Jupyter Notebook
Terraform module to create AWS EMR resources 🇺🇦
Run a Spark job within Amazon EMR
Shell scripts for AWS EMR clusters
EMR + Hadoop to Redshift ELT workflow using the Spark steps API and orchestrated by Apache Airflow. It ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift.
Daily incremental-load ETL pipeline for an e-commerce company using AWS Lambda and an AWS EMR cluster, deployed using Apache Airflow in a Docker container.
Lambda to start EMR and run a MapReduce job
Detect tight communities in a social network
Load data from the Million Song Dataset into a final dimensional model stored in S3.
Product review analyses on an Amazon dataset using Apache Spark and MongoDB
Credit defaults cause large losses for banks and other credit lenders, and the success of the banking industry depends on the ability to understand risk. This project uses big data technologies such as MapReduce and HDFS, along with PySpark and AWS, to analyze credit history and predict defaults.
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Analysis performed on data from the Steam platform using Apache Spark and Cloud services such as Amazon Web Services.
A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock from a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark), and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.
Stand-alone Scala & Java tool to anonymize OOXML Documents (DOCX)
PySpark RDD and DataFrame Examples
Data Pipeline Analytics Platform is an end-to-end, generic big data pipeline. It uses the following tech stack: AWS S3, AWS Redshift, AWS EMR Cluster, Apache Spark, and Apache Airflow.
With this app, you can see what programming skills are most in-demand in the current job market.