This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
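The stack described above can be sketched with Pulumi's Python SDK roughly as follows. This is a minimal illustration, not the repo's actual program: the resource names, the IAM role ARN, the subnet group, the script path, and the `dbPassword` config key are all assumptions.

```python
# Illustrative sketch of the pipeline's resources; names and ARNs are placeholders.
import pulumi
import pulumi_aws as aws

# S3 bucket that receives raw event files (Pulumi appends a random suffix).
data_bucket = aws.s3.Bucket("events")

# Glue crawler that infers a table schema from objects in the bucket.
crawler = aws.glue.Crawler(
    "events-crawler",
    database_name="events_db",  # hypothetical Glue catalog database
    role="arn:aws:iam::123456789012:role/glue-role",  # placeholder role ARN
    s3_targets=[aws.glue.CrawlerS3TargetArgs(
        path=data_bucket.bucket.apply(lambda b: f"s3://{b}"),
    )],
)

# Glue job wrapping the Python transform script (script location is illustrative).
job = aws.glue.Job(
    "transform-job",
    role_arn="arn:aws:iam::123456789012:role/glue-role",  # placeholder role ARN
    command=aws.glue.JobCommandArgs(
        script_location=data_bucket.bucket.apply(lambda b: f"s3://{b}/glue-job.py"),
        python_version="3",
    ),
)

# Redshift cluster deployed into an assumed, pre-created VPC subnet group.
cluster = aws.redshift.Cluster(
    "events-cluster",
    cluster_identifier="events-cluster",
    node_type="dc2.large",
    database_name="events",
    master_username="admin",
    master_password=pulumi.Config().require_secret("dbPassword"),  # assumed config key
    cluster_subnet_group_name="events-subnets",  # assumed subnet group
    skip_final_snapshot=True,  # convenient for a throwaway example stack
)

# Export the bucket name surfaced in the stack outputs shown below.
pulumi.export("dataBucketName", data_bucket.bucket)
```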
- Install Pulumi.
- Install Python.
- Configure your AWS credentials.
1. Clone this repo, change to this directory, then create a new stack for the project:

   ```bash
   pulumi stack init
   ```
1. Specify an AWS region to deploy into:

   ```bash
   pulumi config set aws:region us-west-2
   ```
1. Install Python dependencies and run Pulumi:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   pulumi up
   ```
1. In a few moments, the Redshift cluster and Glue components will be up and running, and the S3 bucket name will be emitted as a Pulumi stack output:

   ```
   ...
   Outputs:
       dataBucketName: "events-56e424a"
   ```
1. Upload the included sample data file to S3 to verify the automation works as expected:

   ```bash
   aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
   ```
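The Glue job's transform script isn't reproduced here. As a purely illustrative sketch of the kind of record shaping such a script performs, assuming a hypothetical tab-delimited layout for the event file (the real layout of `events-1.txt` may differ):

```python
import json

def transform_line(line: str) -> str:
    """Convert a hypothetical tab-delimited event line into a JSON row
    suitable for loading into a warehouse table."""
    user_id, event_type, ts = line.rstrip("\n").split("\t")
    return json.dumps({"user_id": user_id, "event": event_type, "ts": ts})

print(transform_line("42\tclick\t2020-01-01T00:00:00Z"))
```

In the actual job, this logic would run inside Glue against the crawled table rather than on raw lines.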
1. When you're ready, destroy your stack and remove it:

   ```bash
   pulumi destroy --yes
   pulumi stack rm --yes
   ```