This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
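The stack described above can be sketched with Pulumi's Python SDK roughly as follows. This is a minimal illustration, not the repo's actual program: the resource names, the IAM role ARN, the subnet group, the script path, and the `dbPassword` config key are all assumptions.

```python
# Illustrative sketch of the pipeline's resources; names and ARNs are placeholders.
import pulumi
import pulumi_aws as aws

# S3 bucket that receives raw event files (Pulumi appends a random suffix).
data_bucket = aws.s3.Bucket("events")

# Glue crawler that infers a table schema from objects in the bucket.
crawler = aws.glue.Crawler(
    "events-crawler",
    database_name="events_db",  # hypothetical Glue catalog database
    role="arn:aws:iam::123456789012:role/glue-role",  # placeholder role ARN
    s3_targets=[aws.glue.CrawlerS3TargetArgs(
        path=data_bucket.bucket.apply(lambda b: f"s3://{b}"),
    )],
)

# Glue job wrapping the Python transform script (script location is illustrative).
job = aws.glue.Job(
    "transform-job",
    role_arn="arn:aws:iam::123456789012:role/glue-role",  # placeholder role ARN
    command=aws.glue.JobCommandArgs(
        script_location=data_bucket.bucket.apply(lambda b: f"s3://{b}/glue-job.py"),
        python_version="3",
    ),
)

# Redshift cluster deployed into an assumed, pre-created VPC subnet group.
cluster = aws.redshift.Cluster(
    "events-cluster",
    cluster_identifier="events-cluster",
    node_type="dc2.large",
    database_name="events",
    master_username="admin",
    master_password=pulumi.Config().require_secret("dbPassword"),  # assumed config key
    cluster_subnet_group_name="events-subnets",  # assumed subnet group
    skip_final_snapshot=True,  # convenient for a throwaway example stack
)

# Export the bucket name surfaced in the stack outputs shown below.
pulumi.export("dataBucketName", data_bucket.bucket)
```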
- Install Pulumi.
- Install Python.
- Configure your AWS credentials.
1. Clone this repo, change to this directory, then create a new stack for the project:

   ```bash
   pulumi stack init
   ```
1. Specify an AWS region to deploy into:

   ```bash
   pulumi config set aws:region us-west-2
   ```
1. Install Python dependencies and run Pulumi:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   pulumi up
   ```
1. In a few moments, the Redshift cluster and Glue components will be up and running, and the S3 bucket name will be emitted as a Pulumi stack output:

   ```
   ...
   Outputs:
       dataBucketName: "events-56e424a"
   ```
1. Upload the included sample data file to S3 to verify the automation works as expected:

   ```bash
   aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
   ```
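The Glue job's transform script isn't reproduced here. As a purely illustrative sketch of the kind of record shaping such a script performs, assuming a hypothetical tab-delimited layout for the event file (the real layout of `events-1.txt` may differ):

```python
import json

def transform_line(line: str) -> str:
    """Convert a hypothetical tab-delimited event line into a JSON row
    suitable for loading into a warehouse table."""
    user_id, event_type, ts = line.rstrip("\n").split("\t")
    return json.dumps({"user_id": user_id, "event": event_type, "ts": ts})

print(transform_line("42\tclick\t2020-01-01T00:00:00Z"))
```

In the actual job, this logic would run inside Glue against the crawled table rather than on raw lines.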
1. When you're ready, destroy your stack and remove it:

   ```bash
   pulumi destroy --yes
   pulumi stack rm --yes
   ```