This example creates an ETL pipeline using Amazon Redshift and AWS Glue. The pipeline extracts data from an S3 bucket with a Glue crawler, transforms it with a Python script wrapped in a Glue job, and loads it into a Redshift database deployed in a VPC.
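The Pulumi program wires these pieces together declaratively. The sketch below is a minimal, illustrative outline only: resource names, the node type, and the placeholder password are assumptions, and the real program additionally configures the VPC networking and the Glue job that runs the Python transform.

```typescript
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// S3 bucket that receives the raw event data.
const dataBucket = new aws.s3.Bucket("events");

// Glue Data Catalog database plus a crawler that infers the schema
// of the files landing in the bucket.
const glueDb = new aws.glue.CatalogDatabase("events-db", { name: "events" });

// Role the crawler assumes (Glue data-access policies elided here).
const crawlerRole = new aws.iam.Role("crawler-role", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Action: "sts:AssumeRole",
            Effect: "Allow",
            Principal: { Service: "glue.amazonaws.com" },
        }],
    }),
});

const crawler = new aws.glue.Crawler("events-crawler", {
    databaseName: glueDb.name,
    role: crawlerRole.arn,
    s3Targets: [{ path: pulumi.interpolate`s3://${dataBucket.bucket}` }],
});

// Redshift cluster the transformed data is loaded into. In the real
// program this sits inside a dedicated VPC; networking is elided.
const cluster = new aws.redshift.Cluster("etl-cluster", {
    clusterIdentifier: "etl-cluster",
    databaseName: "dev",
    masterUsername: "admin",
    masterPassword: "Hypothetical-passw0rd", // use a Pulumi config secret in practice
    nodeType: "ra3.xlplus",
    clusterType: "single-node",
    skipFinalSnapshot: true,
});

// Exported so later steps can reference the bucket by name.
export const dataBucketName = dataBucket.bucket;
```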
- Install Pulumi.
- Install Node.js.
- Configure your AWS credentials.
Clone this repo, change to this directory, then create a new stack for the project:

```bash
pulumi stack init
```
Specify an AWS region to deploy into:

```bash
pulumi config set aws:region us-west-2
```
Install Node dependencies and run Pulumi:

```bash
npm install
pulumi up
```
In a few moments, the Redshift cluster and Glue components will be up and running, and the S3 bucket name will be emitted as a Pulumi stack output:

```
...
Outputs:
    dataBucketName: "events-56e424a"
```
Upload the included sample data file to S3 to verify the automation works as expected:

```bash
aws s3 cp events-1.txt s3://$(pulumi stack output dataBucketName)
```
When you're ready, destroy your stack and remove it:

```bash
pulumi destroy --yes
pulumi stack rm --yes
```