This project contains three example pipelines that demonstrate some of the capabilities of Apache Beam.
Please follow the steps below to run the examples:
- Configure `gcloud` with your credentials (see the example commands below)
- Enable the Cloud Dataflow API in your Google Cloud Platform project
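Assuming the `gcloud` CLI is already installed, both setup steps can typically be handled from a terminal like this:

```sh
# Authenticate gcloud and set application-default credentials
gcloud auth login
gcloud auth application-default login

# Enable the Cloud Dataflow API for the current project
gcloud services enable dataflow.googleapis.com
```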
Run the following command to execute the batch pipeline:
```sh
python -m complete.batch_pipeline.batch_pipeline \
  --input gs://[DATA FILE BUCKET]/users.csv \
  --output [PROJECT ID]:beam.users \
  --temp_location gs://[DATAFLOW STAGING BUCKET]/temp/ \
  --staging_location gs://[DATAFLOW STAGING BUCKET]/stage/ \
  --project [PROJECT ID] \
  --runner DataflowRunner
```
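The pipeline source isn't reproduced here, but as a rough sketch, a Beam batch job of this shape (CSV in Cloud Storage in, BigQuery out) typically looks like the following. The `user_id`/`name` column layout and BigQuery schema are illustrative assumptions, not the project's actual schema:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_line(line):
    # Hypothetical two-column layout: user_id,name
    user_id, name = line.split(',')
    return {'user_id': int(user_id), 'name': name}


def run():
    # PipelineOptions picks up --project, --runner, etc. from sys.argv
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read CSV' >> beam.io.ReadFromText(
             'gs://[DATA FILE BUCKET]/users.csv', skip_header_lines=1)
         | 'Parse rows' >> beam.Map(parse_csv_line)
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
             '[PROJECT ID]:beam.users',
             schema='user_id:INTEGER,name:STRING',
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))


if __name__ == '__main__':
    run()
```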
Run the following command to execute the sum pipeline:
```sh
python -m template.sum_pipeline.sum_pipeline \
  --input gs://[DATA FILE BUCKET]/retail.csv \
  --output [PROJECT ID]:beam.retail \
  --temp_location gs://[DATAFLOW STAGING BUCKET]/temp/ \
  --staging_location gs://[DATAFLOW STAGING BUCKET]/stage/ \
  --project [PROJECT ID] \
  --runner DataflowRunner
```
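As a minimal sketch of a per-key sum over the retail data, assuming a hypothetical `product,amount` column layout (the real pipeline's columns and schema may differ):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def to_kv(line):
    # Hypothetical layout: product,amount
    product, amount = line.split(',')
    return product, float(amount)


def run():
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read CSV' >> beam.io.ReadFromText(
             'gs://[DATA FILE BUCKET]/retail.csv', skip_header_lines=1)
         | 'To key-value' >> beam.Map(to_kv)
         | 'Sum per key' >> beam.CombinePerKey(sum)
         | 'To row' >> beam.Map(lambda kv: {'product': kv[0], 'total': kv[1]})
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
             '[PROJECT ID]:beam.retail',
             schema='product:STRING,total:FLOAT'))


if __name__ == '__main__':
    run()
```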
Run the following command to execute the streaming pipeline:
```sh
python -m template.streaming_pipeline.streaming_pipeline \
  --input projects/[PROJECT ID]/topics/[TOPIC NAME] \
  --output [PROJECT ID]:beam.streaming_sum \
  --temp_location gs://[DATAFLOW STAGING BUCKET]/temp/ \
  --staging_location gs://[DATAFLOW STAGING BUCKET]/stage/ \
  --project [PROJECT ID] \
  --runner DataflowRunner \
  --streaming
```
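A streaming sum over Pub/Sub generally needs windowing before aggregation. Here is a minimal sketch of that shape; the 60-second fixed window, the assumption that each message body is a single number, and the output schema are all illustrative, not the project's actual choices:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms.window import FixedWindows


def run():
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read Pub/Sub' >> beam.io.ReadFromPubSub(
             topic='projects/[PROJECT ID]/topics/[TOPIC NAME]')
         | 'Decode' >> beam.Map(lambda msg: float(msg.decode('utf-8')))
         | 'Window' >> beam.WindowInto(FixedWindows(60))
         # without_defaults() is required for a global combine in
         # non-global windows
         | 'Sum window' >> beam.CombineGlobally(sum).without_defaults()
         | 'To row' >> beam.Map(lambda total: {'total': total})
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
             '[PROJECT ID]:beam.streaming_sum',
             schema='total:FLOAT'))


if __name__ == '__main__':
    run()
```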