Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9650] Cleanup documentation on side inputs patterns #11415

Merged
merged 5 commits into from
Apr 17, 2020

Conversation

Ardagan
Copy link
Contributor

@Ardagan Ardagan commented Apr 14, 2020

[Website staging preview]

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@Ardagan
Copy link
Contributor Author

Ardagan commented Apr 14, 2020

R: @soyrice @rosetn

You can read side input pcollection periodically into distinct windows.
Later, when you apply side input to your main input, windows will be matched automatically 1:1.
This way, you can guarantee side input consistency on the duration of the single window.
You can read side input data periodically into distinct PCollection windows.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Beam, we have IO.Read transforms that do read source data. Later data read can be assigned to specific window.

Copy link
Contributor

@soyrice soyrice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR :)

website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved
website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved
meaning that each window on main input side will be matched to a single
version of side input data.

To do this, you can utilize a combination of PeriodicSequence/PeriodicImpulse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably best to replace "To do this" with "To do ABC" - it'll help the reader figure out what "this" refers to. Otherwise, if the reader is just skimming the page and starts at this paragraph, they have to read the previous paragraph to figure out what the pronoun refers to.

website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved
website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved
website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved
website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved
@Ardagan Ardagan requested a review from pabloem April 14, 2020 21:32
Copy link
Contributor

@soyrice soyrice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a couple more small comments to clear up a few parts of the doc.

website/src/documentation/patterns/side-inputs.md Outdated Show resolved Hide resolved

To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence
of elements with some real-time period:
Described approach can be implemented using combination of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this: "To read side input data periodically... use a combination of PeriodicSequence..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here since adding text you suggest is pretty much re-writing same thing that is a header and previous abstract. In next iteration I tried to remove this abstract overall since list of steps provides same information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha - the intent is to summarize the previous abstract, so that it's easier for readers to skim the doc.

To do this, you can utilize PeriodicSequence PTransform that will generate infinite sequence
of elements with some real-time period:
Described approach can be implemented using combination of
PeriodicSequence or PeriodicImpulse PTransforms and SDF Read or ReadAll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDF Read and ReadAll are transforms too, right? The following will help clarify what the combination consists of: "...PeriodicSequence or PeriodicImpulse PTransforms and SDF Read or ReadAll PTransforms"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but that's exactly the phrasing. "PTransforms" word is on next line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably just missed the word. Looks good in staging.

a. MAX_TIMESTAMP can be replaced with some closer boundary if you want to stop generating elements at some point.
1. Use the PeriodicImpulse or PeriodicSequence PTransform to generate an
infinite sequence of elements at required processing time intervals and assign
them to separate windows.

1. Read data using Read operation triggered by arrival of PCollection element.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you marked the previous comment here as resolved but I don't see any changes to this line. Was that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not intended. FIxed.

Copy link
Contributor

@soyrice soyrice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text LGTM after resolving the remaining comments, but I don't see the code sample on the staging server.

meaning that each window on the main input will be matched to a single
version of side input data.

Implementation of described approach can be narrowed down to:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rephrase as "To read side input data periodically:"

The goal is to summarize the previous paragraph, so that it's clear what these steps refer to without needing to read through the previous paragraph. The previous paragraph offers additional context, but if a reader skims the page, they should be able to quickly figure out what these steps are about.

intervals
* Assign them to separate windows.
1. Fetch data using SDF Read or ReadAll PTransform triggered by arrival of
PCollection element.
1. Apply side input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing "the" -> "Apply the side input"

@soyrice
Copy link
Contributor

soyrice commented Apr 15, 2020

The text LGTM after resolving the remaining comments, but I don't see the code sample on the staging server.

I think it's missing because it's not picking up the language selector.

@Ardagan
Copy link
Contributor Author

Ardagan commented Apr 15, 2020

Staging job failing together with website publish jobs :( So staging version has old code unfortunately.


a. MAX_TIMESTAMP can be replaced with some closer boundary if you want to stop generating elements at some point.
You can read side input data periodically into distinct PCollection windows.
Later, when you apply the side input to your main input, each main input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove "Later" because this is a part of the overall workflow described in the previous sentence, rather than a secondary step.

So this should be: "When you apply the side input to your main input..."

@soyrice
Copy link
Contributor

soyrice commented Apr 15, 2020

LGTM (modulo one last comment). Thanks again for opening this PR and doing an editorial review

@Ardagan Ardagan merged commit 892a0a4 into apache:master Apr 17, 2020
@Ardagan Ardagan deleted the SCPPyDocs branch April 17, 2020 18:37
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants