Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-18125][Azure] Skip e2e execution on docs-only-PRs #12708

Closed
wants to merge 2 commits into from

Conversation

rmetzger
Copy link
Contributor

What is the purpose of the change

Running an entire 4 hour E2E build for a documentation pull request is quite a resource waste.
If a the CI system detects a branch that looks like a PR branch, and the PR branch only touches docs/ files, then we won't run E2E tests.

We are still running E2E tests on regular master and release-* pushes, because these builds are monitored for build instabilities.

Brief change log

  • add a new script
  • modify e2e job definition

Verifying this change

Here's a CI run that is based on a branch that looks like a docs PR: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8187&view=logs&j=1f3ed471-1849-5d3c-a34c-19792af4ad16

@rmetzger rmetzger changed the title [FLINK-18125][Azure] Check e2e execution on docs-only-PRs [FLINK-18125][Azure] Skip e2e execution on docs-only-PRs Jun 18, 2020
@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 19c1ecd (Thu Jun 18 15:03:24 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 18, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

return 1
fi
# check if it is docs only pull request:
CHANGES=`curl --silent "https://api.github.com/repos/apache/flink/pulls/$PR_ID/files" | jq -r ".[].filename"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can fail in surprising ways if the GitHub view of the machines that this script and ci-bot run on differ.
E.g., PR contains commit only changing docs, another commit is added that changes something else -> ci-bot mirrors PR -> the CI machines sees the old state, before the new commit was added.
This doesn't even seem that unlikely given that the machines are located in different continents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there's a small risk for such situations, but I believe it is unlikely enough that we can risk it (unless you know a simple solution).
For the scenario you are describing, even if the timing of ci-bot is optimal, there's probably a delay of at least 2 minutes between "commit has been added" and "azure checks list of files". (Factors adding delay: ci-bot does not permanently check for new changes; pushing the branch takes some time; azure doesn't immediately kick off the build; the build doesn't immediately check the list of files).

A fairly common scenario might be that the e2e build gets queued, and the "get list of files" happens hours later. But the worst that can happen here is that we are wasting 4 hours of CI time for a docs only change (because files were added later).

I believe this change is a "best effort" that saves us some CI time in 99% of the cases where it makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the worst that can happen here is that we are wasting 4 hours of CI time for a docs only change (because files were added later).

This is a case I also wouldn't care about; but it's not the worst case: a build that changes something outside docs/ does not get tested.

To solve this we just need to match the commit hashes.
https://api.github.com/repos/apache/flink/pulls/12708 contains the head commit of the PR, and number of commits.
Compare the hash against the head commit of the ci branch.
If they match, git diff --name-only head..head~${num_commits}, continue as usual.
If the hashes don't match, just run the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll do that.

@zentol zentol self-assigned this Jun 19, 2020
@rmetzger
Copy link
Contributor Author

rmetzger commented Jul 2, 2020

I updated the logic according to your comments. I tested the change again in my private GitHub / Azure account.

Copy link
Contributor

@zentol zentol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor things, otherwise +1

function is_docs_only_pullrequest() {
# check if it is a pull request branch, as generated by ci-bot:
if [[ ! $BUILD_SOURCEBRANCHNAME == ci_* ]] ; then
echo "INFO: Branch name mismatch";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could irritate other developers.

Suggested change
echo "INFO: Branch name mismatch";
echo "INFO: Not a pull request.";


if [[ $(git diff --name-only $GITHUB_PULL_HEAD_SHA..$GITHUB_PULL_HEAD_SHA~$GITHUB_NUM_COMMITS | grep -v "docs/") == "" ]] ; then
echo "INFO: This is a docs only change. Changed files:"
git diff --name-only $GITHUB_PULL_HEAD_SHA..$GITHUB_PULL_HEAD_SHA~$GITHUB_NUM_COMMITS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be possible, and would be a bit more readable.

Suggested change
git diff --name-only $GITHUB_PULL_HEAD_SHA..$GITHUB_PULL_HEAD_SHA~$GITHUB_NUM_COMMITS
git diff --name-only HEAD..HEAD~$GITHUB_NUM_COMMITS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow, it didn't work. I could not reproduce it locally, but I didn't want to spend more time trying this on CI, because setting up a matching PR is not that simple :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I used lowercase head. This isn't a problem, because the FS on OSX is case insensitive (WTF?), but it is a problem on our linux-based CI system.

# 3. Get number of commits in PR
GITHUB_NUM_COMMITS=`echo $GITHUB_PULL_DETAIL | jq -r ".commits"`

if [[ $(git diff --name-only $GITHUB_PULL_HEAD_SHA..$GITHUB_PULL_HEAD_SHA~$GITHUB_NUM_COMMITS | grep -v "docs/") == "" ]] ; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if [[ $(git diff --name-only $GITHUB_PULL_HEAD_SHA..$GITHUB_PULL_HEAD_SHA~$GITHUB_NUM_COMMITS | grep -v "docs/") == "" ]] ; then
if [[ $(git diff --name-only HEAD..HEAD~$GITHUB_NUM_COMMITS | grep -v "docs/") == "" ]] ; then

@rmetzger
Copy link
Contributor Author

rmetzger commented Jul 2, 2020

Thanks a lot for your review. Merging.

@rmetzger rmetzger closed this in 76b35b8 Jul 2, 2020
fsk119 pushed a commit to fsk119/flink that referenced this pull request Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants