Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Minimal repro for "trap breaks retry" problem #101

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lucaswilric
Copy link
Contributor

  • Creates a new function busted_login, which always fails
  • Calls busted_login instead of login on execution
  • Creates a repro script, env-hook-repro.sh, which simulates the behaviour of the Elastic Stack's agent environment hook (the real thing is here)

Run the repro script to see how the trap command breaks the retry function defined in the environment hook.

We expect:

Expected execution log
$ ./env-hook-repro.sh
Retries: 3
Trying ECR login with max 4 attempts
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed on attempt 1 of 4. Trying again in 1 seconds...'
Login failed on attempt 1 of 4. Trying again in 1 seconds...
+ sleep 1
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed on attempt 2 of 4. Trying again in 2 seconds...'
Login failed on attempt 2 of 4. Trying again in 2 seconds...
+ sleep 2
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed on attempt 3 of 4. Trying again in 3 seconds...'
Login failed on attempt 3 of 4. Trying again in 3 seconds...
+ sleep 3
+ ((  attempt_num <= max_attempts  ))
+ set +e
+ ./fail.sh
./hooks/environment: line 71: ./fail.sh: No such file or directory
+ exit_code=127
+ set -e
+ [[ 3 -eq 0 ]]
+ [[ 127 -eq 0 ]]
+ ((  attempt_num == max_attempts  ))
+ echo 'Login failed after 4 attempts'
Login failed after 4 attempts
+ return 127
^^^ +++
:alert: Elastic CI Stack environment hook failed

$ echo $?
53

We see:

Actual execution log
$ ./env-hook-repro.sh  
Retries: 3
Trying ECR login with max 4 attempts
++ ((  attempt_num <= max_attempts  ))
++ set +e
++ ./fail.sh
hooks/environment: line 71: ./fail.sh: No such file or directory
+++ handle_err
+++ echo '^^^ +++'
^^^ +++
+++ echo ':alert: Elastic CI Stack environment hook failed'
:alert: Elastic CI Stack environment hook failed
+++ exit 53
^^^ +++
:alert: Elastic CI Stack environment hook failed
$ echo $?
53

lucaswilric added a commit to buildkite/elastic-ci-stack-for-aws that referenced this pull request Mar 4, 2024
When we introduced the `environment` hook error trap in #1179, we set it to
propagate into all `source`-d plugin environment hooks. Unfortunately, this
breaks the `retry` function in the ECR plugin's environment hook (which runs
a command and then checks the exit code to determine whether to retry) by
interrupting execution before the exit code can be checked.

Repro: buildkite-plugins/ecr-buildkite-plugin#101

Fix by temporarily unsetting `-E`, stopping propagation of the trap command
into the plugin environment hook.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant