
Integration test improvements #2858

Merged: 11 commits on Mar 7, 2024

Conversation

@hubertdeng123 (Member) commented Mar 5, 2024

This PR adds a couple of improvements to our integration test pipeline.

  1. Parallelizes some tests; runtime was previously 23+ minutes and is now under 18 minutes
  2. Reports failures to a Sentry project

In order to report flakes, I wanted to cut down the time it takes to actually run these tests, since with ~20-minute tests a run could take an hour if the tests flake twice.
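As a rough, hypothetical sketch of the parallelization idea (the PR's actual change isn't shown here), independent test suites can be started as background jobs and awaited, so total runtime approaches that of the slowest suite; the suite script names below are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run independent integration test suites concurrently
# instead of sequentially, then fail the job if any suite failed.
set -uo pipefail

suites=("./test-api.sh" "./test-ui.sh" "./test-cleanup.sh")  # placeholder names
pids=()

for suite in "${suites[@]}"; do
  "$suite" &          # start each suite as a background job
  pids+=("$!")
done

status=0
for pid in "${pids[@]}"; do
  wait "$pid" || status=1   # collect exit codes; remember any failure
done
exit "$status"
```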

@BYK (Member) commented Mar 5, 2024

I don't understand how this is going to help. Can you share your thinking?

@hubertdeng123 (Member, Author) replied:

> I don't understand how this is going to help. Can you share your thinking?

Please ignore what's going on in this branch for now; I'm just using it as a means to run CI while I'm experimenting.

@hubertdeng123 changed the title from "Flakey Tests Fix?" to "Integration test improvements" on Mar 6, 2024
@@ -41,10 +45,10 @@ echo "${_endgroup}"
echo "${_group}Starting Sentry for tests ..."
# Disable beacon for e2e tests
echo 'SENTRY_BEACON=False' >>$SENTRY_CONFIG_PY
echo y | $dcr web createuser --force-update --superuser --email $TEST_USER --password $TEST_PASS
@hubertdeng123 (Member, Author) commented on this line:

docker run is slower than just an exec into a running container

Member:

It is but it is also intentional to keep the run separate from the main web process.

That said for a one-off thing like this, I think using exec is a good compromise if it saves a notable amount of time. I'd just want it documented with a brief comment.
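Since the reviewer asked for a brief comment documenting the trade-off, here is one possible sketch of that comment around the changed line; the wording is only a suggestion, and `$dc` is the compose shortcut the script already uses.

```bash
# NOTE: `docker compose run` starts a new one-off container, which keeps this
# step isolated from the main web process but is noticeably slower. For this
# one-off test-setup command we exec into the already-running web container
# instead, trading that isolation for a faster test run.
echo y | $dc exec web sentry createuser --force-update --superuser --email $TEST_USER --password $TEST_PASS
```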

@hubertdeng123 marked this pull request as ready for review March 6, 2024 18:11
@hubertdeng123 (Member, Author):

integration test (v2.0.1, /usr/local/lib/docker/cli-plugins) and integration test (v2.7.0, /usr/local/lib/docker/cli-plugins) are expected and pending because of branch protection rules; if this PR goes in, I'll change those.

uses: nick-fields/retry@v3
with:
max_attempts: 3
timeout_minutes: 10
Contributor:

Any chance this could actually increase the flake rate, due to timeouts becoming more frequent? Or are flakes caused by the action itself timing out?

@hubertdeng123 (Member, Author):

The typical install logic takes around 4-5 minutes. I doubled that for the timeout, so I don't think this should ever increase the flake rate.

run: ./integration-test.sh
uses: nick-fields/retry@v3
with:
max_attempts: 3
Contributor:

Just so I understand: what was the behavior before you added these settings? Unlimited retries? No timeout so it hung until the action crashed?

@hubertdeng123 (Member, Author):

The tests would fail from flakes way too often; adding this in drastically increases the chance that the tests pass when they should.

Contributor:

So, in effect, the previous (implicit) setting was max_attempts: 1? Or timeout_minutes: Infinity? Or some combination of the two? I get that the problem we are trying to solve is flakiness; I'm just not clear how changing (raising? lowering?) max_attempts and timeout_minutes helps, since it's not obvious what the current state of affairs is.

@hubertdeng123 (Member, Author):

Yep, that's correct. The max_attempts setting really is just the first step toward adding flaky test detection: if a job fails, but then is retried and succeeds, it can be marked as flaky. The timeout_minutes is a required parameter here. I can remove this and re-add it in a follow-up PR if that is clearer.

Contributor:

No, that's fine. I just wanted to understand the change. LGTM!
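The flaky-test idea described above (a job that fails, is retried, and then succeeds gets marked as flaky) could be sketched in plain shell roughly as follows; report_flake is a hypothetical hook, not an existing helper, and in CI the retry action handles the retrying itself.

```bash
# Sketch: retry a command up to max_attempts times and flag it as flaky if it
# only passes on a later attempt.
run_with_flake_detection() {
  local max_attempts=3 attempt=1
  while ! "$@"; do
    if (( attempt >= max_attempts )); then
      return 1                      # hard failure: never passed
    fi
    attempt=$((attempt + 1))
  done
  if (( attempt > 1 )); then
    report_flake "$*" "$attempt"    # hypothetical hook: passed after a retry
  fi
}

# Example usage:
#   run_with_flake_detection ./integration-test.sh
```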

error_msg="An error occurred, caught SIG$1 on line $2"
echo "$error_msg"
dsn="https://[email protected]/6627632"
local sentry_cli="docker run --rm -v $(pwd):/work -e SENTRY_DSN=$dsn getsentry/sentry-cli"
Contributor:

Odd to me that we set REPORT_ISSUES=0 above, but then send an envelope here anyway? Maybe I am misunderstanding something, but why would we not just do REPORT_ISSUES=1, and maybe figure out some way to configure it to send to an "integration tests" or similar project on our actual dogfood instance?

@hubertdeng123 (Member, Author):

Sure, I believe I can incorporate this into the existing logic for REPORT_ISSUES.
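A minimal sketch of how that gating could look, assuming the $sentry_cli and $error_msg variables from the error handler shown above and the existing REPORT_ISSUES flag; the actual invocation the PR ends up using may differ.

```bash
# Sketch: only send the CI failure event when issue reporting is enabled,
# reusing $sentry_cli and $error_msg from the error handler above.
if [[ "${REPORT_ISSUES:-0}" == "1" ]]; then
  $sentry_cli send-event -m "$error_msg"
fi
```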

timeout 90 bash -c 'until $(curl -Isf -o /dev/null $SENTRY_TEST_HOST); do printf '.'; sleep 0.5; done'
echo y | $dc exec web sentry createuser --force-update --superuser --email $TEST_USER --password $TEST_PASS
Contributor:

What happens if we exec into the web container but it isn't ready yet? Does docker wait until it is ready, or does this fail? If it's the former, we should echo "Waiting for Sentry..." before this runs, otherwise the user may be waiting a while. If it fails, we should add some sort of sync point to wait for the container to be up before trying this.

@hubertdeng123 (Member, Author):

Docker compose up will only succeed if the container healthcheck for web passes, so I don't think this will be a problem. That is performed on a previous line, so by the time the tests get to the createuser logic, the web container will always be ready.
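If an explicit sync point were ever wanted anyway, one hedged option is to poll the web container's health status before the exec; $dc is the compose shortcut from the script, and the roughly 90-second budget and 2-second interval are illustrative.

```bash
# Sketch: explicitly wait for the web container to report "healthy" before
# exec'ing into it. Polls roughly every 2 seconds for up to ~90 seconds.
echo "Waiting for Sentry web to become healthy ..."
web_cid="$($dc ps -q web)"   # container id of the web service
state="starting"
for _ in $(seq 1 45); do
  state="$(docker inspect --format '{{.State.Health.Status}}' "$web_cid")"
  [ "$state" = "healthy" ] && break
  sleep 2
done
if [ "$state" != "healthy" ]; then
  echo "web container did not become healthy in time" >&2
  exit 1
fi
```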

@BYK (Member) left a comment:

Love the changes. BTW not sure how much time it costs us but maybe we can also use system-installed versions of jq and curl if we detect them and use the Docker-based ones as a fallback?

@hubertdeng123 (Member, Author) replied:

> BTW not sure how much time it costs us but maybe we can also use system-installed versions of jq and curl if we detect them and use the Docker-based ones as a fallback?

Good point, I can investigate that in a follow-up.
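A possible shape for that follow-up, sketched under the assumption that the script currently shells out to Docker for both tools; the image names here are illustrative, not necessarily the ones the repo uses.

```bash
# Sketch: prefer system-installed jq/curl when present and fall back to
# Docker-wrapped equivalents otherwise.
if command -v jq >/dev/null 2>&1; then
  jq_cmd="jq"
else
  jq_cmd="docker run --rm -i imega/jq"        # illustrative image name
fi

if command -v curl >/dev/null 2>&1; then
  curl_cmd="curl"
else
  curl_cmd="docker run --rm curlimages/curl"  # illustrative image name
fi

# Example usage:
#   echo '{"ok": true}' | $jq_cmd .
#   $curl_cmd -Isf -o /dev/null "$SENTRY_TEST_HOST"
```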

@hubertdeng123 merged commit 746031d into master Mar 7, 2024
10 checks passed
@hubertdeng123 deleted the hubertdeng123/flakey-test-fix branch March 7, 2024 22:49
@github-actions bot locked and limited conversation to collaborators Mar 23, 2024