Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Check health of services after running integration tests #1897

Merged
merged 16 commits into from
Jan 11, 2023

Conversation

ethanhs
Copy link
Contributor

@ethanhs ethanhs commented Jan 6, 2023

Fixes #1762

Copy link
Member

@chadwhitacre chadwhitacre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggested improvements.

@chadwhitacre chadwhitacre enabled auto-merge (squash) January 9, 2023 21:49
@chadwhitacre
Copy link
Member

Whoa! Is it ... working? 🤔

self-hosted-1-snuba-replacer-1                             "./docker_entrypoint…"   snuba-replacer                             restarting          

https://github.com/getsentry/self-hosted/actions/runs/3877996309/jobs/6613667226#step:4:6875

Is the replacer supposed to restart?

@chadwhitacre
Copy link
Member

Seeing snuba-replacer restarting again but not consistently. What fails on that second? 🤔

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 10, 2023

@chadwhitacre

Seeing snuba-replacer restarting again but not consistently. What fails on that second? 🤔

My guess is that it restarted but switched to a running state after we queried docker ps, so the second docker ps (the one we print) doesn't show the issue.

@BYK
Copy link
Member

BYK commented Jan 10, 2023

Not sure if this helps but Compose V2 has a --wait option that waits until all the containers are healthy: https://docs.docker.com/engine/reference/commandline/compose_up/

@chadwhitacre
Copy link
Member

My guess is that it restarted but switched to a running state after we queried docker ps, so the second docker ps (the one we print) doesn't show the issue.

Seems plausible. Should we capture the output of the first call so we can output exactly what we're testing against? Related: I did try to find a $dc ps --format that would reduce the columns of output for easier reading (we only really care about name and state/status), but afaict docker-compose ps --format does not support templates the way docker ps does (of course). Maybe dump that json and work with it twice?

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 10, 2023

Maybe dump that json and work with it twice?

Yeah I'm leaning towards that right now as well.

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 10, 2023

Of course, I'm still not sure why snuba-replacer is restarting...

For using --wait, I think since geoipupdate always exit we can't use that?

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 10, 2023

OK, now printing the JSON of the failing/exited services.

@chadwhitacre
Copy link
Member

I'm still not sure why snuba-replacer is restarting

Aye, there's the rub!

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 10, 2023

Found it! Batching was removed from replacer, added a fix for that.

@chadwhitacre
Copy link
Member

So does that mean self-hosted has been broken for three weeks? We should add some tests to catch this sort of thing. Oh wait ... ;-)

@chadwhitacre
Copy link
Member

I pushed 668ae2a1f867509556eee72f097e96dd6ec1cfce in order to evaluate the failure output formatting.

@chadwhitacre
Copy link
Member

Here's what I'm seeing:

{
  "Service": "snuba-replacer",
  "State": "restarting"
}

How about using the @tsv filter?

https://stackoverflow.com/questions/39139107/how-to-format-a-json-string-as-a-table-using-jq

@chadwhitacre chadwhitacre force-pushed the ethanhs/check-health-integration-tests branch from 668ae2a to f4c9932 Compare January 11, 2023 15:33
@chadwhitacre
Copy link
Member

(I force-pushed 668ae2a1f867509556eee72f097e96dd6ec1cfce off again.)

Copy link
Member

@chadwhitacre chadwhitacre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the @tsv filter for better readability?

@chadwhitacre
Copy link
Member

Heh, oops! With @tsv there is always output (at least the headers) so it always fails.

https://github.com/getsentry/self-hosted/actions/runs/3895765109/jobs/6651476439#step:4:6707

@chadwhitacre
Copy link
Member

Sob and actually broken run won't blend.

jq: error (at <stdin>:1): object ({"Service":...) cannot be tsv-formatted, only array

https://github.com/getsentry/self-hosted/actions/runs/3895778334/jobs/6651506996#step:4:2661

@chadwhitacre chadwhitacre force-pushed the ethanhs/check-health-integration-tests branch from 801a055 to 0f5b9cc Compare January 11, 2023 19:00
@chadwhitacre
Copy link
Member

Can you push a commit to show the error case?

@chadwhitacre
Copy link
Member

b9f949c is expected to pass CI (right?).

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 11, 2023

b9f949c is expected to pass CI (right?).

Yes

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 11, 2023

I will push something that breaks but I wanted to make sure the expected success case works first :)

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 11, 2023

64d51a2 adds back the snuba breakage to test things

@ethanhs
Copy link
Contributor Author

ethanhs commented Jan 11, 2023

@hubertdeng123
Copy link
Member

Can you add this comment to the PR description? For future reference so it's easier to see why it was deleted. Otherwise lgtm

@ethanhs ethanhs merged commit 6976bae into master Jan 11, 2023
@ethanhs ethanhs deleted the ethanhs/check-health-integration-tests branch January 11, 2023 22:51
@github-actions github-actions bot locked and limited conversation to collaborators Jan 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check health of containers in e2e tests
4 participants