Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(robot-server): Fix race condition in integration test #13707

Merged
merged 4 commits into from
Oct 5, 2023

Conversation

SyntaxColoring
Copy link
Contributor

@SyntaxColoring SyntaxColoring commented Oct 3, 2023

Overview

This fixes a test that was flaky because of a race condition.

We were waiting for a run to succeed by:

  1. Polling until its status indicated "done."
  2. Checking that the final status was succeeded.

But we would erroneously proceed from step 1 to step 2 when the run transitioned from running to finishing, and finishing is of course not succeeded, so the test would sometimes fail depending on timing.

Test Plan

Just make sure CI keeps passing.

Risk assessment

No risk.

@SyntaxColoring SyntaxColoring requested a review from a team October 3, 2023 16:40
@SyntaxColoring SyntaxColoring marked this pull request as ready for review October 3, 2023 16:40
@SyntaxColoring SyntaxColoring requested a review from a team as a code owner October 3, 2023 16:40
@CaseyBatten
Copy link
Contributor

Looks good to me, appropriate change to increase the scope of that check. Thinking on potential alternative cases, would it be beneficial to adjust the check to include "stop-requested" as a third valid pre-end state on this check? so this:

if latest_status not in {"running", "stop-requested", "finishing"}:
            return latest_status  # type: ignore[no-any-return]

@SyntaxColoring
Copy link
Contributor Author

SyntaxColoring commented Oct 5, 2023

Looks good to me, appropriate change to increase the scope of that check. Thinking on potential alternative cases, would it be beneficial to adjust the check to include "stop-requested" as a third valid pre-end state on this check? so this:

if latest_status not in {"running", "stop-requested", "finishing"}:
            return latest_status  # type: ignore[no-any-return]

That would be more appropriate, yes.

And there's a very similar polling loop in a different test file:

async def poll_until_run_succeeds(robot_client: RobotClient, run_id: str) -> Any:
"""Wait until a run completes, and then assert that it succeeded.
Return the completed run response.
"""
completed_run_statuses = {"stopped", "failed", "succeeded"}
while True:
run = (await robot_client.get_run(run_id=run_id)).json()
status = run["data"]["status"]
if status in completed_run_statuses:
assert status == "succeeded"
return run
else:
# The run is still ongoing. Wait a beat, then poll again.
await asyncio.sleep(RUN_POLL_INTERVAL)

That one takes the approach of listing out the "done" cases, instead of the "not done" cases.

This should be a shared function somewhere. I will do that in this PR.


Edit: Done.

@SyntaxColoring SyntaxColoring marked this pull request as draft October 5, 2023 19:46
@SyntaxColoring SyntaxColoring marked this pull request as ready for review October 5, 2023 20:32
@SyntaxColoring
Copy link
Contributor Author

Got a 👍 from @CaseyBatten in person.

@SyntaxColoring SyntaxColoring merged commit 6e19c20 into edge Oct 5, 2023
7 checks passed
@SyntaxColoring SyntaxColoring deleted the fix_poll_until_not_running branch October 5, 2023 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants