Runner job hangs and eventually times out #2454

Closed
sergei-maertens opened this issue Feb 21, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@sergei-maertens

Describe the bug

As of yesterday (possibly earlier, since no work happens over the weekend), our GitHub workflow hangs on a job until GitHub times it out. 3 days ago this workflow was still working fine, and nothing has changed in the workflow.

Hanging job (attempt #2): https://github.com/open-formulieren/open-forms/actions/runs/4204997312/jobs/7349219628
The same job, but passing 4 days ago (attempt #1): https://github.com/open-formulieren/open-forms/actions/runs/4204997312/attempts/1

To Reproduce
Steps to reproduce the behavior:

This seems to occur only in one specific job in our https://github.com/open-formulieren/open-forms repository, at the coverage combine step.

  1. Based on master of https://github.com/open-formulieren/open-forms
  2. Make a PR or run the "Run CI" workflow
  3. See that the "Run the Django test suite" job hangs indefinitely
  4. When following along with the output, the result of the test suite is logged/visible in the log output/UI
  5. The next step (coverage combine) is not visible/logged anywhere and seems to hang
  6. "View raw logs" does not report anything about the coverage combine command and is only available after the job has timed out. While the job is hanging, barely any output is available:
2023-02-21T09:22:04.3883435Z Requested labels: ubuntu-latest
2023-02-21T09:22:04.3883490Z Job defined at: open-formulieren/open-forms/.github/workflows/ci.yml@refs/heads/master
2023-02-21T09:22:04.3883517Z Waiting for a runner to pick up this job...
2023-02-21T09:22:07.1597154Z Job is waiting for a hosted runner to come online.
2023-02-21T09:22:09.4918092Z Job is about to start running on the hosted runner: Hosted Agent (hosted)
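
One way to make a hang like the one in the steps above surface quickly, instead of waiting for the job-level timeout, is to run the suspect step under its own time limit. The following is a minimal sketch, not the project's actual CI setup: the helper name `run_with_timeout` and the time limits are assumptions chosen for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` and report whether it finished within `timeout_s` seconds.

    Returns a tuple (finished, returncode, stdout). When the command
    hangs, `finished` is False and `returncode` is None.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child before re-raising, so nothing
        # is left running. The partial output may be None or bytes.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, None, out
    return True, proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical usage: give the step two minutes, then fail loudly.
    finished, code, _ = run_with_timeout(["coverage", "combine"], 120)
    if not finished:
        sys.exit("coverage combine did not finish within 120 seconds")
    sys.exit(code)
```

This only shortens the feedback loop and records whatever output was produced before the hang; it does not explain why the command hangs.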

Expected behavior

The job completes without timing out.

Runner Version and Platform

Current runner version: '2.301.1'

OS of the machine running the runner?

Operating System
  Ubuntu
  22.04.1
  LTS

These are GitHub-hosted runners.

What's not working?

Typically the "Run tests" job completes in < 10 minutes


Job Log Output

(to amend once the job has timed out)

Runner and Worker's Diagnostic Logs

Log archive of a different but similarly failing job (https://github.com/open-formulieren/open-forms/actions/runs/4225098387/jobs/7342276058)

logs_15682.zip

sergei-maertens added the bug label on Feb 21, 2023
@sergei-maertens
Author

I managed to narrow this down further to the use of coverage (Python code coverage measurement): when coverage is disabled completely, things work again. Running with or without the concurrency options makes no difference, as evidenced by https://github.com/open-formulieren/open-forms/actions/runs/4231729491/jobs/7350606185

This could possibly be related to nedbat/coveragepy#1310

@sergei-maertens
Author

I have created an issue with the coverage library (nedbat/coveragepy#1559), as the problem could be pinpointed to it. It does seem, however, that something changed on the infrastructure that prevents coverage from exiting properly, so I'd prefer to keep both issues open to hopefully pinpoint the problem.
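
When a Python process does not exit properly, the standard library's `faulthandler` module can show where its threads are stuck. The sketch below is an illustration only: the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical, and it is an assumption, not something confirmed in this thread, that the dump still fires if the hang happens late in interpreter shutdown.

```python
import faulthandler
import sys


def arm_hang_watchdog(seconds, file=None):
    """Schedule a dump of all thread tracebacks after `seconds`.

    The dump is written only if the process is still running at that
    point and the watchdog has not been cancelled in the meantime.
    """
    target = file if file is not None else sys.stderr
    # faulthandler writes directly to the file descriptor, so `target`
    # must be a real file object that supports fileno().
    faulthandler.dump_traceback_later(
        seconds, repeat=False, file=target, exit=False
    )


def disarm_hang_watchdog():
    """Cancel the pending dump, e.g. once the run has finished normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog(600)` at the start of the test run would, under these assumptions, print the tracebacks of all threads after ten minutes if the process is still alive.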

@sergei-maertens
Author

Closing this as we're fairly certain it can be pinned to a regression from Python 3.10.9 to 3.10.10, see python/cpython#102126
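
For anyone checking whether their own runner is affected, a small guard can report whether the interpreter is the release named above. This is a sketch under assumptions: the set `SUSPECT_VERSIONS` and the function `is_suspect_python` are hypothetical names, and only 3.10.10 is listed because it is the only version this thread identifies.

```python
import sys

# Assumption: only the patch release named in this issue is treated as
# suspect. Other releases may or may not be affected.
SUSPECT_VERSIONS = {(3, 10, 10)}


def is_suspect_python(version_info=None):
    """Return True if the (major, minor, micro) version is in the suspect set.

    Defaults to the running interpreter when no version is passed.
    """
    info = version_info if version_info is not None else sys.version_info
    return tuple(info[:3]) in SUSPECT_VERSIONS


if __name__ == "__main__":
    if is_suspect_python():
        print("Running on Python 3.10.10; see python/cpython#102126")
```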

sergei-maertens closed this as not planned on Feb 22, 2023
@somaz94

somaz94 commented May 30, 2023

The same problem happens to me.
The problem occurs when building a specific branch. Did you solve it?
