AppVeyor CI failures in starting workers #27274

Closed
Keno opened this issue May 27, 2018 · 6 comments
Labels
domain:ci Continuous integration

Comments

@Keno
Member

Keno commented May 27, 2018

There's a recurring failure on AppVeyor where the test run fails to start a worker and then hangs for the rest of the session:

Timed out waiting to read host:port string from worker.
      From worker 13:	ERROR: Timed out waiting to read host:port string from worker.
      From worker 13:	error at .\error.jl:33 [inlined]
      From worker 13:	read_worker_host_port(::Pipe) at C:\projects\julia\usr\share\julia\stdlib\v0.7\Distributed\src\cluster.jl:291
      From worker 13:	connect(::Distributed.LocalManager, ::Int64, ::WorkerConfig) at C:\projects\julia\usr\share\julia\stdlib\v0.7\Distributed\src\managers.jl:397
      From worker 13:	create_worker(::Distributed.LocalManager, ::WorkerConfig) at C:\projects\julia\usr\share\julia\stdlib\v0.7\Distributed\src\cluster.jl:501
      From worker 13:	setup_launched_worker(::Distributed.LocalManager, ::WorkerConfig, ::Array{Int64,1}) at C:\projects\julia\usr\share\julia\stdlib\v0.7\Distributed\src\cluster.jl:447
      From worker 13:	(::getfield(Distributed, Symbol("##47#50")){Distributed.LocalManager,WorkerConfig})() at .\task.jl:254
      From worker 13:	Stacktrace:
      From worker 13:	 [1] sync_end(::Array{Any,1}) at .\task.jl:221
      From worker 13:	 [2] macro expansion at .\task.jl:240 [inlined]

E.g.: https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.27132/job/c7n39lsfix6j2mi3

It would be nice to figure out what's going on here. Maybe some Windows-specific race condition?

@kshyatt kshyatt added the domain:ci Continuous integration label May 27, 2018
@Keno
Member Author

Keno commented May 27, 2018

We also have another failure where the tests are hanging and never complete, e.g. https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.27135/job/707kib8lw7xwkje7

Keno added a commit that referenced this issue May 27, 2018
While investigating #27274 locally, I saw a hang during the cmdlineargs
test. Add a timeout to these tests to hopefully turn any hang into an
exception instead.
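
The idea in this commit message (bound how long a test may run so that a hang surfaces as an exception) can be sketched as follows. This is a minimal illustration only, not the code from the referenced commit: `run_with_timeout` is a hypothetical helper name, and the sketch assumes a Julia version where `run(cmd; wait = false)` returns a process handle (0.7 and later).

```julia
# Hypothetical helper, not the code from the commit referenced above.
# Runs `cmd` in the background and raises an error if it has not exited
# within `seconds`, so a hung child process fails the test instead of
# stalling the CI session.
function run_with_timeout(cmd::Cmd, seconds::Real)
    proc = run(cmd; wait = false)   # start the child without blocking
    # Poll until the process exits or the deadline passes.
    status = timedwait(() -> process_exited(proc), Float64(seconds))
    if status !== :ok
        kill(proc)                  # do not leave the hung child behind
        error("timed out after $(seconds)s waiting for $cmd")
    end
    return proc.exitcode
end

# Usage: a child that exits promptly just returns its exit code.
julia = Base.julia_cmd()
run_with_timeout(`$julia -e "exit(0)"`, 120)
```

Killing the child before raising the error matters on CI: otherwise the test reports a failure but the stuck process can still keep the job alive.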
@Sacha0
Member

Sacha0 commented May 29, 2018

Ref. another, somewhat different (?), AV timeout: #27249 (comment). Best!

@Keno
Member Author

Keno commented May 29, 2018

That looks like a network error, leading to a hang in the OldPkg test suite. Ideally we should make sure that all network errors result in reliable failures of the test suite rather than hangs.
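
One way to get "reliable failure rather than hang" for an in-process call is to run it in a task and give up after a deadline. The sketch below is an assumption about how that could look, not something taken from the OldPkg tests: `with_timeout` is a hypothetical name, and the `sleep` call stands in for a stalled network request.

```julia
# Hypothetical wrapper: run `f` in a task and throw if it has not finished
# within `seconds`. Note that the task itself is not cancelled; the wrapper
# only guarantees that the caller stops waiting and gets an error.
function with_timeout(f, seconds::Real)
    task = @async f()
    status = timedwait(() -> istaskdone(task), Float64(seconds))
    status === :ok || error("operation timed out after $(seconds)s")
    return fetch(task)   # rethrows if `f` itself failed
end

# Usage: a fast call returns its value, a stalled one raises an error.
with_timeout(() -> 1 + 1, 5)
try
    with_timeout(() -> sleep(10), 0.5)   # stand-in for a hung network call
catch err
    println("failed instead of hanging: ", err)
end
```

Because the abandoned task keeps running in the background, this only turns the hang into a test failure; it does not release whatever resource the stalled call was holding.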

@StefanKarpinski
Sponsor Member

Should we just stop running the OldPkg tests for now since we're deprecating/deleting it in 0.7/1.0?

@ViralBShah
Member

ViralBShah commented May 30, 2018

Yeah, it would be good to remove the OldPkg tests. That would save non-trivial CI time too.

@KristofferC
Sponsor Member

Not using AV anymore.
