Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: add an async timer to kill ctest before the Buildkite job times out #3121

Merged
merged 7 commits into from
Mar 28, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 42 additions & 16 deletions .buildkite/capture_tmpdir.jl
Original file line number Diff line number Diff line change
Expand Up @@ -26,20 +26,8 @@ function my_exit(process::Base.Process)
exit(process.exitcode)
end

function get_bool_from_env(name::AbstractString, default_value::Bool)
value = get(ENV, name, "$(default_value)") |> strip |> lowercase
result = parse(Bool, value)::Bool
return result
end

const is_buildkite = get_bool_from_env("BUILDKITE", false)

function get_from_env(name::AbstractString)
if is_buildkite
value = ENV[name]
else
value = get(ENV, name, "")
end
value = ENV[name]
result = convert(String, strip(value))::String
return result
end
Expand All @@ -55,16 +43,34 @@ if length(ARGS) < 1
end

const build_number = get_from_env("BUILDKITE_BUILD_NUMBER") |> cleanup_string
const job_name = get_from_env("BUILDKITE_STEP_KEY") |> cleanup_string
const commit_full = get_from_env("BUILDKITE_COMMIT") |> cleanup_string
const commit_short = first(commit_full, 10)
const job_name = get_from_env("BUILDKITE_STEP_KEY") |> cleanup_string
const buildkite_timeout_minutes_string = get_from_env("BUILDKITE_TIMEOUT")

const commit_short = first(commit_full, 10)
const buildkite_timeout_minutes = parse(Int, buildkite_timeout_minutes_string)::Int
const cleanup_minutes = 15
const ctest_timeout_minutes = buildkite_timeout_minutes - cleanup_minutes
if ctest_timeout_minutes < 1
msg = "ctest_timeout_minutes must be strictly positive"
@error(
msg,
ctest_timeout_minutes,
buildkite_timeout_minutes,
cleanup_minutes,
)
throw(ErrorException(msg))
end

@info(
"",
build_number,
job_name,
commit_full,
commit_short,
ctest_timeout_minutes,
buildkite_timeout_minutes,
cleanup_minutes,
)

const my_archives_dir = joinpath(pwd(), "my_archives_dir")
Expand All @@ -88,6 +94,26 @@ mktempdir(my_temp_parent_dir) do dir
new_env["TMPDIR"] = TMPDIR
command = setenv(`$ARGS`, new_env)
global proc = run(command, (stdin, stdout, stderr); wait = false)

# Start asynchronous timer that will kill `ctest`
@async begin
sleep(ctest_timeout_minutes * 60)

# If we've exceeded the timeout and `ctest` is still running, kill it
if isopen(proc)
@error(
string(
"Process timed out ",
"(with a timeout of $(ctest_timeout_minutes) minutes). ",
"Killing with SIGTERM.",

)
)
kill(proc, Base.SIGTERM)
end
end

# Wait for `ctest` to finish, either through naturally finishing its run, or `SIGTERM`
wait(proc)

if proc.termsignal != 0
Expand Down Expand Up @@ -118,7 +144,7 @@ mktempdir(my_temp_parent_dir) do dir
end

buildkite_upload_cmd = `buildkite-agent artifact upload $(dst_file_name)`
if is_buildkite && !success(proc)
if !success(proc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still buildkite specific?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DilumAluthge Any reason this condition got removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so I was thinking that it seems unlikely that anyone would ever run the capture_tmpdir.jl script outside of CI, so I removed the "check if this is Buildkite" logic for simplicity.

That being said, if we anticipate that people might want to run this script locally, we should add that logic back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's fine. Just wanted to check that it was intentional.

run(setenv(buildkite_upload_cmd; dir = my_archives_dir))
end
end
Expand Down
8 changes: 4 additions & 4 deletions .buildkite/pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@ agents:
sandbox.jl: "true"
os: "linux"
steps:
- label: "x86_64"
key: "x86_64"
- label: "Run-Tests-x86_64"
key: "Run-Tests-x86_64"
plugins:
- JuliaCI/julia#v1:
persist_depot_dirs: packages,artifacts,compiled
Expand All @@ -16,13 +16,13 @@ steps:
# Mount in the julia installation we used to create the sandbox in the first place
workspaces:
- "/cache:/cache"
timeout_in_minutes: 30
timeout_in_minutes: 45
commands: |
echo "--- Print kernel information"
uname -a
echo "--- Print CPU information"
# These machines have multiple cores. However, it should be sufficient to just print
# the information for one of the cores.
# the information for one of the cores.
sed -n '1,/^\$/p' /proc/cpuinfo
echo "--- Generate build environment"
cmake --version
Expand Down