This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Multi-GPU jobs don't terminate when one worker fails #590

Open
ant0nsc opened this issue Nov 17, 2021 · 0 comments


@ant0nsc
Contributor

ant0nsc commented Nov 17, 2021

I thought this had been fixed in PyTorch Lightning (PL), but it seems not. This RadiomicsNN job raises an exception in one of its child processes, yet the main process does not terminate: HD_a45c4cbd-1b83-44b4-bdd9-b76baf5a3547_4

AB#4699
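For illustration only, here is a minimal sketch of the fail-fast behaviour this issue expects: a generic Python parent that launches one subprocess per worker, polls them, and terminates the still-running siblings as soon as any worker exits with a non-zero code, instead of waiting on them indefinitely. This is an assumption-based example, not how PyTorch Lightning or InnerEye actually launch multi-GPU workers; the function `run_fail_fast` and its `poll_interval` parameter are hypothetical.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def run_fail_fast(commands: Sequence[Sequence[str]], poll_interval: float = 0.1) -> List[int]:
    """Hypothetical helper: launch one subprocess per command and fail fast.

    If any process exits with a non-zero code, all processes that are still
    running are terminated. Returns the final return code of every process,
    in launch order.
    """
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    try:
        while True:
            codes = [p.poll() for p in procs]  # None while a process is still running
            if any(c not in (None, 0) for c in codes):
                break  # one worker failed: stop waiting for the others
            if all(c is not None for c in codes):
                return codes  # every worker finished cleanly
            time.sleep(poll_interval)
    finally:
        # Runs after a failure, after a clean finish, or if the parent itself
        # raises (e.g. KeyboardInterrupt): no child is left running.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    return [p.returncode for p in procs]


if __name__ == "__main__":
    # Worker 0 fails immediately; worker 1 would otherwise run for a minute.
    print(run_fail_fast([
        [sys.executable, "-c", "raise RuntimeError('worker 0 failed')"],
        [sys.executable, "-c", "import time; time.sleep(60)"],
    ]))
```

Polling with `Popen.poll()` keeps the sketch free of threads and signal handlers. A real launcher would also need to surface the failure to its caller (for example by raising when any returned code is non-zero), which is left out here.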
