
Fix for stuck test set inference for LightningContainer models #494

Merged: 5 commits merged from antonsc/inference_fix into main on Jun 17, 2021

Conversation

ant0nsc (Contributor) commented on Jun 17, 2021

This fixes an issue where test set inference in multi-GPU jobs with LightningContainer models got stuck, attempting to communicate with processes that had already terminated.

Closes #493
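For illustration only, the sketch below shows one common way to avoid this class of hang; it is not the code changed in this PR. After multi-GPU training, all ranks synchronise once, and test set inference then runs on rank 0 alone with a fresh single-device Trainer, so no collective communication with worker processes that may already have exited is attempted. The function name `train_and_run_inference` and the `model`/`datamodule` arguments are hypothetical placeholders, and a recent PyTorch Lightning API is assumed.

```python
import torch.distributed as dist
from pytorch_lightning import Trainer


def train_and_run_inference(model, datamodule, max_epochs: int = 10) -> None:
    # Multi-GPU training: one process per GPU under the DDP strategy.
    trainer = Trainer(accelerator="gpu", devices=-1, strategy="ddp", max_epochs=max_epochs)
    trainer.fit(model, datamodule=datamodule)

    # Wait until every rank has finished training before any rank moves on.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()

    # Non-zero ranks stop here; only rank 0 runs test set inference.
    if trainer.global_rank != 0:
        return

    # A fresh single-device Trainer never tries to synchronise with the
    # training worker processes, which may already be gone at this point.
    inference_trainer = Trainer(accelerator="gpu", devices=1)
    inference_trainer.test(model, datamodule=datamodule)
```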

Please follow the guidelines for PRs contained here. Checklist:

  • Ensure that your PR is small, and implements one change.
  • Add unit tests for all functions that you introduced or modified.
  • Run PyCharm's code cleanup tools on your Python files.
  • Link the correct GitHub issue for tracking.
  • Update the Changelog file: Describe your change in terms of
    Added/Changed/Removed/... in the "Upcoming" section.
  • When merging your PR, replace the default merge message with a description of your PR,
    and if needed a motivation why that change was required.

@ant0nsc ant0nsc enabled auto-merge (squash) June 17, 2021 15:02
@ant0nsc ant0nsc merged commit 9749954 into main Jun 17, 2021
@ant0nsc ant0nsc deleted the antonsc/inference_fix branch June 17, 2021 16:34
Linked issue: Multi-node training jobs for LightningContainer models can get stuck at inference time (#493)