This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Workaround for bug in PL 1.5.5: CombinedLoader cannot be used with DDP for training data #646

Merged
17 commits merged on Feb 1, 2022


ant0nsc (Contributor) commented Jan 27, 2022

Workaround for PL bug Lightning-AI/pytorch-lightning#11632

Please follow the guidelines for PRs contained here. Checklist:

  • Ensure that your PR is small, and implements one change.
  • Add unit tests for all functions that you introduced or modified.
  • Run PyCharm's code cleanup tools on your Python files.
  • Link the correct GitHub issue for tracking.
  • Update the Changelog file: Describe your change in terms of
    Added/Changed/Removed/... in the "Upcoming" section.
  • When merging your PR, replace the default merge message with a description of your PR,
    and if needed a motivation why that change was required.

@ant0nsc ant0nsc linked an issue Jan 28, 2022 that may be closed by this pull request
ant0nsc (Contributor, Author) commented Feb 1, 2022

Tests pass on local box. Test run on NIH_RSNA: master_1643282462778 in RadiomicsNN

@ant0nsc ant0nsc enabled auto-merge (squash) February 1, 2022 12:04
@ant0nsc ant0nsc linked an issue Feb 1, 2022 that may be closed by this pull request
      """
      The train dataloaders
      """
-     return self.get_combined_loader(encoder_loader=self.encoder_module.train_dataloader(),
-                                     linear_head_loader=self.linear_head_module.train_dataloader())
+     return {

Minor: I would also add a comment here about the bug, to explain why we return a dict instead of using the combined dataloader.
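
For illustration, the kind of comment suggested above could look like the following. This is a minimal sketch with hypothetical stand-in classes (`StubDataModule`, `CombinedDataModuleSketch`) and no PyTorch Lightning import; the dict keys `"encoder"` and `"linear_head"` are assumptions, since the diff shown here is cut off after `return {`.

```python
from typing import Dict, Iterable, List


class StubDataModule:
    """Hypothetical stand-in for a data module that owns one training loader."""

    def __init__(self, batches: List[int]) -> None:
        self._batches = batches

    def train_dataloader(self) -> Iterable[int]:
        # A plain list plays the role of a DataLoader in this sketch.
        return list(self._batches)


class CombinedDataModuleSketch:
    """Hypothetical container for an encoder module and a linear-head module."""

    def __init__(self, encoder_module: StubDataModule,
                 linear_head_module: StubDataModule) -> None:
        self.encoder_module = encoder_module
        self.linear_head_module = linear_head_module

    def train_dataloader(self) -> Dict[str, Iterable[int]]:
        """
        The train dataloaders
        """
        # Workaround for Lightning-AI/pytorch-lightning#11632 (PL 1.5.5):
        # a CombinedLoader cannot be used with DDP for training data, so we
        # return a plain dict of loaders instead of a combined loader.
        # The key names are illustrative assumptions, not taken from the PR.
        return {
            "encoder": self.encoder_module.train_dataloader(),
            "linear_head": self.linear_head_module.train_dataloader(),
        }
```

The point of the sketch is only the placement and wording of the comment: it sits directly above the `return {` so that a reader who wonders why the combined loader is no longer used finds the linked bug immediately.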


Successfully merging this pull request may close these issues.

  • SSL with min_size cycling gets stuck after epoch 0
  • SSL on multiple node triggers a bug in PL 1.5.5
3 participants