
added simple feature to allow for incremental progress if fwp chunks fail #93

Merged: grantbuster merged 2 commits into main from gb/fwp_partial_progress on Sep 19, 2022

Conversation

grantbuster (Member) opened this pull request. The inline review comments below refer to this snippet from the diff:

"""This routine runs forward passes on all spatiotemporal chunks for
the given node index"""
for chunk_index in strategy.node_chunks[node_index]:
fwp = cls(strategy, chunk_index, node_index)
fwp.run_chunk()
out_file = strategy.out_files[chunk_index]
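
For context, the gist of the incremental behavior being discussed is roughly the following. This is a hedged sketch rather than the PR's actual code: the run_node wrapper name is illustrative, while the increment flag and the existing-output check are inferred from the conversation below.

import os

def run_node(cls, strategy, node_index, increment=True):
    """Run all spatiotemporal chunks for a node, optionally skipping
    chunks whose output files already exist from a previous run."""
    for chunk_index in strategy.node_chunks[node_index]:
        out_file = strategy.out_files[chunk_index]
        # With increment=True, a chunk whose output file is already on disk
        # is assumed to have finished in an earlier (partially failed) job
        # and is skipped instead of being recomputed.
        if increment and out_file is not None and os.path.exists(out_file):
            continue
        fwp = cls(strategy, chunk_index, node_index)
        fwp.run_chunk()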
bnb32 (Collaborator):
Just thought about adding this this morning! Good call

bnb32 (Collaborator):

Let's add this arg to the fwp CLI, just in case we want to overwrite easily.

grantbuster (Member, Author), Sep 16, 2022:

Yeah, thanks. I've been having a lot of random job failures, I think due to too many concurrent reads? But it's hard to tell what's going on. I think Lustre is generally struggling with too much parallel I/O. This should help, but reducing our reliance on cache files will also be good.

bnb32 (Collaborator):

Yeah, I've had a few as well, since I moved my env to Lustre (hmmm). Basically the only overlapping I/O is to the conda env, right? There's no overlapping cache read/write between fwp calls (except for the time index files).

grantbuster (Member, Author):

Yeah, I think in my case I just had a few random job failures when writing cache files, possibly due to writing too many small cache files to a single OST in parallel. Removing the cache pattern input and clearing the cache dir fixed my jobs.

bnb32 (Collaborator):

Sweet! What's an OST, btw?

grantbuster (Member, Author):

Object storage target (I think). When you stripe a directory it distributes files across many OSTs (up to 30ish on Eagle?). Single OSTs can get overloaded by parallel I/O, I think.

bnb32 (Collaborator):

Ah, gotcha.

bnb32 (Collaborator) left a review:

Just the option to provide increment=False through the config. All good otherwise!
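
As a rough illustration of what that could look like, here is a hedged sketch of a forward-pass config carrying the flag. The key names other than increment are illustrative placeholders, not necessarily the actual sup3r config schema.

import json

# Hypothetical forward-pass config: "increment": true means reruns skip
# chunks whose output files already exist; false forces a full overwrite.
config = {
    "file_paths": "./source_data*.h5",         # illustrative input glob
    "out_pattern": "./out/fwp_{file_id}.h5",    # illustrative output pattern
    "increment": True,
}

with open("config_fwp.json", "w") as f:
    json.dump(config, f, indent=2)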

"""This routine runs forward passes on all spatiotemporal chunks for
the given node index"""
for chunk_index in strategy.node_chunks[node_index]:
fwp = cls(strategy, chunk_index, node_index)
fwp.run_chunk()
out_file = strategy.out_files[chunk_index]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add this arg to fwp cli, just in case we want to overwrite easily

"""This routine runs forward passes on all spatiotemporal chunks for
the given node index"""
for chunk_index in strategy.node_chunks[node_index]:
fwp = cls(strategy, chunk_index, node_index)
fwp.run_chunk()
out_file = strategy.out_files[chunk_index]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've had a few as well, since I moved my env to lustre (hmmm). Basically the only overlapping io is to the conda env right? There's no overlapping cache read/write between fwp calls (except for the time index files).

grantbuster merged commit a65e8e0 into main on Sep 19, 2022.
grantbuster deleted the gb/fwp_partial_progress branch on September 19, 2022 at 15:14.
github-actions bot pushed a commit that referenced this pull request on Sep 19, 2022: added simple feature to allow for incremental progress if fwp chunks fail