Bnb/dev #98

Merged
merged 10 commits into from
Oct 6, 2022
improved doc strings for some collection methods
bnb32 committed Oct 5, 2022
commit 8b17dd8b3da956bca5238338a0105edac82acad4
40 changes: 25 additions & 15 deletions sup3r/postprocessing/collection.py
@@ -326,9 +326,7 @@ def _get_collection_attrs(cls, file_paths, feature, sort=True,
 ----------
 file_paths : list | str
 Explicit list of str file paths that will be sorted and collected
-or a single string with unix-style /search/patt*ern.h5. Files
-should have non-overlapping time_index dataset and fully
-overlapping meta dataset.
+or a single string with unix-style /search/patt*ern.h5.
 feature : str
 Dataset name to collect.
 sort : bool
@@ -341,7 +339,8 @@ def _get_collection_attrs(cls, file_paths, feature, sort=True,
 None will use all available workers.
 target_final_meta_file : str
 Path to target final meta containing coordinates to keep from the
-full file list collected meta
+full list of coordinates present in the collected meta for the full
+file list.

 Returns
 -------
@@ -477,8 +476,11 @@ def _collect_flist(self, feature, masked_meta, time_index, shape,
 feature : str
 Dataset name to collect.
 masked_meta : pd.DataFrame
-Concatenated meta data for the given file paths. This masked
-against the target_final_meta.
+Meta data containing the list of coordinates present in both the
+given file paths and the target_final_meta. This can be a subset of
+the coordinates present in the full file list. The coordinates
+contained in this dataframe have the same gids as those present in
+the meta for the full file list.
 time_index : pd.datetimeindex
 Concatenated datetime index for the given file paths.
 shape : tuple
@@ -488,12 +490,16 @@ def _collect_flist(self, feature, masked_meta, time_index, shape,
 to be collected.
 out_file : str
 File path of final output file.
-target_final_meta : str
-Target final meta containing coordinates to keep from the
-full file list collected meta
+target_final_meta : pd.DataFrame
+Meta data containing coordinates to keep from the full file list
+collected meta. This can be but is not necessarily a subset of the
+full list of coordinates for all files in the file list. This is
+used to remove coordinates from the full file list which are not
+present in the target_final_meta.
 masked_target_meta : pd.DataFrame
-Collected meta data with mask applied from target_final_meta so
-original gids are preserved.
+Dataframe containing the same coordinates contained in the
+target_final_meta but with the same gids that are present in the
+full list of coordinates present in the full file list.
 max_workers : int | None
 Number of workers to use in parallel. 1 runs serial,
 None uses all available.
@@ -641,9 +647,7 @@ def collect(cls, file_paths, out_file, features, max_workers=None,
 ----------
 file_paths : list | str
 Explicit list of str file paths that will be sorted and collected
-or a single string with unix-style /search/patt*ern.h5. Files
-should have non-overlapping time_index dataset and fully
-overlapping meta dataset.
+or a single string with unix-style /search/patt*ern.h5.
 out_file : str
 File path of final output file.
 features : list
@@ -666,7 +670,13 @@ def collect(cls, file_paths, out_file, features, max_workers=None,
 a suffix format _{temporal_chunk_index}_{spatial_chunk_index}.h5
 target_final_meta_file : str
 Path to target final meta containing coordinates to keep from the
-full file list collected meta
+full file list collected meta. This can be but is not necessarily a
+subset of the full list of coordinates for all files in the file
+list. This is used to remove coordinates from the full file list
+which are not present in the target_final_meta. Either this full
+meta or a subset, depending on which coordinates are present in
+the data to be collected, will be the final meta for the collected
+output files.
 n_writes : int | None
 Number of writes to split full file list into. Must be less than
 or equal to the number of temporal chunks.
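The masking behavior described in the updated docstrings (keep only the coordinates present in target_final_meta while preserving the gids from the full file list meta) can be illustrated with a small pandas sketch. This is not the sup3r implementation: the helper name mask_meta_by_target, the "latitude"/"longitude" column names, and the use of the dataframe index as the gid are all assumptions made for illustration, and coordinates are matched by exact equality.

```python
import pandas as pd


def mask_meta_by_target(full_meta, target_final_meta,
                        coord_cols=("latitude", "longitude")):
    """Keep rows of full_meta whose coordinates appear in target_final_meta.

    Hypothetical helper for illustration only; not part of sup3r.
    The column names in coord_cols are assumed, not taken from the PR.
    """
    cols = list(coord_cols)
    # Set of (lat, lon) tuples to keep, taken from the target meta
    target_coords = set(
        target_final_meta[cols].itertuples(index=False, name=None))
    # One boolean per row of the full meta: is this coordinate in the target?
    keep = [coord in target_coords
            for coord in full_meta[cols].itertuples(index=False, name=None)]
    # Boolean selection leaves the index (the gids) untouched
    return full_meta.loc[keep]


# Full meta collected from all files, indexed by gid
full_meta = pd.DataFrame(
    {"latitude": [10.0, 10.0, 11.0, 11.0],
     "longitude": [20.0, 21.0, 20.0, 21.0]},
    index=pd.Index([0, 1, 2, 3], name="gid"))

# Target meta: includes one coordinate (12.0, 20.0) absent from the files,
# so it is not a strict subset of the full coordinate list
target_final_meta = pd.DataFrame(
    {"latitude": [10.0, 11.0, 12.0],
     "longitude": [21.0, 20.0, 20.0]})

masked_meta = mask_meta_by_target(full_meta, target_final_meta)
print(masked_meta.index.tolist())  # [1, 2]
```

The retained rows keep gids 1 and 2 from the full meta rather than being renumbered from zero, which is the property the masked_meta docstring describes. The target coordinate that does not appear in any file is simply dropped.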