Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding open_groups to BackendEntryPointEngine, NetCDF4BackendEntrypoint, and H5netcdfBackendEntrypoint #9243

Merged
merged 33 commits into from
Aug 14, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
33ee4a9
sandbox open groups
eni-awowale Jul 13, 2024
8186d86
rough implementation of open_groups
eni-awowale Jul 13, 2024
6b63704
removed unused imports
eni-awowale Jul 13, 2024
ef01edc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2024
08e230a
oops deleted optional
eni-awowale Jul 13, 2024
e01b0fb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 13, 2024
9e1984c
commit to test from main
eni-awowale Jul 26, 2024
33a71ac
added tests and small sample file
eni-awowale Jul 30, 2024
b10fa10
merge main into open_groups
eni-awowale Jul 30, 2024
565ffb1
updated what is new
eni-awowale Jul 30, 2024
ce607e6
updated: open_groups to include fullpath of group, improved test to c…
eni-awowale Jul 30, 2024
eaba908
update float_ to float64 for numpy 2.0
eni-awowale Jul 30, 2024
b4b9822
added pr suggestions and mypy changes
eni-awowale Aug 2, 2024
9b9c1e7
merge conflict plugins.py
eni-awowale Aug 6, 2024
225489d
Merge branch 'main' into open_groups
eni-awowale Aug 6, 2024
5fe8e96
added mutable mapping
eni-awowale Aug 6, 2024
3b9c418
added mutable mapping to api
eni-awowale Aug 6, 2024
222279c
lets see if this passes mypy
eni-awowale Aug 7, 2024
4c65b0a
mypy take 2
eni-awowale Aug 7, 2024
8c81a87
mypy
eni-awowale Aug 7, 2024
d2c74d6
updated open_groups_dict
eni-awowale Aug 8, 2024
f206408
changed return type for DataTree.from_dict
eni-awowale Aug 8, 2024
5d34920
Merge branch 'main' into open_groups
eni-awowale Aug 8, 2024
f72f3d2
fix test failures
eni-awowale Aug 8, 2024
2f92b5c
update iter_nc_ to yield parent
eni-awowale Aug 8, 2024
175e287
Merge branch 'main' into open_groups
eni-awowale Aug 13, 2024
6319678
mypy suggestions
eni-awowale Aug 13, 2024
1466147
adding casting
eni-awowale Aug 13, 2024
0e3c946
explicitly convert to dict
eni-awowale Aug 13, 2024
1ffeae5
Merge branch 'main' into open_groups
dcherian Aug 14, 2024
abd4981
Merge branch 'main' into open_groups
eni-awowale Aug 14, 2024
d44bf98
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 14, 2024
b2cf9b4
updated to add d_cast for remaining functions
eni-awowale Aug 14, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
rough implementation of open_groups
  • Loading branch information
eni-awowale committed Jul 13, 2024
commit 8186d8692c6719658f41771f7ebd0c7229921b03
42 changes: 6 additions & 36 deletions xarray/backends/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -1758,39 +1758,9 @@ def open_groups(
open_datatree()
DataTree.from_dict()
"""
filename_or_obj = _normalize_path(filename_or_obj)
store = NetCDF4DataStore.open(
filename_or_obj,
group=group,
)
if group:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager

ds = open_dataset(store, **kwargs)
# tree_root = DataTree.from_dict({str(parent): ds})
for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = NetCDF4DataStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
concat_characters=concat_characters,
decode_coords=decode_coords,
drop_variables=drop_variables,
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
tree_root._set_item(
path_group,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,
)
return tree_root
if engine is None:
engine = plugins.guess_engine(filename_or_obj)

backend = plugins.get_backend(engine)

return backend.open_groups(filename_or_obj, **kwargs)
11 changes: 11 additions & 0 deletions xarray/backends/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -535,6 +535,17 @@ def open_datatree(

raise NotImplementedError()

def open_groups(
eni-awowale marked this conversation as resolved.
Show resolved Hide resolved
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
**kwargs: Any,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should not make the same mistake as with open_dataset and prevent liskov errors.

Suggested change
**kwargs: Any,

If the abstract method supports any kwargs, so must all subclass implementations, which is not what we want.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@headtr1ck from my understanding I think the **kwargs were added back to fix this issue #9135

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, Not sure.
But since this is the same problem in all other backend methods, I'm fine with leaving it as it is (and possibly change it in a future PR all together).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good we can revisit this on another PR.

) -> dict[str, Dataset]:
"""
Backend open_groups method used by Xarray in :py:func:`~xarray.open_groups`.
"""
eni-awowale marked this conversation as resolved.
Show resolved Hide resolved

raise NotImplementedError()


# mapping of engine name to (module name, BackendEntrypoint Class)
BACKEND_ENTRYPOINTS: dict[str, tuple[str | None, type[BackendEntrypoint]]] = {}
77 changes: 65 additions & 12 deletions xarray/backends/h5netcdf_.py
Original file line number Diff line number Diff line change
Expand Up @@ -441,9 +441,9 @@ def open_datatree(
group: str | Iterable[str] | Callable | None = None,
**kwargs,
) -> DataTree:
from xarray.backends.api import open_dataset
# TODO: Add function docstring

from xarray.backends.common import _iter_nc_groups
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath
from xarray.core.utils import close_on_error

Expand All @@ -452,19 +452,20 @@ def open_datatree(
filename_or_obj,
group=group,
)
# Check for a group and make it a parent if it exists
if group:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager
ds = open_dataset(store, **kwargs)
tree_root = DataTree.from_dict({str(parent): ds})

groups_dict = {}
eni-awowale marked this conversation as resolved.
Show resolved Hide resolved
for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
ds = store_entrypoint.open_dataset(
group_ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
Expand All @@ -474,14 +475,66 @@ def open_datatree(
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
tree_root._set_item(
path_group,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,

group_name = NodePath(path_group).name
groups_dict[group_name] = group_ds

return DataTree.from_dict(groups_dict)

def open_groups(
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
*,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables: str | Iterable[str] | None = None,
use_cftime=None,
decode_timedelta=None,
group: str | Iterable[str] | Callable | None = None,
**kwargs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be obsolete as well, when you remove it from the abstract method.

) -> dict[str, Dataset]:
# TODO: Add function docstring

from xarray.backends.common import _iter_nc_groups
from xarray.core.treenode import NodePath
from xarray.core.utils import close_on_error

filename_or_obj = _normalize_path(filename_or_obj)
store = H5NetCDFStore.open(
filename_or_obj,
group=group,
)
# Check for a group and make it a parent if it exists
if group:
parent = NodePath("/") / NodePath(group)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eni-awowale this is how you should join paths

else:
parent = NodePath("/")

manager = store._manager

groups_dict = {}
for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
group_ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
concat_characters=concat_characters,
decode_coords=decode_coords,
drop_variables=drop_variables,
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
return tree_root

group_name = NodePath(path_group).name
groups_dict[group_name] = group_ds

return groups_dict



BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint)
77 changes: 66 additions & 11 deletions xarray/backends/netCDF4_.py
Original file line number Diff line number Diff line change
Expand Up @@ -682,7 +682,8 @@ def open_datatree(
group: str | Iterable[str] | Callable | None = None,
**kwargs,
) -> DataTree:
from xarray.backends.api import open_dataset
# TODO: Add function docstring

from xarray.backends.common import _iter_nc_groups
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath
Expand All @@ -692,19 +693,21 @@ def open_datatree(
filename_or_obj,
group=group,
)

# Check for a group and make it a parent if it exists
if group:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager
ds = open_dataset(store, **kwargs)
tree_root = DataTree.from_dict({str(parent): ds})
groups_dict = {}

for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = NetCDF4DataStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
ds = store_entrypoint.open_dataset(
group_ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
Expand All @@ -714,14 +717,66 @@ def open_datatree(
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
tree_root._set_item(
path_group,
new_node,
allow_overwrite=False,
new_nodes_along_path=True,

group_name = NodePath(path_group).name
groups_dict[group_name] = group_ds

return DataTree.from_dict(groups_dict)


def open_groups(
self,
filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
*,
mask_and_scale=True,
decode_times=True,
concat_characters=True,
decode_coords=True,
drop_variables: str | Iterable[str] | None = None,
use_cftime=None,
decode_timedelta=None,
group: str | Iterable[str] | Callable | None = None,
**kwargs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

) -> dict:
# TODO: Add function docstring

from xarray.backends.common import _iter_nc_groups
from xarray.core.treenode import NodePath

filename_or_obj = _normalize_path(filename_or_obj)
store = NetCDF4DataStore.open(
filename_or_obj,
group=group,
)

# Check for a group and make it a parent if it exists
if group:
parent = NodePath("/") / NodePath(group)
else:
parent = NodePath("/")

manager = store._manager

groups_dict = {}
for path_group in _iter_nc_groups(store.ds, parent=parent):
group_store = NetCDF4DataStore(manager, group=path_group, **kwargs)
store_entrypoint = StoreBackendEntrypoint()
with close_on_error(group_store):
group_ds = store_entrypoint.open_dataset(
group_store,
mask_and_scale=mask_and_scale,
decode_times=decode_times,
concat_characters=concat_characters,
decode_coords=decode_coords,
drop_variables=drop_variables,
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
return tree_root

group_name = NodePath(path_group).name
groups_dict[group_name] = group_ds

return groups_dict


BACKEND_ENTRYPOINTS["netcdf4"] = ("netCDF4", NetCDF4BackendEntrypoint)
1 change: 1 addition & 0 deletions xarray/backends/plugins.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ def get_backend(engine: str | type[BackendEntrypoint]) -> BackendEntrypoint:
if isinstance(engine, str):
engines = list_engines()
if engine not in engines:
# TODO: Improve error handing message
raise ValueError(
f"unrecognized engine {engine} must be one of: {list(engines)}"
)
Expand Down