General data wrangling #57

Open
5 tasks done
pp-mo opened this issue Feb 2, 2024 · 1 comment


pp-mo commented Feb 2, 2024

Although strictly excluded as a goal for the initial release,
I still think the 'secondary' usage of ncdata will be useful :

  • for modifying data before loading, or after saving, with an analysis package
  • or just to adjust data and save to another file

For this there is real scope for some convenience and sugar.
Some ideas :

  • ds.is_valid(error_when_not=False) : checking the consistencies not ensured by the free-and-easy design
    • ideas
      • all elements are filed under their own name (e.g. ds.variables['x'].name == 'x')
      • dims used by variables all exist
      • variables all have data
      • variable data shapes all match the dims
    • ( delivered : Save errors util #64 )
  • make it easy to add items by name : el.variables[var.name] = var --> el.variables.add(var)
  • make it easy to rename content, e.g. ds.variables.rename('x', 'y')
  • make it easy to construct containers (variables, attributes) from lists of element specifications
    e.g. NcData(dimensions=nc_dims(x=3, y=5, t=(2, True)), variables=nc_vars(x=(['x'], int), y=(['y'], int), data=(['t', 'y', 'x'], float)))
    (or something !)
  • special convenience handling for attrs : e.g.
    el.ncd_setatt(name, value) ~= el.attributes[name] = NcAttribute(name, value)
    el.ncd_getatt(name) ~= el.attributes.get(name, NcAttribute('', None)).as_python_value()
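
To make the is_valid idea above concrete, here is a minimal sketch of the four checks listed. It is not the ncdata implementation: Dim, Var, Dataset and consistency_errors are hypothetical stand-ins so the example runs on its own, and "data" is assumed to be anything with a .shape (e.g. a numpy array).

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


# Hypothetical stand-ins for the ncdata classes, so the sketch runs on its own.
@dataclass
class Dim:
    name: str
    size: int


@dataclass
class Var:
    name: str
    dimensions: tuple = ()
    data: Any = None  # assumed: anything with a ".shape", e.g. a numpy array


@dataclass
class Dataset:
    dimensions: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)


def consistency_errors(ds):
    """Return a list of problem descriptions; an empty list means consistent."""
    errors = []
    # Check 1: every element is filed under its own name.
    for kind in ("dimensions", "variables"):
        for key, element in getattr(ds, kind).items():
            if element.name != key:
                errors.append(f"{kind}[{key!r}] has name {element.name!r}")
    for key, var in ds.variables.items():
        # Check 2: dims used by the variable all exist.
        missing = [d for d in var.dimensions if d not in ds.dimensions]
        if missing:
            errors.append(f"variable {key!r} uses unknown dims {missing}")
        # Check 3: the variable has data.
        if var.data is None:
            errors.append(f"variable {key!r} has no data")
        # Check 4: the data shape matches the dims (only if all dims are known).
        elif not missing:
            expected = tuple(ds.dimensions[d].size for d in var.dimensions)
            actual = tuple(var.data.shape)
            if actual != expected:
                errors.append(f"variable {key!r} shape {actual} != {expected}")
    return errors


def is_valid(ds, error_when_not=False):
    """Return True if consistent; optionally raise instead of returning False."""
    errors = consistency_errors(ds)
    if errors and error_when_not:
        raise ValueError("; ".join(errors))
    return not errors


ds = Dataset(
    dimensions={"x": Dim("x", 3)},
    variables={
        "x": Var("x", ("x",), SimpleNamespace(shape=(3,))),
        "bad": Var("oops", ("x", "y"), None),  # wrong name, unknown dim, no data
    },
)
problems = consistency_errors(ds)
```

Returning the list of problems separately from the yes/no answer keeps the error_when_not switch trivial, and lets a caller report everything that is wrong in one go.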

Update:

v0.1.1 delivered most of this :


For instance, some actions I needed to adjust a given file output from xarray so that Iris can correctly interpret the coord-system ...

>>> import ncdata.netcdf4
>>> from ncdata import NcAttribute
>>> from ncdata.iris import to_iris
>>> ds = ncdata.netcdf4.from_nc4(filepath)
>>> ds.variables['x'].attributes['standard_name'] = NcAttribute('standard_name', 'projection_x_coordinate')
>>> ds.variables['y'].attributes['standard_name'] = NcAttribute('standard_name', 'projection_y_coordinate')
>>> ds.variables['x'].attributes['units'] = NcAttribute('units', 'm')
>>> ds.variables['y'].attributes['units'] = NcAttribute('units', 'm')
>>> del ds.variables['spatial_ref'].attributes['spatial_ref']
>>> del ds.variables['spatial_ref'].attributes['crs_wkt']
>>> del ds.variables['spatial_ref'].attributes['horizontal_datum_name'] 
>>> cube, = to_iris(ds)
>>> print(cube.coord_system)
<bound method Cube.coord_system of <iris 'Cube' of band_data / (unknown) (band: 5; projection_y_coordinate: 6400; projection_x_coordinate: 7600)>>
>>> print(cube.coord_system())
TransverseMercator(latitude_of_projection_origin=53.5, longitude_of_central_meridian=-8.0, false_easting=200000.0, false_northing=250000.0, scale_factor_at_central_meridian=1.000035, ellipsoid=GeogCS(semi_major_axis=6377340.189, semi_minor_axis=6356034.447938534))
>>> 

So, how about

ds.variables['x'].attributes.update(NameMap(
    NcAttribute,  # type of contents
    ('standard_name', 'projection_x_coordinate'),  # *args are init arglists
    ('units', 'm')
))
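
A minimal sketch of how such a NameMap helper could work, purely as an illustration of the proposal: the NameMap here is hypothetical (a plain dict subclass), and NcAttribute is a stand-in class so the example runs without ncdata installed.

```python
class NcAttribute:
    """Stand-in for ncdata's NcAttribute: just a name and a value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class NameMap(dict):
    """Hypothetical helper: build a {name: element} dict from init arglists."""

    def __init__(self, item_type, *arglists):
        super().__init__()
        for args in arglists:
            item = item_type(*args)  # each arglist is passed to the constructor
            self[item.name] = item   # filed under its own name, by construction


attributes = {}  # plays the role of ds.variables['x'].attributes
attributes.update(
    NameMap(
        NcAttribute,
        ("standard_name", "projection_x_coordinate"),
        ("units", "m"),
    )
)
```

Since each name is taken from the constructed element, the key and the element name cannot disagree at construction time, which is the main attraction over writing out each NcAttribute(name, value) by hand.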

pp-mo commented Feb 3, 2024

We could also maybe be strict about expected content, to avoid obvious problems ...

  • element containers to enforce type of included objects (e.g. type(el.attributes['x']) == NcAttribute)
    • on container creation + insertion
  • container assignment to enforce container['x'].name == 'x'
    since, especially, it's far too easy to write el.attributes['x'] = val by mistake for el.attributes['x'] = NcAttribute('x', val)

But, this approach involves plugging all loopholes for different means of putting things in a container,
such as 'extend', 'update', etc.
That is tricky to ensure if you provide a subclass of 'dict', since you need to be sure which operations need to be overridden. Meanwhile, it's easier to be sure of completeness if you subclass collections.abc.MutableMapping (like iris CubeAttrsDict). But even then, the correctness of the solution is not obvious, and the result no longer satisfies isinstance(x, dict), and might need extra methods adding.
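
For illustration, a rough sketch of the MutableMapping route. StrictNameMap is a hypothetical name, not an ncdata or iris class, and NcAttribute is a stand-in. Only the five abstract methods are written by hand; update() and setdefault() are inherited mixin methods that go through __setitem__, so the checks apply to them too.

```python
from collections.abc import MutableMapping


class NcAttribute:
    """Stand-in for ncdata's NcAttribute: just a name and a value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class StrictNameMap(MutableMapping):
    """Hypothetical container enforcing item type and key == item.name."""

    def __init__(self, item_type, items=()):
        self._item_type = item_type
        self._items = {}
        for item in items:
            self[item.name] = item

    def __setitem__(self, key, item):
        # The single entry point for all insertions, including update().
        if not isinstance(item, self._item_type):
            raise TypeError(f"expected {self._item_type.__name__}, got {item!r}")
        if item.name != key:
            raise ValueError(f"key {key!r} does not match name {item.name!r}")
        self._items[key] = item

    def __getitem__(self, key):
        return self._items[key]

    def __delitem__(self, key):
        del self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


attrs = StrictNameMap(NcAttribute)
attrs["units"] = NcAttribute("units", "m")  # accepted

rejected = []
for key, item in [("x", "val"), ("y", NcAttribute("x", 1))]:
    try:
        attrs.update({key: item})  # update() is routed through __setitem__
    except (TypeError, ValueError) as err:
        rejected.append(type(err).__name__)
```

As noted above, the result is not a dict (isinstance(attrs, dict) is False), and nothing here stops a later attrs["units"].name = "y" from breaking the invariant.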

In any case, strictness + correctness is hard to maintain since the objects are designed for free use.
For example, you can write el.attributes['x'] = attr = NcAttribute('x', val), but then nothing stops a later attr.name = 'y'

In that view, it makes sense to make it easy to do things 'right', preserving the expected consistency.
By which logic, we should provide utilities such as :

  • rename an element within a container
    • preserving x[key].name == key
  • sub-index a dimension (indexing all variables which depend on it)
    • preserving var.data.shape == tuple(dims[dimname].size for dimname in var.dimensions)
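
As a sketch of the first of these utilities (the rename), assuming containers are plain dicts of elements with a .name: rename_element and rename_dimension are hypothetical helper names, and Element is a stand-in for the ncdata element classes. Renaming a dimension also has to update every variable that refers to it.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Stand-in for an ncdata element (dimension or variable)."""

    name: str
    dimensions: tuple = ()


def rename_element(container, old_name, new_name):
    """Rename in place, keeping container[key].name == key and the item order."""
    if old_name not in container:
        raise KeyError(old_name)
    if new_name == old_name:
        return
    if new_name in container:
        raise ValueError(f"{new_name!r} already exists")
    renamed = {}
    for key, element in container.items():
        if key == old_name:
            element.name = new_name
            key = new_name
        renamed[key] = element
    container.clear()
    container.update(renamed)


def rename_dimension(dims, variables, old_name, new_name):
    """Rename a dimension, and fix up all variables which reference it."""
    rename_element(dims, old_name, new_name)
    for var in variables.values():
        var.dimensions = tuple(
            new_name if dim == old_name else dim for dim in var.dimensions
        )


dims = {"t": Element("t"), "x": Element("x")}
variables = {
    "x": Element("x", ("x",)),
    "data": Element("data", ("t", "x")),
}
rename_element(variables, "x", "lon")
rename_dimension(dims, variables, "x", "lon")
```

Rebuilding the dict rather than popping and re-inserting keeps the original element order, which matters if the order is later written back to a file.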

conclusion :

  • at first anyway, don't bother to be "strict".
  • but do provide assistance to do common operations safely
