New method pad_missing to support aggregation of DSGs #717

Closed
davidhassell opened this issue Feb 21, 2024 · 0 comments · Fixed by #718
Labels
enhancement New feature or request

@davidhassell
Collaborator

It is currently not possible to aggregate two discrete sampling geometry (DSG) fields whose features have different lengths into a single DSG field with a larger feature axis. For example:

>>> a
<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>
>>> b
<CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>
>>> cf.aggregate([a,b])
[<CF Field: precipitation_flux(cf_role=timeseries_id(2), ncdim%timeseries(9)) kg m-2 day-1>,
 <CF Field: precipitation_flux(cf_role=timeseries_id(3), ncdim%timeseries(5)) kg m-2 day-1>]

This is something we might want to do, because we can store DSGs of different lengths in one CF-netCDF data variable using a ragged array representation.
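To illustrate why the padding need not end up on disk, here is a minimal pure-Python sketch of the idea behind a contiguous ragged array: the padded rows are flattened into one list of values plus a per-feature count. The names `MISSING`, `to_contiguous` and `from_contiguous` are hypothetical and are not part of cf-python, and the sketch assumes missing values only occur as trailing padding.

```python
# Stand-in for a netCDF missing value (an assumption made for this sketch).
MISSING = None


def to_contiguous(padded):
    """Flatten padded rows into (values, counts), dropping trailing padding."""
    values, counts = [], []
    for row in padded:
        n = len(row)
        # Walk back from the end of the row past any trailing missing values.
        while n > 0 and row[n - 1] is MISSING:
            n -= 1
        values.extend(row[:n])
        counts.append(n)
    return values, counts


def from_contiguous(values, counts):
    """Rebuild padded rows, each as long as the longest feature."""
    width = max(counts, default=0)
    padded, start = [], 0
    for n in counts:
        row = values[start:start + n]
        padded.append(row + [MISSING] * (width - n))
        start += n
    return padded


# Three timeseries of lengths 4, 2 and 1, padded to a common length of 4.
padded = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, MISSING, MISSING],
    [7.0, MISSING, MISSING, MISSING],
]
values, counts = to_contiguous(padded)
```

Only `values` and `counts` would need to be stored, and the round trip through `from_contiguous` recovers the padded layout.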

If we could pad out the ncdim%timeseries axis of b with missing data, the two fields would aggregate. A new pad_missing method could do the padding:

>>> # Pad out the 'ncdim%timeseries' axis with missing data:
>>> #   0 elements at the start of the axis and 4 elements at the end:
>>> b = b.pad_missing('ncdim%timeseries', (0, 4))
>>> c = cf.aggregate([a,b])  # Now this aggregates
>>> c
[<CF Field: precipitation_flux(cf_role=timeseries_id(5), ncdim%timeseries(9)) kg m-2 day-1>]
>>> # Compress the field
>>> c = c[0].compress('contiguous')
>>> # Write it to disk in a single CF-netCDF data variable *without* the extra padding 
>>> cf.write(c, 'dsg.nc')

NumPy and Dask each have a pad function that supports many padding modes, but none that pads with missing data. Their API is also more general than we need. As a result, it may be better to call our method pad_missing to distinguish it from a general pad method; a full cf-python pad could still be implemented in the future if the need ever arose.
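As a rough sketch of the padding itself, the following pads a NumPy masked array with masked (missing) elements along one axis by padding the data and the mask separately. The function name `pad_missing_array` and its signature are hypothetical; this is not the cf-python implementation, only an illustration of the technique under the assumption that missing data is represented by a mask.

```python
import numpy as np


def pad_missing_array(array, axis, pad_width):
    """Pad one axis of an array with masked elements.

    pad_width is a (before, after) pair of element counts.
    """
    array = np.ma.asarray(array)
    before, after = pad_width

    # Pad only the requested axis; leave all other axes untouched.
    widths = [(0, 0)] * array.ndim
    widths[axis] = (before, after)

    # Pad the data with an arbitrary constant (zero) and the mask with True,
    # so that every new element is flagged as missing.
    data = np.pad(np.ma.getdata(array), widths, mode="constant")
    mask = np.pad(
        np.ma.getmaskarray(array), widths, mode="constant", constant_values=True
    )
    return np.ma.MaskedArray(data, mask=mask)


# Arrays shaped like the data of fields a and b in the example above.
a = np.ma.masked_array(np.arange(18.0).reshape(2, 9))
b = np.arange(15.0).reshape(3, 5)

# 0 elements at the start of axis 1 and 4 elements at the end.
b_padded = pad_missing_array(b, axis=1, pad_width=(0, 4))

# With matching trailing axis sizes, the arrays now join along axis 0.
c = np.ma.concatenate([a, b_padded], axis=0)
```

Here `c` has shape `(5, 9)`, mirroring the sizes of the aggregated field in the example.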

PR to follow.
