Connect netCDF4 locks to Xarray #5251

Open
pp-mo opened this issue Apr 17, 2023 · 1 comment

pp-mo commented Apr 17, 2023

Iris and Xarray both use a mutex/lock to ensure thread-safety of all accesses to netCDF4, but the two schemes are somewhat different.

So, we should probably add some supported control scheme to Iris, to enable it to use the Xarray locks, or some combination.

Previously raised: within the lazy netcdf saves #5191, we noted that @fnattino had proposed to "consider allowing a lock to be passed in by the user", as in this comment on #5031.

Note on Priority :

For now I've put this on the Iris 3.6 project board.
However, as long as I can demonstrate that the 'patching approach' mentioned below works, then this can happily be pushed back to Iris 3.7.

Background

For the Xarray Bridge #4994 proposal, we want to be able to work with combinations of lazy (netcdf file) data from both Iris and Xarray.
For example, combine arrays from iris- and xarray-loaded data, and save (with either library) with lazy streaming to a new output netcdf file.
This means the two packages need to co-operate, i.e. not interrupt one another while making netcdf library calls.
Effectively, they need to respect a common lock.

Though we have previously successfully tested interoperation, I think this will break if we un-pin to allow libnetcdf>=1.6.2 (cf #5095, #5233), since after that version the thread-safety issue of the netcdf library becomes a serious problem (though strictly, it was already a theoretical problem before that).

  • Both Iris + Xarray define a single global netCDF4-library lock which they use for all accesses.
  • Xarray associates a lock with each input dataset, which by default always connects to or incorporates its 'global netcdf lock'. But it can also be set to a user-provided lock.
    • However, Xarray doesn't (currently) allow a user-specified lock to be associated with an output netcdf file (generated by its to_netcdf calls).
      I suspect that is probably just an oversight / missing feature, and can be fixed.
  • Iris currently treats its 'global netcdf lock' as a fixed feature, and a private object, and does not support providing a user-specified lock.
    • Iris loading currently does not support extra format-specific keywords to "iris.load_xx" (unlike "iris.save_xx"), so that would need to be added for this -- unless we use an alternative context-manager approach?
      Extra keys have been considered before, but dropped.
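
As a rough idea of what a context-manager approach could look like, here is a hedged sketch using only the standard library. `using_lock` and `fake_nc_module` are hypothetical names invented for illustration; nothing like this exists in Iris today. The context manager temporarily swaps a module-level lock for a user-supplied one, so no extra load keywords would be needed.

```python
import contextlib
import threading
import types

# Hypothetical stand-in for a module that holds a private global lock.
fake_nc_module = types.SimpleNamespace(_GLOBAL_LOCK=threading.Lock())


@contextlib.contextmanager
def using_lock(module, lock, attr="_GLOBAL_LOCK"):
    """Temporarily replace `module.<attr>` with a user-supplied lock."""
    original = getattr(module, attr)
    setattr(module, attr, lock)
    try:
        yield lock
    finally:
        # Always restore the original lock, even if the body raises.
        setattr(module, attr, original)


user_lock = threading.Lock()
original_lock = fake_nc_module._GLOBAL_LOCK

with using_lock(fake_nc_module, user_lock):
    # Inside the block, code that looks up the module attribute
    # would pick up the user-supplied lock.
    lock_inside = fake_nc_module._GLOBAL_LOCK

lock_after = fake_nc_module._GLOBAL_LOCK
```

One design caveat: lazy data is typically fetched long after such a block has exited, so a real implementation would need the lazy arrays to capture the lock when they are created, rather than look it up globally at read time.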

Patching approach

For now, given the current code in both Xarray and Iris, it should be possible to simply overwrite (patch) one or other of the global netcdf-library locks, so that both names refer to the same lock object.
E.g. iris.fileformats.netcdf._thread_safe_nc._GLOBAL_NETCDF4_LOCK = xarray.backends.netCDF4_.NETCDF4_PYTHON_LOCK
(or the other way around?)
(This is currently untested, though.)

As a solution, this is obviously a bit fragile!
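
The patching idea can be sketched without importing either package, using stand-in namespaces in place of the two real modules. `share_lock`, `fake_iris_nc` and `fake_xarray_nc` are hypothetical names for illustration only; the attribute names are the ones quoted above. The sketch also shows one reason the approach is fragile: any reference to the old lock taken before patching is not updated.

```python
import threading
import types


def share_lock(source_module, source_attr, target_module, target_attr):
    """Make `target_module.<target_attr>` the same object as the source lock."""
    lock = getattr(source_module, source_attr)
    setattr(target_module, target_attr, lock)
    return lock


# Stand-ins for the two real modules, so the sketch runs without either package.
fake_xarray_nc = types.SimpleNamespace(NETCDF4_PYTHON_LOCK=threading.Lock())
fake_iris_nc = types.SimpleNamespace(_GLOBAL_NETCDF4_LOCK=threading.Lock())

# A reference captured *before* patching, e.g. by `from module import NAME`
# or by an object that stored the lock at creation time.
stale_reference = fake_iris_nc._GLOBAL_NETCDF4_LOCK

shared = share_lock(
    fake_xarray_nc, "NETCDF4_PYTHON_LOCK",
    fake_iris_nc, "_GLOBAL_NETCDF4_LOCK",
)

# Lookups through the module attribute now see the shared lock ...
patched_ok = fake_iris_nc._GLOBAL_NETCDF4_LOCK is fake_xarray_nc.NETCDF4_PYTHON_LOCK
# ... but the earlier reference still points at the old lock.
stale_is_shared = stale_reference is shared
```

With the real packages, the arguments would presumably be the two modules named above, applied before any data is loaded. That remains untested, and it depends on both packages looking the lock up through the module attribute at call time.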

pp-mo commented Jul 3, 2023

I have now added a prototype solution for this in ncdata:
draft code here (not yet merged).
