Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data.mask: migrate to Dask #301

Conversation

sadielbartholomew
Copy link
Member

@sadielbartholomew sadielbartholomew commented Jan 25, 2022

Convert the mask managed attribute towards #182.

@sadielbartholomew sadielbartholomew added question General question dask Relating to the use of Dask labels Jan 25, 2022
@sadielbartholomew sadielbartholomew self-assigned this Jan 25, 2022
@davidhassell
Copy link
Collaborator

Hi Sadie, I'm afraid that I favour the existing behaviour, which I also see as being consistent with numpy. For a numpy array a, a.mask returns a boolean numpy array; similarly for a cf.Data object d, d.mask returns a boolean cf.Data object. I'm not sure what d.mask should return if not a cf.Data object. The cf.Data object mask is lazy (e.g. hasn't triggered a read from disk at definition time), and also supports extra features (such as cyclic axes) which are not supported elsewhere.

The following also works:

>>> import cf
>>> d = cf.example_field(0).data
>>> a = d.mask.array
>>> (a == d.mask).all()
True

i.e. we don't need the .array bit for everything to work through.

@sadielbartholomew
Copy link
Member Author

Hi @davidhassell, fair enough, thanks for explaining the context - I think in hindsight I was getting confused over the nature of mask and after your explanations I agree about the behaviour it should have. I will update this PR accordingly shortly.

Copy link
Collaborator

@davidhassell davidhassell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sadie, thanks - I've suggested a different tack on the main method, we can discuss this afternoon, perhaps.

cf/data/data.py Outdated Show resolved Hide resolved
cf/data/data.py Show resolved Hide resolved
sadielbartholomew and others added 4 commits February 9, 2022 13:27
Note: did not combine with 'test_Data_apply_masking' at this stage as
that isn't passing yet; separate test makes it easier to test the
migrated method.
@sadielbartholomew
Copy link
Member Author

All feedback has now been addressed. Triggering the CI jobs via open-close for a final sanity check...

@sadielbartholomew
Copy link
Member Author

All good: local test passes, CI jobs pass, & all feedback has been addressed, including adding a new unit test. Clearance to merge assuming similar conditions from David given externally prior to his taking leave, so merging now.

@sadielbartholomew sadielbartholomew merged commit 4001dd5 into NCAS-CMS:lama-to-dask Feb 17, 2022
@sadielbartholomew sadielbartholomew deleted the daskify-property-mask branch February 17, 2022 02:08
@davidhassell davidhassell added this to the 3.14.0 milestone Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask Relating to the use of Dask question General question
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants