Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dask 'compute()' usage with a common realisation call. (#2) #2447

Merged
merged 4 commits into from
Mar 21, 2017

Conversation

lbdreyer
Copy link
Member

Replaces #2422

import numpy as np
import numpy.ma as ma

from iris.analysis import VARIANCE
import iris.cube
from iris.coords import DimCoord
from iris._lazy_data import as_lazy_data, as_concrete_data
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need to agree on an order for importing hidden modules, atm we're inconsistent with it.
Unfortunately, a quick google didn't unearth any standard practice
I do like this approach of putting them in alphabetical, regardless of whether or not they are private though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to go with alphabetical (which we are) then _ sorts before all lower case letters, so IMHO private modules should go first.

If the data is lazy, return the realised result.

Where lazy data contains NaNs these are translated by filling or conversion
to masked data, using the :func:`convert_nans_array` function.
Copy link
Member Author

@lbdreyer lbdreyer Mar 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:func:convert_nans_array

Should this be

:func:`iris._lazy_data.convert_nans_array`

or is okay to be relative if it's in the same module?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could always make it

:func:`~iris._lazy_data.convert_nans_array`

Copy link
Member

@bjlittle bjlittle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember to add some minimal test coverage

"""
Return the actual content of a lazy array, as a numpy array.

If the data is a NumPy array, return it unchanged.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer This also applies to ~numpy.ma.core.MaskedArray.

@@ -33,7 +33,8 @@
import numpy as np
import numpy.ma as ma

from iris._lazy_data import as_lazy_data, is_lazy_data, multidim_lazy_stack
from iris._lazy_data import (as_lazy_data, is_lazy_data, multidim_lazy_stack,
as_concrete_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer The sort order needs changes here ... make as_concrete_data first ...

import numpy as np
import numpy.ma as ma

from iris.analysis import VARIANCE
import iris.cube
from iris.coords import DimCoord
from iris._lazy_data import as_lazy_data, as_concrete_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to go with alphabetical (which we are) then _ sorts before all lower case letters, so IMHO private modules should go first.

If the data is lazy, return the realised result.

Where lazy data contains NaNs these are translated by filling or conversion
to masked data, using the :func:`convert_nans_array` function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could always make it

:func:`~iris._lazy_data.convert_nans_array`

@@ -39,6 +39,7 @@

import iris.aux_factory
import iris.exceptions
from iris._lazy_data import as_concrete_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import order ...

nans_replacement=np.ma.masked)
# NOTE: we probably don't have full support for masked aux-coords.
# We certainly *don't* handle a _FillValue attribute (and possibly
# the loader will throw one away ?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer We should raise a ticket to consider how we deal with masked integral data on coordinates and cell measures. At the moment we don't keep the result dtype ... this is lost in translation, which is bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Did you create an issue to cover this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I'll do that now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjlittle I've just raised #2449
not sure if that's quite what you were after. Feel free to edit the issue

lib/iris/cube.py Outdated
@@ -43,7 +43,8 @@
import iris._constraints
from iris._deprecation import warn_deprecated
from iris._lazy_data import (array_masked_to_nans, as_lazy_data,
convert_nans_array, is_lazy_data)
convert_nans_array, is_lazy_data,
as_concrete_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import sort order ...

@@ -43,7 +43,8 @@
import iris.fileformats.rules
import iris.fileformats.pp_rules
import iris.coord_systems
from iris._lazy_data import array_masked_to_nans, as_lazy_data, is_lazy_data
from iris._lazy_data import (array_masked_to_nans, as_lazy_data, is_lazy_data,
as_concrete_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import sort order ...

@@ -23,7 +23,7 @@
# importing anything else.
import iris.tests as tests

from iris._lazy_data import as_lazy_data
from iris._lazy_data import as_lazy_data, as_concrete_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import sort order

@@ -26,7 +26,7 @@
import numpy as np
import dask.array as da

from iris._lazy_data import as_lazy_data, multidim_lazy_stack
from iris._lazy_data import as_lazy_data, multidim_lazy_stack, as_concrete_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import sort order

@lbdreyer
Copy link
Member Author

@bjlittle Hopefully you are happy with the changes?
(I am happy for you to squash and merge)

Where lazy data contains NaNs these are translated by filling or conversion
to masked data, using the :func:`~iris._lazy_data.convert_nans_array`
function.
See there for usage of the 'nans_replacement' and 'result_dtype' keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "there" --> "their".

Alternatively you could change the signature to

def as_concrete_data(data, **kwargs)

as the kwargs are not used at all by this function and are just passed straight to convert_nans_array. At that point you could replace this description of the kwargs with something like

"Kwargs are passed straight to :func:~iris._lazy_data.convert_nans_array."

It is a more typical way of doing such things in Python but also less readable...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "See there for usage" was meant as in "See over there for usage".

I like your idea of **kwargs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "See there for usage" was meant as in "See over there for usage".

Ah yes, on a re-reading you're quite right!

if (ma.isMaskedArray(merged_data) and
ma.count_masked(merged_data) == 0):
not ma.is_masked(merged_data)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice spot 😉

self.assertEqual(sentinel, result)

# Check call to convert_nans_array
conv_nans.assert_called_once()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon you can do assert_called_once_with, which may be useful here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkillick I don't think that assert_called_once_with will cope with comparing numpy arrays, so checking the call count and unpacking the call args seems okay ... but you could give @dkillick's suggestion a try.

@lbdreyer Although, you could combine this test and the above test_lazy_data test into one and also use sentinels for the nans_replacement and result_dtype ... that would make it sufficiently generic.

arg, = args
self.assertFalse(is_lazy_data(arg))
self.assertArrayEqual(arg, data)
self.assertEqual(kwargs, {'result_dtype': None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_called_once_with?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't work when one of the args is a numpy array

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably would if you also mocked the array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in this case, neither of them are NumPy arrays...


import numpy as np
import numpy.ma as ma
import dask.array as da
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Sorry, but could you fix this import order ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah man! I didn't see this one!

@bjlittle
Copy link
Member

👍 Hoo-rah! Awesome, thanks @lbdreyer

@bjlittle bjlittle merged commit 4805e3b into SciTools:dask Mar 21, 2017
@lbdreyer
Copy link
Member Author

Thanks @bjlittle !!

bjlittle pushed a commit to bjlittle/iris that referenced this pull request May 31, 2017
…ciTools#2447)

Replace dask 'compute()' usage with a common realisation call.
@QuLogic QuLogic modified the milestones: dask, v2.0 Aug 2, 2017
@lbdreyer lbdreyer deleted the patrick_dask_concrete branch July 23, 2018 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants