np.pad applied to the MaskedArray subclass silently unmasks the array, and returns output as ndarray #8881

Open
lukelbd opened this issue Mar 31, 2017 · 4 comments
Labels
component: numpy.ma masked arrays

Comments

@lukelbd
lukelbd commented Mar 31, 2017

It seems to me this should be resolved so that the MaskedArray subclass is preserved and the mask attribute is likewise padded -- np.repeat and np.tile do this, for example. I suppose I see the complication -- for padding modes like constant, whether to mask the new values is ambiguous; perhaps in this case an explicit warning should be issued. But for modes that sample the edges of the existing MaskedArray, it seems to me it would make more sense to preserve the mask; perhaps this all requires a new np.ma.pad function.

Here is a simple example

In [1]: import numpy as np

In [2]: a = np.ma.MaskedArray([-999,-999,0,1,2],mask=[True,True,False,False,False],fill_value=999)

In [3]: a
Out[3]:
masked_array(data = [-- -- 0 1 2],
             mask = [ True  True False False False],
       fill_value = 999)

In [4]: np.pad(a,0,'wrap') # note a.filled() is NOT invoked; instead, object is simply unmasked
Out[4]: array([-999, -999,    0,    1,    2])

In [6]: np.pad(a,0,'edge') # same results as above; same is true for every padding 'mode'
Out[6]: array([-999, -999,    0,    1,    2])

In [8]: np.repeat(a,1) # expected behavior
Out[8]:
masked_array(data = [-- -- 0 1 2],
             mask = [ True  True False False False],
       fill_value = 999)

In [9]: np.tile(a,1) # expected behavior
Out[9]:
masked_array(data = [-- -- 0 1 2],
             mask = [ True  True False False False],
       fill_value = 999)

I noticed there are several other inconsistencies here -- for example, there are np.repeat and np.ma.repeat functions that do the same thing; there is only an np.tile function and no np.ma.tile function (but np.tile works as expected); and there are np.concatenate and np.ma.concatenate functions, where the former has unexpected behavior (sets mask=False, unmasks the data, and changes fill_value).
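For reference, the mask-aware variant mentioned above can be sketched as follows. This only shows np.ma.concatenate; what np.concatenate returns for the same input may depend on the NumPy version, so no output is claimed for it here.

```python
import numpy as np

a = np.ma.MaskedArray([-999, -999, 0, 1, 2],
                      mask=[True, True, False, False, False],
                      fill_value=999)

# np.ma.concatenate joins the data and the masks together,
# so masked entries stay masked in the result.
joined = np.ma.concatenate([a, a])

print(type(joined).__name__)
print(joined.mask.tolist())

# Shown for comparison only; inspect the type and mask on your own version.
plain_joined = np.concatenate([a, a])
print(type(plain_joined).__name__)
```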

Perhaps I should start a more general thread on these inconsistency issues elsewhere?

@eric-wieser
Member

eric-wieser commented Apr 3, 2017

This is generally the case for how np.<func> functions behave when called on ndarray subclasses. The underlying issue is the combination of the following:

  • np.func should not be expected to know about every subclass that can exist
  • subclasses have no way to override things which are neither ufuncs nor methods

As a result, many of the functions will take the approach of casting to ndarray, which discards any extra information added by the subclass.
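A minimal sketch of what "casting to ndarray" means in practice, using the array from the original report. np.asarray drops the subclass (and with it the mask), while np.asanyarray passes subclasses through unchanged. Which of the two a given NumPy function uses internally is not stated in this thread and varies by function.

```python
import numpy as np

a = np.ma.MaskedArray([-999, -999, 0, 1, 2],
                      mask=[True, True, False, False, False],
                      fill_value=999)

# asarray returns a base-class ndarray over the raw data: the mask is gone
# and the placeholder values (-999) are exposed as ordinary data.
plain = np.asarray(a)
print(type(plain).__name__, plain.tolist())

# asanyarray leaves ndarray subclasses alone, so the mask survives.
kept = np.asanyarray(a)
print(type(kept).__name__, kept.mask.tolist())
```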

> perhaps this all requires a new np.ma.pad function.

Short term, this is the right fix.

@lukelbd
Author

lukelbd commented Apr 9, 2017

Ok; I guess I thought that since MaskedArray is a built-in subclass, it would be anticipated as a possible input. It would be super useful if np.ma could include wrappers around np.pad, np.tile, etc. that manipulate the mask in expected ways. Until then, I will just use np.ndarray with careful handling of NaNs. It looks like a bit of a mess right now.
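In the meantime, a wrapper of the kind described here could look roughly like the sketch below. masked_pad is a hypothetical helper, not part of np.ma. It pads the data and the mask separately with the same mode, and it is restricted to modes that only copy existing elements, since for modes like constant the right mask for the new values is ambiguous.

```python
import numpy as np

# Modes that only copy existing elements, so padding the mask
# with the same mode gives a well-defined result.
COPY_MODES = ("edge", "wrap", "reflect", "symmetric")


def masked_pad(a, pad_width, mode="edge"):
    """Hypothetical helper: pad a MaskedArray and its mask together."""
    if mode not in COPY_MODES:
        raise ValueError(f"mode {mode!r} is not handled by this sketch")
    data = np.pad(np.ma.getdata(a), pad_width, mode=mode)
    # getmaskarray always returns a full boolean array, even for nomask.
    mask = np.pad(np.ma.getmaskarray(a), pad_width, mode=mode)
    return np.ma.MaskedArray(data, mask=mask, fill_value=a.fill_value)


a = np.ma.MaskedArray([-999, -999, 0, 1, 2],
                      mask=[True, True, False, False, False],
                      fill_value=999)
padded = masked_pad(a, 2, mode="edge")
print(padded.mask.tolist())
print(padded.filled().tolist())
```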

@lukelbd lukelbd closed this as completed Apr 9, 2017
@eric-wieser eric-wieser reopened this Apr 9, 2017
@eric-wieser
Member

eric-wieser commented Apr 9, 2017

I think this issue is worth keeping - some functions within np. are subclass-aware, which they achieve by calling __array_prepare__ and __array_wrap__. This is sort of a hack, but it's also something that happens.

So it would almost certainly be possible to pull this hack into pad to match its existence elsewhere.

Right now, I think the biggest problem is that np.concatenate is not subclass-aware/overridable, and therefore any function that needs concatenate also ends up failing on subclasses.

@mattip
Member

mattip commented Oct 3, 2019

numpy.pad can now be intercepted with __array_function__.
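A rough sketch of what such an interception can look like, assuming a NumPy version with the __array_function__ protocol enabled (1.17 or later). MaskedPair is a hypothetical container invented for this illustration; it is not MaskedArray, and nothing here implies that MaskedArray itself implements the protocol. Only modes that copy existing elements are handled.

```python
import numpy as np

# Modes that only copy existing elements, so the mask can be padded the same way.
COPY_MODES = ("edge", "wrap", "reflect", "symmetric")


class MaskedPair:
    """Hypothetical data-plus-mask container that intercepts np.pad."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_function__(self, func, types, args, kwargs):
        if func is not np.pad:
            return NotImplemented
        # Simplifying assumption: array and pad_width are passed positionally.
        _, pad_width, *rest = args
        mode = rest[0] if rest else kwargs.get("mode", "constant")
        if mode not in COPY_MODES:
            return NotImplemented
        # self.data and self.mask are plain ndarrays, so these calls
        # go to the regular np.pad implementation without recursion.
        return MaskedPair(np.pad(self.data, pad_width, mode=mode),
                          np.pad(self.mask, pad_width, mode=mode))


pair = MaskedPair([-999, -999, 0, 1, 2], [True, True, False, False, False])
padded = np.pad(pair, 1, "wrap")
print(padded.data.tolist())
print(padded.mask.tolist())
```

Returning NotImplemented for unhandled functions or modes makes NumPy raise a TypeError instead of silently discarding the mask.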


3 participants