Why/when does np.something remove the mask of a np.ma array ? #18675

jypeter · 2021-03-24T13:31:56Z

This is obviously more a feature than a bug, otherwise it would have been corrected (I'm using numpy 1.20.1). But it has been bothering me for a very long time, and it has been the indirect source of several bugs in my scripts (and in my colleagues' scripts).

There must be some logic behind it, but I have not found it in the documentation. The closest issue I have found is #8881 (an open issue from 4 years ago).

I have a masked array. If I work on it with np.ma functions, things will be fine, but the equivalent function straight from np will silently ignore and remove the mask!

The example below uses hstack, but I get the same problem with vstack, repeat, and probably many other numpy functions:

$ conda list | grep numpy
numpy                     1.20.1           py38h18fd61f_0    conda-forge
numpydoc                  1.1.0                      py_1    conda-forge

$ python
Python 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.ma.arange(4)
>>> a[2] = np.ma.masked

>>> a
masked_array(data=[0, 1, --, 3],
             mask=[False, False,  True, False],
       fill_value=999999)

>>> b_ma = np.ma.hstack((a, a))
>>> b_ma
masked_array(data=[0, 1, --, 3, 0, 1, --, 3],
             mask=[False, False,  True, False, False, False,  True, False],
       fill_value=999999)

>>> b_NOma = np.hstack((a, a))
>>> b_NOma
masked_array(data=[0, 1, 2, 3, 0, 1, 2, 3],
             mask=False,
       fill_value=999999)
>>>

On the other hand, some functions fortunately use and keep the mask, whether they are taken from np or np.ma:

>>> np.exp(b_ma)
masked_array(data=[1.0, 2.718281828459045, --, 20.085536923187668, 1.0,
                   2.718281828459045, --, 20.085536923187668],
             mask=[False, False,  True, False, False, False,  True, False],
       fill_value=999999)
>>> np.ma.exp(b_ma)
masked_array(data=[1.0, 2.718281828459045, --, 20.085536923187668, 1.0,
                   2.718281828459045, --, 20.085536923187668],
             mask=[False, False,  True, False, False, False,  True, False],
       fill_value=999999)

So, is this a bug or a feature? What is the logic (so that I can tell our students), and where is it explained clearly (for beginners)?

Even if this is not a bug, I think it would be much safer if numpy functions working on masked arrays always used the mask and returned a masked array.


seberg · 2021-03-24T15:52:37Z

> This is obviously more a feature than a bug, otherwise it would have been corrected (I'm using numpy 1.20.1).

I would consider it neither a feature nor a bug... But there is a fundamental problem, and unfortunately there is no solution for "fixing" masked arrays. Masked arrays work well for many things, but their limitations are of course very clear to you. Maybe the point is that MaskedArrays are not considered part of the "core" of NumPy because of these problems, so NumPy functions come with no guarantee that they work with masked arrays.

There is currently a consensus that these issues cannot be fixed within NumPy. The first thought might be to fix MaskedArray itself, but that would be a huge undertaking, one that is bound to break backward compatibility. The other idea was to add proper "masked" support to the core of NumPy, but that is a huge project with its own problems (it was attempted a decade ago but ultimately failed for various reasons).

The good news is that we are in a pretty good position right now for a "better" MaskedArray, something that was not possible when it was first written.
But such a project does not require implementation in NumPy itself. So the current consensus (even mentioned on the roadmap briefly) is that it should be a project started outside of NumPy proper. (I would be happy to host it on the NumPy organization if that helps pushing it.)

There are NumPy core devs interested in such a project, for example @ahaldane. I am not sure how far along they are, or whether you are interested in contributing to it or looking at it.

As for what to do in NumPy more concretely, and to explain the current state: many NumPy functions call np.asarray, which effectively drops the mask. There is not always a simple logic to it; in some cases a function might just need to be changed from np.asarray to np.asanyarray to work.
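A minimal sketch of the np.asarray / np.asanyarray difference described above, assuming a small integer masked array `a` (the array itself is only an illustration):

```python
import numpy as np

# Illustrative masked array: element 2 is masked.
a = np.ma.array([0, 1, 2, 3], mask=[False, False, True, False])

# np.asarray returns a base ndarray: the mask is gone and the hidden 2 is exposed.
plain = np.asarray(a)

# np.asanyarray passes subclasses through, so the MaskedArray and its mask survive.
kept = np.asanyarray(a)

print(type(plain).__name__, plain)
print(type(kept).__name__, kept)
```

Because np.asanyarray passes the subclass through unchanged, a function built on it at least has a chance to keep the mask; whether the rest of that function then handles the mask correctly still depends on its implementation.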

One set of functions that I think always works is the NumPy ufuncs/math functions (those for which isinstance(function, np.ufunc) is true). But generally, you are only safe if you use the np.ma.* functions.
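A minimal sketch of that rule of thumb, assuming a float masked array `a`: np.exp is a ufunc and carries the mask through, np.hstack is not a ufunc, and np.ma.hstack is the mask-aware choice:

```python
import numpy as np

# Illustrative float masked array: element 2 is masked.
a = np.ma.array([0.0, 1.0, 2.0, 3.0], mask=[False, False, True, False])

print(isinstance(np.exp, np.ufunc))     # True: a ufunc
print(isinstance(np.hstack, np.ufunc))  # False: a regular Python-level function

# The ufunc result is still a MaskedArray that masks element 2.
exp_a = np.exp(a)

# For non-ufuncs, the np.ma.* version is the one that handles the mask.
stacked = np.ma.hstack((a, a))
```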

The only thing I could think of to actually improve the situation would be to tag on a warning when something calls np.asarray(masked_array), but even that might be tricky and probably not quite sufficient...
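A user-side sketch of that warning idea. `asarray_checked` is a hypothetical helper, not a NumPy API. It only warns when code calls it explicitly, so it cannot catch the np.asarray calls made inside NumPy functions, which is one reason such a warning is "not quite sufficient" on its own:

```python
import warnings

import numpy as np


def asarray_checked(obj, dtype=None):
    """Hypothetical helper (not part of NumPy): behaves like np.asarray,
    but warns when a masked array with masked elements is converted."""
    if isinstance(obj, np.ma.MaskedArray):
        n_masked = np.ma.count_masked(obj)
        if n_masked > 0:
            warnings.warn(
                f"converting a MaskedArray drops its mask ({n_masked} masked element(s))",
                UserWarning,
                stacklevel=2,
            )
    return np.asarray(obj, dtype=dtype)


a = np.ma.array([0, 1, 2, 3], mask=[False, False, True, False])
result = asarray_checked(a)  # emits a UserWarning, then returns a plain ndarray
```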

jypeter · 2021-03-30T09:16:49Z

Thank you for the clarification! I thought masked arrays were more tightly integrated in numpy than that. And we are probably all convinced that it is better to handle missing values explicitly than to rely on NaN stuff.

Seen from outside (my user point of view), it seems that we are almost there. We should be able to work seamlessly with np and np.ma data, which is the case most of the time, but the side effects we get when masked data is suddenly cast back to non-masked data can be both surprising and dangerous (if they go unnoticed).

As a regular user of np and np.ma, I can now spot errors when they seem to come from a lost mask, but it's not always easy.

I hope something gets done about this at some point. In the meantime, mentioning something (without making it seem like a bug) in all the appropriate places in the documentation may help, if people read it, or something that can be found with Google.

Clearly mentioning masked arrays in the NumPy user guide could help.

Also, adding a warning when np.asarray or similar functions silently drop the mask would create a much safer world!

cooperrc · 2021-04-13T14:44:12Z

@jypeter, for some suggested action items here, it would be great if you could share one of your use cases with the explanation included above. Your experience would help many new and seasoned NumPy users. Would you be willing to add either:

- a "How to use masked arrays" guide in the NumPy How-to's
- a "Masked Array tutorial" in the NumPy tutorials

A longer endeavor would be the suggested warnings when asarray ignores the mask. I think a quicker solution is to point new numpy.ma users to a "How-to" or "tutorial".

matteo-pallini · 2021-09-24T10:48:00Z

Thanks for the explanations.

From a user perspective, I also think it would be much safer to have a warning whenever np.asarray is called on a masked array.

I ended up getting burned by using np.copy rather than np.ma.copy. It took me a while to figure out where the issue was; I think a warning would have sped up that process.
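A minimal sketch of that np.copy pitfall (the array `a` is only illustrative). np.copy defaults to subok=False, so it returns a base ndarray, while np.ma.copy and the .copy() method keep the MaskedArray and its mask:

```python
import numpy as np

a = np.ma.array([0, 1, 2, 3], mask=[False, False, True, False])

c_np = np.copy(a)      # subok=False by default: plain ndarray, mask silently dropped
c_ma = np.ma.copy(a)   # mask-aware copy: still a MaskedArray with the same mask
c_meth = a.copy()      # the method form also keeps the mask
```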

rgommers · 2023-03-05T18:21:49Z

> The only thing I could think of to actually improve the situation would be to tag on a warning when something calls np.asarray(masked_array), but even that might be tricky and probably not quite sufficient...

This is actually quite a footgun, so maybe we should just deprecate and then remove the implicit conversion? When users know what they are doing, they can use the .data and .mask attributes explicitly.
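A sketch of what using .data and .mask explicitly can look like, assuming a float array so that NaN is a valid fill value. np.ma.getmaskarray is used because .mask can be the scalar False (np.ma.nomask) when nothing is masked, and .filled() makes the choice of replacement value visible:

```python
import numpy as np

a = np.ma.array([0.0, 1.0, 2.0, 3.0], mask=[False, False, True, False])

values = a.data                # underlying ndarray, including the hidden 2.0
mask = np.ma.getmaskarray(a)   # always a full boolean array, even with no masked entries
filled = a.filled(np.nan)      # explicit choice of what masked slots become

# Downstream code now receives plain ndarrays and cannot lose a mask it never had.
```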

jypeter · 2023-03-07T16:21:25Z

If only users knew what they were doing and read the documentation!

We have to make sure that things work implicitly as expected (in our case, that masks are used and carried along, if present), or that there are built-in safeguards (i.e. warning messages that will help users get what they want). Things should be bulletproof (idiot-proof, for some of our lazy users...).

But I'm not the person who can do the required coding.

As I said, masks (and masked arrays) are a great feature (I hate NaNs) when they work as expected.

jypeter added the 04 - Documentation label Mar 24, 2021

jypeter changed the title from "Why/when does np.something remove the mask of a np.ma.array ?" to "Why/when does np.something remove the mask of a np.ma array ?" Mar 24, 2021

WarrenWeckesser added the component: numpy.ma masked arrays label Nov 18, 2021

greglucas linked a pull request Jan 2, 2023 that will close this issue: ENH: Add __array_function__ implementation to MaskedArray #22913 (Open)

rgommers mentioned this issue Mar 5, 2023 in: BUG: np.unique yields incorrect output for MaskedArray when axis is not None #23281 (Closed)

rgommers added this to the 1.25.0 release milestone Mar 5, 2023

charris modified the milestones: 1.25.0 release, 2.0.0 release May 24, 2023

seberg modified the milestones: 2.0.0 release, 2.1.0 release Jan 31, 2024

rgommers mentioned this issue Jun 12, 2024 in: np.asarray(masked_array) should raise rather than silently dropping the mask #26669 (Open)
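A rough, user-side sketch of the general __array_function__ idea behind the linked PR; this is not the code in #22913. `MaskDispatchArray` and `_MA_EQUIVALENTS` are hypothetical names. The subclass simply reroutes np.hstack, np.vstack and np.copy to their np.ma counterparts and defers everything else to the default behavior:

```python
import numpy as np

# Hypothetical table mapping plain NumPy functions to their np.ma counterparts.
_MA_EQUIVALENTS = {
    np.hstack: np.ma.hstack,
    np.vstack: np.ma.vstack,
    np.copy: np.ma.copy,
}


class MaskDispatchArray(np.ma.MaskedArray):
    """Hypothetical MaskedArray subclass (not part of NumPy) that reroutes a
    few NumPy functions to their mask-aware np.ma versions."""

    def __array_function__(self, func, types, args, kwargs):
        ma_func = _MA_EQUIVALENTS.get(func)
        if ma_func is not None:
            # The np.ma versions process the data and the mask separately,
            # so the mask is carried through instead of being dropped.
            return ma_func(*args, **kwargs)
        # Functions not in the table fall back to the default dispatch.
        return super().__array_function__(func, types, args, kwargs)


a = MaskDispatchArray([0, 1, 2, 3], mask=[False, False, True, False])
stacked = np.hstack((a, a))  # rerouted to np.ma.hstack
copied = np.copy(a)          # rerouted to np.ma.copy
```

Only the functions listed in the table are rerouted; every other NumPy function keeps whatever behavior MaskedArray already has, which is why a real implementation would need to cover far more of the API.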
