In [1]:
name = '2017-10-16-masked-arrays'
title = 'Masked arrays in NumPy'
tags = 'numpy'
author = 'Denis Sergeev'

In [2]:
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

A masked array includes:

  • the data;
  • a mask of bad values that travels with the array.

Those elements deemed bad are treated as if they did not exist. Operations using the array automatically use the mask of bad values.

Typically, bad values represent something like a land mask (e.g. sea surface temperature exists only where there is ocean).

All operations related to masked arrays live in the numpy.ma submodule.
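
As a minimal sketch of the land-mask idea, assuming a made-up strip of sea surface temperatures (the names `raw` and `sst` and the -999.0 placeholder are illustrative, not taken from a real dataset), statistics on a masked array skip the masked points:

```python
import numpy as np

# Hypothetical SST values in deg C; -999.0 marks land points (illustrative data)
raw = np.array([15.0, 16.0, -999.0, 17.0])
sst = np.ma.masked_array(raw, mask=(raw == -999.0))

# The mean is computed over the three ocean points only
ocean_mean = sst.mean()

# count() returns the number of unmasked elements
n_ocean = sst.count()
```

Without the mask, the placeholder value would dominate the mean; with it, only the valid points contribute.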


In [3]:
import numpy as np

In [4]:
x = np.array([1, 2, 3, 4, 5])
x


Out[4]:
array([1, 2, 3, 4, 5])

The simplest way to create a masked array manually:


In [5]:
mx = np.ma.masked_array(data=x,
                        mask=[True, False, False, True, False],
                        # fill_value=-999
                        )
mx


Out[5]:
masked_array(data = [-- 2 3 -- 5],
             mask = [ True False False  True False],
       fill_value = 999999)

We can check if an array contains any masked values


In [6]:
np.ma.is_masked(mx)


Out[6]:
True

or we can check if a particular element is masked


In [7]:
mx[1] is np.ma.masked


Out[7]:
False
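
For comparison, a small sketch of the same check on an element that is masked. The array is rebuilt here so that the snippet is self-contained, and `np.ma.count_masked` is used to count the masked entries:

```python
import numpy as np

mx = np.ma.masked_array([1, 2, 3, 4, 5],
                        mask=[True, False, False, True, False])

# Indexing a masked element returns the np.ma.masked singleton
first_is_masked = mx[0] is np.ma.masked
# Indexing a valid element returns the value itself
second_is_masked = mx[1] is np.ma.masked

# Total number of masked elements in the array
n_masked = np.ma.count_masked(mx)
```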

The original data are not erased; they are still stored in the data attribute:


In [8]:
mx.data


Out[8]:
array([1, 2, 3, 4, 5])

The Mask

The mask can be accessed directly through the mask attribute:


In [9]:
mx.mask


Out[9]:
array([ True, False, False,  True, False], dtype=bool)

The masked entries can be filled with a given value to get a regular array back:


In [10]:
mx.filled()


Out[10]:
array([999999,      2,      3, 999999,      5])
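
The default fill value (999999 for this integer array) can be overridden. A minimal sketch, using -999 as an arbitrary placeholder value:

```python
import numpy as np

mx = np.ma.masked_array([1, 2, 3, 4, 5],
                        mask=[True, False, False, True, False])

# Pass the fill value explicitly ...
filled_explicit = mx.filled(-999)

# ... or set it at creation time so that filled() uses it by default
mx_custom = np.ma.masked_array([1, 2, 3, 4, 5],
                               mask=[True, False, False, True, False],
                               fill_value=-999)
filled_default = mx_custom.filled()
```

In both cases the result is a plain ndarray with -999 in place of the masked entries.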

The mask can also be cleared:


In [11]:
mx.mask = np.ma.nomask

In [12]:
mx.mask


Out[12]:
array([False, False, False, False, False], dtype=bool)
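
The opposite direction is also possible: individual elements can be masked by assigning np.ma.masked to them. A small sketch on a freshly built array (so that it does not depend on the state of mx above):

```python
import numpy as np

mx = np.ma.masked_array([1, 2, 3, 4, 5],
                        mask=[True, False, False, True, False])

# Mask one more element by assigning the masked constant to it
mx[2] = np.ma.masked

# getmaskarray() always returns a full boolean array, even if no mask is set
mask_after = np.ma.getmaskarray(mx).tolist()
```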

Domain-aware functions

Some functions handle masked values automatically, e.g. the log function.


In [13]:
np.log(mx)


Out[13]:
masked_array(data = [0.0 0.6931471805599453 1.0986122886681098 1.3862943611198906
 1.6094379124341003],
             mask = [False False False False False],
       fill_value = 999999)

In [14]:
np.ma.log(mx)


Out[14]:
masked_array(data = [0.0 0.6931471805599453 1.0986122886681098 1.3862943611198906
 1.6094379124341003],
             mask = [False False False False False],
       fill_value = 999999)

Note that the result is the same.
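
Because the mask of mx was cleared in In [11], the outputs above contain no masked entries. As a sketch of what "domain-aware" means, the example below (with illustrative values) applies np.ma.log to an array that has one masked element and two values outside the domain of the logarithm:

```python
import numpy as np

values = np.ma.masked_array([1.0, 0.0, -1.0, 10.0, 100.0],
                            mask=[False, False, False, True, False])

# np.ma.log keeps the input mask and additionally masks
# elements where the logarithm is undefined (x <= 0)
logged = np.ma.log(values)

result_mask = np.ma.getmaskarray(logged).tolist()
```

Only the first and last elements remain valid in the result: the zero and the negative value are masked by the function, and the fourth element was masked in the input.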

Other functions ignore the mask, so the corresponding function from the ma submodule should be used instead (if it exists):


In [15]:
np.dot(mx, mx)


Out[15]:
55

In [16]:
np.ma.dot(mx, mx)


Out[16]:
masked_array(data = 55,
             mask = False,
       fill_value = 999999)
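
Both calls above give 55 because mx no longer has any masked values at this point. A sketch with the original mask restored shows the effect of np.ma.dot, which with its default strict=False treats masked values as zeros:

```python
import numpy as np

mx = np.ma.masked_array([1, 2, 3, 4, 5],
                        mask=[True, False, False, True, False])

# Masked elements contribute zero: 2*2 + 3*3 + 5*5
masked_dot = np.ma.dot(mx, mx)

# Extract the plain number from the 0-d masked array
masked_dot_value = int(np.ma.getdata(masked_dot))
```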

Array creation

A common task is to mask the elements of an array that meet a given criterion.


In [17]:
a = np.linspace(1, 15, 15)

In [18]:
a


Out[18]:
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.])

In [19]:
masked_a = np.ma.masked_greater_equal(a, 11)

In [20]:
masked_a


Out[20]:
masked_array(data = [1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 -- -- -- -- --],
             mask = [False False False False False False False False False False  True  True
  True  True  True],
       fill_value = 1e+20)
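
Besides masked_greater_equal, numpy.ma offers other creation helpers. A minimal sketch of two of them, np.ma.masked_where for an arbitrary boolean condition and np.ma.masked_invalid for NaN and infinite values (the thresholds and the array `b` are illustrative):

```python
import numpy as np

a = np.linspace(1, 15, 15)

# Mask elements where an arbitrary boolean condition holds
outside = np.ma.masked_where((a < 3) | (a > 12), a)

# Mask NaN and infinite values
b = np.array([1.0, np.nan, 3.0, np.inf])
valid_only = np.ma.masked_invalid(b)
valid_sum = valid_only.sum()  # only 1.0 and 3.0 contribute
```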

Examples

Other simple examples can be found in the NumPy Docs: https://docs.scipy.org/doc/numpy-1.13.0/reference/maskedarray.generic.html#examples


In [21]:
HTML(html)


Out[21]:

This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.