NumPy arrays

Nikolay Koldunov

koldunovn@gmail.com

This is part of Python for Geosciences notes.

================

NumPy provides, among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

In [1]:
#allow graphics inline
%matplotlib inline 
import matplotlib.pylab as plt #import plotting library
import numpy as np #import numpy library
np.set_printoptions(precision=3) # this is just to make the output look better

Load data

I am going to use some real data as an example of array manipulations: the monthly Arctic Oscillation (AO) index, downloaded with wget through a system call (this requires wget to be installed, as it is on most Linux systems):


In [1]:
!wget www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii


--2017-05-23 09:45:14--  http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii
Resolving www.cpc.ncep.noaa.gov... 140.90.101.63
Connecting to www.cpc.ncep.noaa.gov|140.90.101.63|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20200 (20K) [text/plain]
Saving to: ‘monthly.ao.index.b50.current.ascii’

monthly.ao.index.b5 100%[===================>]  19.73K  --.-KB/s    in 0s      

2017-05-23 09:45:15 (244 MB/s) - ‘monthly.ao.index.b50.current.ascii’ saved [20200/20200]

This is what the data in the file look like (we again use a system call, this time for the head command):


In [2]:
!head monthly.ao.index.b50.current.ascii


 1950    1  -0.60310E-01
 1950    2   0.62681E+00
 1950    3  -0.81275E-02
 1950    4   0.55510E+00
 1950    5   0.71577E-01
 1950    6   0.53857E+00
 1950    7  -0.80248E+00
 1950    8  -0.85101E+00
 1950    9   0.35797E+00
 1950   10  -0.37890E+00

Load the data into a variable:


In [3]:
ao = np.loadtxt('monthly.ao.index.b50.current.ascii')

In [4]:
ao


Out[4]:
array([[  1.950e+03,   1.000e+00,  -6.031e-02],
       [  1.950e+03,   2.000e+00,   6.268e-01],
       [  1.950e+03,   3.000e+00,  -8.127e-03],
       ..., 
       [  2.017e+03,   2.000e+00,   3.399e-01],
       [  2.017e+03,   3.000e+00,   1.365e+00],
       [  2.017e+03,   4.000e+00,  -8.866e-02]])
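
To see what np.loadtxt does with a file like this without downloading anything, here is a minimal sketch that pastes the first three lines printed by head into an in-memory string. The `sample` and `demo` names are only for illustration; `unpack=True` is a standard loadtxt option that returns one array per column.

```python
import io
import numpy as np

# First three lines of the file, as printed by head above
sample = """ 1950    1  -0.60310E-01
 1950    2   0.62681E+00
 1950    3  -0.81275E-02
"""

# loadtxt splits each line on whitespace and converts every field to float64
demo = np.loadtxt(io.StringIO(sample))
print(demo.shape)   # (number of rows, number of columns)
print(demo.dtype)

# unpack=True transposes the result, giving one 1-D array per column
year, month, value = np.loadtxt(io.StringIO(sample), unpack=True)
print(month)
```

Because every field is converted to a float, the year and month columns come back as floating point numbers as well, which is why the array display uses scientific notation for them.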

In [5]:
ao.shape


Out[5]:
(808, 3)

The array has 808 rows and 3 columns. NumPy stores arrays in row-major (C) order by default, whereas Matlab and Fortran use column-major order.
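
What row-major and column-major order mean in practice can be sketched on a small made-up array (the names `a` and `f` are illustrative, not part of the AO example):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # NumPy default layout: row-major (C order)
print(a.flags['C_CONTIGUOUS'])    # True
print(a.ravel(order='C'))         # rows one after another: [0 1 2 3 4 5]
print(a.ravel(order='F'))         # column by column:       [0 3 1 4 2 5]

# Same values and same shape, but stored column-major in memory
f = np.asfortranarray(a)
print(f.flags['F_CONTIGUOUS'])    # True
print(np.array_equal(a, f))       # True: indexing is unaffected by the layout
```

The memory layout matters mostly for performance and for exchanging raw data with Fortran or Matlab code; indexing such as `a[row, column]` behaves the same either way.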


In [8]:
type(ao)


Out[8]:
numpy.ndarray

NumPy arrays are homogeneously typed: every element of an array shares a single data type (dtype), which allows faster operations.


In [6]:
ao.dtype


Out[6]:
dtype('float64')

You can't assign a value that cannot be converted to the array's dtype to an element of a NumPy array:


In [10]:
ao[0,0] = 'Year'


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-5a47ddfa9232> in <module>()
----> 1 ao[0,0] = 'Year'

ValueError: could not convert string to float: Year
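
Values that can be converted to the array's dtype are accepted and cast silently, which is worth keeping in mind. A minimal sketch on made-up arrays (`x`, `i` and `failed` are illustrative names):

```python
import numpy as np

x = np.zeros(3)              # float64 array
x[0] = 7                     # the integer is stored as 7.0
print(x)

i = np.zeros(3, dtype=int)   # integer array
i[0] = 2.9                   # the fractional part is dropped: stored as 2
print(i)

# A string that does not represent a number cannot be converted
failed = False
try:
    x[1] = 'Year'
except ValueError as err:
    failed = True
    print('ValueError:', err)
```

The silent truncation in the integer case is a common source of surprises, so it can help to check `dtype` before assigning into an existing array.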

Slicing works similarly to Matlab:


In [11]:
ao[0:5,:]


Out[11]:
array([[  1.950e+03,   1.000e+00,  -6.031e-02],
       [  1.950e+03,   2.000e+00,   6.268e-01],
       [  1.950e+03,   3.000e+00,  -8.127e-03],
       [  1.950e+03,   4.000e+00,   5.551e-01],
       [  1.950e+03,   5.000e+00,   7.158e-02]])

We can now look at the data. Plotting is done with the matplotlib module that we imported at the beginning as plt. We will plot only the first 780 points:


In [12]:
plt.plot(ao[:780,2])


Out[12]:
[<matplotlib.lines.Line2D at 0x106173990>]

Index slicing

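In general, indexing is similar to Matlab, except that NumPy uses square brackets, counts from 0, and excludes the end index of a slice.

First 12 elements of the second column (months). Remember that indexing starts at 0: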


In [13]:
ao[0:12,1]


Out[13]:
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.])

First row:


In [14]:
ao[0,:]


Out[14]:
array([  1.950e+03,   1.000e+00,  -6.031e-02])
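
A few more slicing forms, sketched on a small made-up array rather than on `ao` so the results are easy to follow. The last part shows that a basic slice is a view onto the original data, not a copy.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)   # rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]

print(a[0:2, :])    # rows 0 and 1; the end index 2 is excluded
print(a[-1, :])     # negative indices count from the end: last row
print(a[::2, 1])    # every second row of the second column

col = a[:, 1]       # basic slicing returns a view
col[0] = 100        # ...so this also changes a
print(a[0, 1])      # 100

independent = a[:, 1].copy()   # use .copy() when you need separate data
independent[0] = -1
print(a[0, 1])      # still 100
```

The same applies to the AO example: a variable created with a plain slice of `ao` shares memory with `ao` until you copy it explicitly.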

We can create a mask that selects all rows where the value in the second column (month) equals 10 (October):


In [7]:
mask = (ao[:,1]==10)

In [8]:
mask.shape


Out[8]:
(808,)

Here we apply this mask and show only the first 5 rows of the result:


In [16]:
ao[mask][:5,:]


Out[16]:
array([[  1.950e+03,   1.000e+01,  -3.789e-01],
       [  1.951e+03,   1.000e+01,  -2.129e-01],
       [  1.952e+03,   1.000e+01,  -4.372e-01],
       [  1.953e+03,   1.000e+01,  -1.945e-01],
       [  1.954e+03,   1.000e+01,   5.126e-01]])
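
A mask is simply a boolean array with one entry per row. The sketch below uses a made-up four-row miniature of the (year, month, value) layout; the numbers are illustrative, not taken from the file.

```python
import numpy as np

# Made-up miniature of the (year, month, value) table
data = np.array([[1950.,  9.,  0.36],
                 [1950., 10., -0.38],
                 [1951.,  9.,  0.10],
                 [1951., 10., -0.21]])

mask = data[:, 1] == 10
print(mask)          # one True/False per row
print(mask.dtype)    # bool
print(mask.sum())    # True counts as 1, so this is the number of selected rows

octobers = data[mask]    # boolean indexing returns a copy of the selected rows
print(octobers[:, 2])    # values for the selected rows only
```

Since `True` counts as 1, `mask.sum()` is a quick way to check how many rows a condition selects before using it.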

You don't have to create a separate variable for the mask; you can apply the condition directly. Here, instead of the first five rows, I show the last five:


In [17]:
ao[ao[:,1]==10][-5:,:]


Out[17]:
array([[  2.010e+03,   1.000e+01,  -4.670e-01],
       [  2.011e+03,   1.000e+01,   7.997e-01],
       [  2.012e+03,   1.000e+01,  -1.514e+00],
       [  2.013e+03,   1.000e+01,   2.628e-01],
       [  2.014e+03,   1.000e+01,  -1.134e+00]])

You can combine conditions. In this case we select October-December data (only the first 10 rows are shown):


In [18]:
ao[(ao[:,1]>=10)&(ao[:,1]<=12)][0:10,:]


Out[18]:
array([[  1.950e+03,   1.000e+01,  -3.789e-01],
       [  1.950e+03,   1.100e+01,  -5.151e-01],
       [  1.950e+03,   1.200e+01,  -1.928e+00],
       [  1.951e+03,   1.000e+01,  -2.129e-01],
       [  1.951e+03,   1.100e+01,  -6.852e-02],
       [  1.951e+03,   1.200e+01,   1.987e+00],
       [  1.952e+03,   1.000e+01,  -4.372e-01],
       [  1.952e+03,   1.100e+01,  -1.891e+00],
       [  1.952e+03,   1.200e+01,  -1.827e+00],
       [  1.953e+03,   1.000e+01,  -1.945e-01]])
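
Conditions are combined with the element-wise operators `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>=` or `==`. A minimal sketch on a made-up array of month numbers:

```python
import numpy as np

month = np.arange(1, 13)                  # 1..12

autumn = (month >= 10) & (month <= 12)    # October to December
winter = (month == 12) | (month <= 2)     # December, January, February
not_autumn = ~autumn                      # everything else

print(month[autumn])    # [10 11 12]
print(month[winter])    # [ 1  2 12]

# np.isin is an alternative when the wanted values are an explicit list
same = np.isin(month, [10, 11, 12])
print(np.array_equal(autumn, same))       # True
```

The Python keywords `and`, `or` and `not` do not work element-wise on arrays, so they cannot be used in place of `&`, `|` and `~` here.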

You can assign values to a subset of an array (this expression fixes the problem with a very small value at 2015-04):


In [19]:
ao[ao<-10]=0
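
The same kind of masked assignment is sketched below on made-up data; the value -999.0 is a hypothetical placeholder for a bad record, not the value found in the AO file. The sketch also shows np.where, which returns a new array instead of modifying the original, and how to restrict the replacement to a single column.

```python
import numpy as np

values = np.array([0.5, -999.0, 1.2, -0.3])   # -999.0: hypothetical bad value

cleaned = values.copy()
cleaned[cleaned < -10] = 0                    # in-place assignment through a mask
print(cleaned)

# np.where builds a new array and leaves `values` untouched
also_cleaned = np.where(values < -10, 0, values)
print(also_cleaned)

# For a 2-D table, test and modify only the data column
table = np.array([[2015., 3.,    1.8],
                  [2015., 4., -999.0]])
bad = table[:, 2] < -10
table[bad, 2] = 0
print(table)
```

Restricting the condition to the data column makes the intent explicit, although for this data set the year and month columns could never be below -10 anyway.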

Basic operations

Create an example array from the first 12 values of the second column and perform some basic operations:


In [20]:
months = ao[0:12,1]
months


Out[20]:
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.])

In [21]:
months+10


Out[21]:
array([ 11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,
        22.])

In [22]:
months*20


Out[22]:
array([  20.,   40.,   60.,   80.,  100.,  120.,  140.,  160.,  180.,
        200.,  220.,  240.])

In [23]:
months*months


Out[23]:
array([   1.,    4.,    9.,   16.,   25.,   36.,   49.,   64.,   81.,
        100.,  121.,  144.])
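
All of these operations are element-wise, including `*` between two arrays. The sketch below rebuilds `months` with np.arange so that it runs on its own, and contrasts the element-wise product with the dot product; the `table` and `scaled` names are illustrative.

```python
import numpy as np

months = np.arange(1., 13.)          # 1.0 .. 12.0, same values as above

print(months ** 2)                   # element-wise power, same as months*months
print(np.sqrt(months * months))      # NumPy functions also work element by element
print(np.dot(months, months))        # dot product: the sum of the squares, 650.0

# Broadcasting: a length-3 array is applied to every row of a 2x3 array
table = np.ones((2, 3))
scaled = table * np.array([1., 10., 100.])
print(scaled)
```

Broadcasting is what makes `months+10` work as well: the scalar 10 is treated as if it were an array of the same shape as `months`.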

Basic statistics

Create ao_values, which will contain only the data values (the third column):


In [24]:
ao_values = ao[:,2]

Simple statistics:


In [25]:
ao_values.min()


Out[25]:
-4.2656999999999998

In [26]:
ao_values.max()


Out[26]:
3.4952999999999999

In [27]:
ao_values.mean()


Out[27]:
-0.11983340905732483

In [28]:
ao_values.std()


Out[28]:
1.0069283815727728

In [29]:
ao_values.sum()


Out[29]:
-94.069226109999988

You can also use the np.sum function:


In [30]:
np.sum(ao_values)


Out[30]:
-94.069226109999988
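
Most of these methods also accept an `axis` argument for 2-D arrays, and `std` accepts `ddof` to switch from the population to the sample standard deviation. A minimal sketch on a made-up three-row table (the numbers are illustrative):

```python
import numpy as np

# Made-up (year, month, value) table
table = np.array([[1950., 1., -0.5],
                  [1950., 2.,  0.5],
                  [1950., 3.,  1.5]])

print(table.mean(axis=0))   # one mean per column
print(table.max(axis=1))    # one maximum per row

v = table[:, 2]
print(v.std())              # population standard deviation (ddof=0, the default)
print(v.std(ddof=1))        # sample standard deviation (divides by N-1)
```

Without `axis`, the statistics are computed over all elements of the array, which is rarely meaningful for a table that mixes years, months and data values.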

You can also perform operations on subsets:


In [31]:
np.mean(ao[ao[:,1]==1,2]) # January monthly mean


Out[31]:
-0.38995600000000002

The result will be the same if we call the method on the selected data:


In [32]:
ao[ao[:,1]==1,2].mean()


Out[32]:
-0.38995600000000002
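
The same selection can be repeated for every month to get a mean annual cycle. The sketch below builds two made-up years of data so that it runs on its own; with the real array you would use `ao` in place of `data`. The reshape variant is an assumption-laden shortcut: it only works when the series starts in January and contains complete years.

```python
import numpy as np

# Two made-up complete years in the (year, month, value) layout
years = np.repeat([2000., 2001.], 12)
months = np.tile(np.arange(1., 13.), 2)
values = np.concatenate([np.arange(12.), np.arange(12.) + 2])
data = np.column_stack([years, months, values])

# One mean per calendar month, using the same mask as above
monthly_mean = np.array([data[data[:, 1] == m, 2].mean() for m in range(1, 13)])
print(monthly_mean)

# Shortcut for complete years only: one row per year, then average over the years
by_reshape = data[:, 2].reshape(-1, 12).mean(axis=0)
print(np.allclose(monthly_mean, by_reshape))
```

The mask-based version is slower but also handles a series that ends partway through a year, as the AO file does.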

Saving data

You can save your data as a text file:


In [10]:
np.savetxt('ao_only_values.csv',ao[:, 2], fmt='%.4f')

Head of resulting file:


In [11]:
!head ao_only_values.csv


-0.0603
0.6268
-0.0081
0.5551
0.0716
0.5386
-0.8025
-0.8510
0.3580
-0.3789
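
np.savetxt can also write several columns with a delimiter, a header line and one format per column. The sketch below writes a two-row table (the first two records shown earlier, rounded to four decimals) to a temporary directory and reads it back; the file name `ao_demo.csv` and the column names in the header are made up for illustration.

```python
import os
import tempfile
import numpy as np

table = np.array([[1950., 1., -0.0603],
                  [1950., 2.,  0.6268]])

# Write into a temporary directory so the example leaves no files behind in the cwd
path = os.path.join(tempfile.mkdtemp(), 'ao_demo.csv')

# One format per column; the header is written as a comment line starting with '# '
np.savetxt(path, table, fmt=['%d', '%d', '%.4f'],
           delimiter=',', header='year,month,ao')

with open(path) as f:
    print(f.read())

# loadtxt skips comment lines, so the header does not get in the way
restored = np.loadtxt(path, delimiter=',')
print(restored)
```

Note that a text file only keeps as many digits as the format specifies, so values written with `%.4f` come back rounded to four decimals.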

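You can also save it as raw binary. The file then contains only the bytes of the array, without any information about its shape or data type: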


In [35]:
# open in binary mode ('wb'); the with block closes the file automatically
with open('ao_only_values.bin', 'wb') as f:
    ao[:,2].tofile(f)
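
To read such a file back you have to supply the data type yourself, and the shape if the array had more than one dimension. A minimal sketch with three made-up values and temporary file names (`demo.bin` and `demo.npy` are illustrative); np.save and np.load are shown as an alternative that stores this metadata in the file.

```python
import os
import tempfile
import numpy as np

values = np.array([-0.0603, 0.6268, -0.0081])
tmpdir = tempfile.mkdtemp()

# Raw binary: 3 float64 values take 3 * 8 = 24 bytes, nothing else is stored
bin_path = os.path.join(tmpdir, 'demo.bin')
values.tofile(bin_path)
print(os.path.getsize(bin_path))

# The dtype must match the one used for writing, otherwise the numbers are garbage
restored = np.fromfile(bin_path, dtype=np.float64)
print(restored)

# The .npy format keeps dtype and shape, so nothing has to be remembered
npy_path = os.path.join(tmpdir, 'demo.npy')
np.save(npy_path, values.reshape(1, 3))
again = np.load(npy_path)
print(again.shape, again.dtype)
```

Raw files written with tofile also depend on the byte order of the machine, so the .npy format is usually the safer choice for exchanging data between computers.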
