NumPy arrays

Nikolay Koldunov

koldunovn@gmail.com

This is part of the Python for Geosciences notes.

NumPy is the fundamental package for scientific computing with Python. Among other things, it provides:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities



In [1]:

    
#allow graphics inline
%matplotlib inline 
import matplotlib.pylab as plt #import plotting library
import numpy as np #import numpy library
np.set_printoptions(precision=3) # this is just to make the output look better

Load data

As an example of array manipulation, I am going to use some real data: the monthly Arctic Oscillation (AO) index, downloaded with wget through a shell command (this requires wget to be available on your system, as it usually is on Linux):



In [1]:

    
!wget www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii









    



--2017-05-23 09:45:14--  http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii
Resolving www.cpc.ncep.noaa.gov... 140.90.101.63
Connecting to www.cpc.ncep.noaa.gov|140.90.101.63|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20200 (20K) [text/plain]
Saving to: ‘monthly.ao.index.b50.current.ascii’

monthly.ao.index.b5 100%[===================>]  19.73K  --.-KB/s    in 0s      

2017-05-23 09:45:15 (244 MB/s) - ‘monthly.ao.index.b50.current.ascii’ saved [20200/20200]

This is what the data in the file look like (again, we use a shell command, head):



In [2]:

    
!head monthly.ao.index.b50.current.ascii









    



 1950    1  -0.60310E-01
 1950    2   0.62681E+00
 1950    3  -0.81275E-02
 1950    4   0.55510E+00
 1950    5   0.71577E-01
 1950    6   0.53857E+00
 1950    7  -0.80248E+00
 1950    8  -0.85101E+00
 1950    9   0.35797E+00
 1950   10  -0.37890E+00

Load the data into a variable:



In [3]:

    
ao = np.loadtxt('monthly.ao.index.b50.current.ascii')
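np.loadtxt splits each line on whitespace and parses Fortran-style exponents such as -0.60310E-01 as ordinary floats. A minimal sketch that does not need the downloaded file: it feeds the first three lines shown above to np.loadtxt through io.StringIO, which stands in for the file here.

```python
import io
import numpy as np

# First three lines of the AO file, copied from the head output above
text = """ 1950    1  -0.60310E-01
 1950    2   0.62681E+00
 1950    3  -0.81275E-02
"""

# loadtxt accepts any file-like object; columns are split on whitespace
sample = np.loadtxt(io.StringIO(text))

print(sample.shape)   # (3, 3): three rows, three columns
print(sample.dtype)   # float64: all columns are parsed as floats by default
print(sample[:, 2])   # the AO index values
```

Note that the year and month columns also become floats, which is why they appear as 1.950e+03 and 1.000e+00 in the output below.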



In [4]:

    
ao









    Out[4]:





array([[  1.950e+03,   1.000e+00,  -6.031e-02],
       [  1.950e+03,   2.000e+00,   6.268e-01],
       [  1.950e+03,   3.000e+00,  -8.127e-03],
       ..., 
       [  2.017e+03,   2.000e+00,   3.399e-01],
       [  2.017e+03,   3.000e+00,   1.365e+00],
       [  2.017e+03,   4.000e+00,  -8.866e-02]])



In [5]:

    
ao.shape









    Out[5]:





(808, 3)

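The array has 808 rows (one per month) and 3 columns (year, month, value); shapes are listed as (rows, columns). By default NumPy stores arrays in row-major (C) order, while Matlab and Fortran use column-major order.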
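The difference between the two memory orders shows up when an array is flattened. A small sketch on a made-up 2 x 3 array: ravel with order='C' reads it row by row (NumPy's default), order='F' reads it column by column, the way Matlab and Fortran store it.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns
# [[0 1 2]
#  [3 4 5]]

flat_c = a.ravel(order='C')  # row-major (NumPy default): [0 1 2 3 4 5]
flat_f = a.ravel(order='F')  # column-major (Matlab/Fortran): [0 3 1 4 2 5]
print(flat_c, flat_f)

print(a.flags['C_CONTIGUOUS'])                     # True: stored row by row
print(np.asfortranarray(a).flags['F_CONTIGUOUS'])  # True: a column-major copy
```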



In [8]:

    
type(ao)









    Out[8]:





numpy.ndarray

NumPy arrays have a single fixed data type (dtype) shared by all elements, which allows fast operations:



In [6]:

    
ao.dtype









    Out[6]:





dtype('float64')

You can't assign a value that cannot be converted to the array's dtype (here float64) to an element of the array:



In [10]:

    
ao[0,0] = 'Year'









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-5a47ddfa9232> in <module>()
----> 1 ao[0,0] = 'Year'

ValueError: could not convert string to float: Year
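Assignment does not check types strictly; NumPy tries to convert the new value to the array's dtype and only fails when that conversion is impossible. A small sketch on made-up arrays: a float assigned to an integer array is silently truncated, a numeric string assigned to a float array is parsed, and 'Year' raises the same ValueError as above.

```python
import numpy as np

ints = np.array([1, 2, 3])  # integer dtype
ints[0] = 2.7               # converted to the array's dtype: truncated to 2
print(ints)                 # [2 2 3]

floats = np.zeros(3)        # float64 dtype
floats[0] = '1.5'           # a numeric string can be converted to float
print(floats[0])            # 1.5

error_message = None
try:
    floats[1] = 'Year'      # not convertible to float
except ValueError as err:
    error_message = str(err)
print('ValueError:', error_message)
```

The silent truncation in the first case is a common source of bugs, so it is worth checking dtype before assigning into an array.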

Slicing works similarly to Matlab, except that indices start at 0 and the stop index is excluded. Here are the first five rows:



In [11]:

    
ao[0:5,:]









    Out[11]:





array([[  1.950e+03,   1.000e+00,  -6.031e-02],
       [  1.950e+03,   2.000e+00,   6.268e-01],
       [  1.950e+03,   3.000e+00,  -8.127e-03],
       [  1.950e+03,   4.000e+00,   5.551e-01],
       [  1.950e+03,   5.000e+00,   7.158e-02]])

Let's look at the data. We plot it with the matplotlib.pylab module, which we imported at the beginning as plt. We plot only the first 780 points:



In [12]:

    
plt.plot(ao[:780,2])









    Out[12]:





[<matplotlib.lines.Line2D at 0x106173990>]

Index slicing

In general, it is similar to Matlab.

The first 12 elements of the second column (months). Remember that indexing starts at 0:



In [13]:

    
ao[0:12,1]









    Out[13]:





array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.])

First row:



In [14]:

    
ao[0,:]









    Out[14]:





array([  1.950e+03,   1.000e+00,  -6.031e-02])
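The general slice syntax is start:stop:step, where the stop index is excluded and any of the three parts can be omitted. A sketch on a made-up array of ten integers:

```python
import numpy as np

x = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

first_three = x[0:3]   # [0 1 2]: the stop index (3) is excluded
also_first = x[:3]     # [0 1 2]: omitted start means 0
tail = x[7:]           # [7 8 9]: omitted stop means "to the end"
evens = x[::2]         # [0 2 4 6 8]: step of 2
last_three = x[-3:]    # [7 8 9]: negative indices count from the end
backwards = x[::-1]    # [9 8 7 6 5 4 3 2 1 0]: negative step reverses

print(first_three, tail, evens, last_three, backwards)
```

The same rules apply to each dimension of a 2-D array, so ao[-5:, :] would give the last five rows.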

We can create a mask that selects all rows where the value in the second column (months) equals 10 (October):



In [7]:

    
mask = (ao[:,1]==10)



In [8]:

    
mask.shape









    Out[8]:





(808,)

Here we apply the mask and show only the first 5 rows of the result:



In [16]:

    
ao[mask][:5,:]









    Out[16]:





array([[  1.950e+03,   1.000e+01,  -3.789e-01],
       [  1.951e+03,   1.000e+01,  -2.129e-01],
       [  1.952e+03,   1.000e+01,  -4.372e-01],
       [  1.953e+03,   1.000e+01,  -1.945e-01],
       [  1.954e+03,   1.000e+01,   5.126e-01]])
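A mask is just a boolean array with one True/False value per row, and indexing with it keeps the rows where it is True. A sketch on a small made-up table with the same layout as ao (year, month, value; the numbers are invented):

```python
import numpy as np

# Made-up table in the same layout as ao: year, month, value
table = np.array([[2000, 9, 0.1],
                  [2000, 10, -0.5],
                  [2001, 9, 0.3],
                  [2001, 10, 0.7]])

mask = table[:, 1] == 10   # compare the month column with 10
print(mask)                # [False  True False  True]
print(mask.dtype)          # bool

october = table[mask]      # keep only the rows where mask is True
print(october[:, 2])       # October values: -0.5 and 0.7
```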

You don't have to create a separate variable for the mask; you can apply the condition directly. Here, instead of the first five rows, I show the last five:



In [17]:

    
ao[ao[:,1]==10][-5:,:]









    Out[17]:





array([[  2.010e+03,   1.000e+01,  -4.670e-01],
       [  2.011e+03,   1.000e+01,   7.997e-01],
       [  2.012e+03,   1.000e+01,  -1.514e+00],
       [  2.013e+03,   1.000e+01,   2.628e-01],
       [  2.014e+03,   1.000e+01,  -1.134e+00]])

You can combine conditions with & (and). In this case we select October-December data (only the first 10 rows are shown):



In [18]:

    
ao[(ao[:,1]>=10)&(ao[:,1]<=12)][0:10,:]









    Out[18]:





array([[  1.950e+03,   1.000e+01,  -3.789e-01],
       [  1.950e+03,   1.100e+01,  -5.151e-01],
       [  1.950e+03,   1.200e+01,  -1.928e+00],
       [  1.951e+03,   1.000e+01,  -2.129e-01],
       [  1.951e+03,   1.100e+01,  -6.852e-02],
       [  1.951e+03,   1.200e+01,   1.987e+00],
       [  1.952e+03,   1.000e+01,  -4.372e-01],
       [  1.952e+03,   1.100e+01,  -1.891e+00],
       [  1.952e+03,   1.200e+01,  -1.827e+00],
       [  1.953e+03,   1.000e+01,  -1.945e-01]])
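Besides & there is | (or), and np.isin tests membership in a list of values. Each comparison must be wrapped in parentheses because & and | bind more tightly than >=, <= and ==; without them, months >= 10 & months <= 12 is parsed as a chained comparison and raises a ValueError. A sketch on a made-up array of month numbers:

```python
import numpy as np

months = np.array([9, 10, 11, 12, 1, 2])

# & means "and": October to December
ond = (months >= 10) & (months <= 12)
print(months[ond])    # [10 11 12]

# | means "or": December to February
djf = (months == 12) | (months <= 2)
print(months[djf])    # [12  1  2]

# np.isin: membership test, an alternative to chaining several == conditions
ond_isin = np.isin(months, [10, 11, 12])
print(months[ond_isin])  # [10 11 12]
```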

You can assign values to a subset of elements (this expression fixes the problem with a very small value at 2015-04):



In [19]:

    
ao[ao<-10]=0
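Setting the bad value to 0 keeps it in the statistics computed below. An alternative, not used in the rest of these notes, is to mark it as missing with np.nan and use NaN-aware functions such as np.nanmean. A sketch on made-up values, where -999.0 plays the role of a hypothetical bad value:

```python
import numpy as np

values = np.array([0.5, -999.0, -0.3, 1.0])  # made-up data; -999.0 is the "bad" value

values[values < -10] = np.nan   # mark the bad element as missing (needs a float array)

plain = np.mean(values)         # nan: an ordinary mean propagates NaN
robust = np.nanmean(values)     # mean of the three valid values, about 0.4
print(plain, robust)
```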

Basic operations

Create an example array from the first 12 values of the second column and perform some basic operations:



In [20]:

    
months = ao[0:12,1]
months









    Out[20]:





array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.])
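Note that months is a slice of ao, and basic slicing returns a view that shares memory with the original array: changing an element of months in place would also change ao. Arithmetic such as months+10 below creates a new array, so it is safe, and .copy() gives an independent array when you need one. (Indexing with a boolean mask, in contrast, always returns a copy.) A sketch on a made-up array:

```python
import numpy as np

data = np.array([[1950., 1., -0.06],
                 [1950., 2., 0.63]])

col = data[:, 1]          # basic slicing returns a view: no data is copied
col[0] = 99               # ...so this also changes data
print(data[0, 1])         # 99.0

safe = data[:, 1].copy()  # an explicit copy is independent of data
safe[0] = -1
print(data[0, 1])         # still 99.0

print(np.shares_memory(col, data), np.shares_memory(safe, data))  # True False
```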



In [21]:

    
months+10









    Out[21]:





array([ 11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,
        22.])



In [22]:

    
months*20









    Out[22]:





array([  20.,   40.,   60.,   80.,  100.,  120.,  140.,  160.,  180.,
        200.,  220.,  240.])



In [23]:

    
months*months









    Out[23]:





array([   1.,    4.,    9.,   16.,   25.,   36.,   49.,   64.,   81.,
        100.,  121.,  144.])
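The * operator works element by element (like Matlab's .*); the dot product is written with @ (or np.dot). Operations between arrays of different but compatible shapes follow the broadcasting rules mentioned at the top: a scalar, as in months+10, is stretched to every element, and a column combined with a row produces a full table. A sketch:

```python
import numpy as np

months = np.arange(1, 13, dtype=float)  # same values as above: 1, 2, ..., 12

elementwise = months * months   # element by element: 1, 4, ..., 144
dot = months @ months           # dot product (sum of squares): a single number
print(dot)                      # 650.0

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) table
col = np.array([[0.], [10.], [20.]])
row = np.array([1., 2., 3., 4.])
table = col + row
print(table.shape)              # (3, 4)
```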

Basic statistics

Create ao_values, which contains only the index values (the third column):



In [24]:

    
ao_values = ao[:,2]

Simple statistics:



In [25]:

    
ao_values.min()









    Out[25]:





-4.2656999999999998



In [26]:

    
ao_values.max()









    Out[26]:





3.4952999999999999



In [27]:

    
ao_values.mean()









    Out[27]:





-0.11983340905732483



In [28]:

    
ao_values.std()









    Out[28]:





1.0069283815727728



In [29]:

    
ao_values.sum()









    Out[29]:





-94.069226109999988

You can also use the np.sum function:



In [30]:

    
np.sum(ao_values)









    Out[30]:





-94.069226109999988
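Both forms reduce the whole array to one number. For 2-D arrays the axis argument selects the dimension to reduce over: axis=0 works down each column and axis=1 across each row (calling ao.mean() without it would mix years, months and index values). A sketch on a small made-up array:

```python
import numpy as np

# Made-up 2-D array: 3 rows x 4 columns
a = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 10., 11., 12.]])

total = a.sum()                # 78.0: all elements
col_sums = a.sum(axis=0)       # one value per column: 15, 18, 21, 24
row_means = a.mean(axis=1)     # one value per row: 2.5, 6.5, 10.5
print(total, col_sums, row_means)
```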

You can also perform operations on subsets:



In [31]:

    
np.mean(ao[ao[:,1]==1,2]) # January monthly mean









    Out[31]:





-0.38995600000000002

The result is the same if we call the method on the selected data:



In [32]:

    
ao[ao[:,1]==1,2].mean()









    Out[32]:





-0.38995600000000002
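Repeating the January calculation for every month gives a monthly climatology. A sketch that uses the same mask idea in a loop; it runs on a made-up table with the ao layout (two years of placeholder values, not real AO data) so that the result is easy to check:

```python
import numpy as np

# Made-up table in the same layout as ao: year, month, value
years = np.repeat([2000., 2001.], 12)
months = np.tile(np.arange(1., 13.), 2)
values = np.arange(24.)  # placeholder numbers, not real AO data
table = np.column_stack([years, months, values])

# Mean value for each calendar month, one mask per month
climatology = np.array([table[table[:, 1] == m, 2].mean() for m in range(1, 13)])
print(climatology)  # January: mean of 0 and 12 = 6.0, ..., December: 17.0
```

With the real data the same line would read ao[ao[:, 1] == m, 2].mean().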

Saving data

You can save your data as a text file:



In [10]:

    
np.savetxt('ao_only_values.csv', ao[:, 2], fmt='%.4f')  # one value per line, 4 decimals

The head of the resulting file:



In [11]:

    
!head ao_only_values.csv
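The text file can be read back with np.loadtxt. Because fmt='%.4f' keeps only four decimals, the values read back are rounded. A self-contained sketch that writes the first three AO values shown earlier to a temporary directory (the file name is hypothetical) and reads them back:

```python
import os
import tempfile
import numpy as np

values = np.array([-0.060310, 0.62681, -0.0081275])  # first three AO values from the file

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'values.csv')  # hypothetical file name
    np.savetxt(path, values, fmt='%.4f')    # one value per line, 4 decimals
    back = np.loadtxt(path)

print(back)                                  # values rounded to 4 decimals
print(np.allclose(back, values, atol=5e-5))  # True: differences are below rounding
```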

You can also save it as a binary file:



In [35]:

    
with open('ao_only_values.bin', 'wb') as f:  # binary data needs 'wb' mode
    ao[:, 2].tofile(f)  # raw float64 bytes, no header
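A raw binary file written by tofile contains only the bytes of the data, with no information about dtype or shape, so the dtype has to be given again when reading it with np.fromfile. np.save writes the .npy format, which stores dtype and shape and can be read back with np.load. A self-contained sketch using a temporary directory (the file names are hypothetical):

```python
import os
import tempfile
import numpy as np

values = np.array([-0.060310, 0.62681, -0.0081275])

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, 'values.bin')  # hypothetical file names
    npy_path = os.path.join(tmp, 'values.npy')

    values.tofile(raw_path)                         # raw bytes only
    raw = np.fromfile(raw_path, dtype=np.float64)   # dtype must be supplied

    np.save(npy_path, values)                       # dtype and shape are stored
    npy = np.load(npy_path)

print(np.array_equal(raw, values), np.array_equal(npy, values))  # True True
```

Raw binary files are compact and easy to read from Fortran or C, while .npy files are safer for exchanging data between Python sessions.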


