Nikolay Koldunov
koldunovn@gmail.com
This is part of Python for Geosciences notes.
================
In [1]:
#allow graphics inline
%matplotlib inline
import matplotlib.pylab as plt #import plotting library
import numpy as np #import numpy library
np.set_printoptions(precision=3) # this is just to make the output look better
I am going to use some real data as an example of array manipulations: the Arctic Oscillation (AO) index, downloaded with wget through a system call (this requires wget to be installed, as it is on most Linux systems):
In [1]:
!wget www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii
This is what the data in the file look like (we again use a system call, this time for the head command):
In [2]:
!head monthly.ao.index.b50.current.ascii
Load the data into a variable:
In [3]:
ao = np.loadtxt('monthly.ao.index.b50.current.ascii')
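If the download is not available, the loading step can still be tried on a few lines of text. This is only a sketch: the three rows below are made-up numbers in the same "year month value" layout, not real AO data, and the names sample_text and sample are mine.

```python
import io
import numpy as np

# Made-up rows in the "year month value" layout (NOT real AO data)
sample_text = """\
1950 1 -0.5
1950 2 0.25
1950 3 1.0
"""

# np.loadtxt accepts a file name or any file-like object;
# whitespace is the default column separator
sample = np.loadtxt(io.StringIO(sample_text))

print(sample.shape)  # (3, 3): three rows, three columns
print(sample.dtype)  # float64: loadtxt produces floats by default
```

Note that the year and month columns also become floats, because a NumPy array holds a single data type.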
In [4]:
ao
Out[4]:
In [5]:
ao.shape
Out[5]:
The shape is reported as (rows, columns). NumPy stores arrays in row-major (C) order by default, whereas Matlab and Fortran use column-major order.
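To make the remark about memory order more concrete, here is a small sketch on a toy array (not the AO data). The strides show how many bytes separate neighbouring elements along each axis; the array names a and f are just for illustration.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)  # 2 rows, 3 columns

# Default layout is row-major (C order): elements of a row are adjacent
print(a.flags['C_CONTIGUOUS'])  # True
print(a.strides)                # (24, 8): next row is 24 bytes away, next column 8

# The same values stored column-major (Fortran order)
f = np.asfortranarray(a)
print(f.flags['F_CONTIGUOUS'])  # True
print(f.strides)                # (8, 16): next row is 8 bytes away, next column 16

# Indexing and values are identical; only the memory layout differs
print(np.array_equal(a, f))     # True
```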
In [8]:
type(ao)
Out[8]:
NumPy arrays are statically typed: all elements share a single data type, which allows faster operations:
In [6]:
ao.dtype
Out[6]:
You can't assign a value of an incompatible type to an element of a NumPy array. Here the string cannot be converted to a float, so the assignment raises a ValueError:
In [10]:
ao[0,0] = 'Year'
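A short sketch of what happens on assignment, using a toy float array rather than ao. Values that can be converted to the array's dtype are converted silently, while a string that is not a number is rejected.

```python
import numpy as np

values = np.array([1.5, 2.5, 3.5])  # toy float64 array

# An integer is converted to float; the dtype of the array does not change
values[0] = 7
print(values[0], values.dtype)

# A string that cannot be parsed as a number raises ValueError
try:
    values[1] = 'Year'
except ValueError as err:
    error_message = str(err)
    print('ValueError:', error_message)

# The failed assignment left the element unchanged
print(values[1])  # 2.5
```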
Slicing works similarly to Matlab:
In [11]:
ao[0:5,:]
Out[11]:
We can look at the data. This is done with the matplotlib.pylab module that we imported at the beginning as plt. We will plot only the first 780 points:
In [12]:
plt.plot(ao[:780,2])
Out[12]:
In general it is similar to Matlab.
The first 12 elements of the second column (months). Remember that indexing starts at 0:
In [13]:
ao[0:12,1]
Out[13]:
First row:
In [14]:
ao[0,:]
Out[14]:
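The indexing rules used so far can be checked on a small toy array where the expected result is easy to see. This is just an illustration; the array and variable names are not part of the AO example.

```python
import numpy as np

# Toy 4x3 array:
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
table = np.arange(12).reshape(4, 3)

first_two_rows = table[0:2, :]  # stop index is exclusive: rows 0 and 1
second_column = table[:, 1]     # index 1 is the second column
last_row = table[-1, :]         # negative indices count from the end

print(first_two_rows)
print(second_column)  # [ 1  4  7 10]
print(last_row)       # [ 9 10 11]
```

Unlike Matlab, the first element has index 0 and the end of a slice is not included.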
We can create a mask that selects all rows where the value in the second column (months) equals 10 (October):
In [7]:
mask = (ao[:,1]==10)
In [8]:
mask.shape
Out[8]:
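A mask is simply a boolean array with one entry per row. The sketch below uses a made-up four-row table (year, month, value) so that the result can be verified by eye; the numbers are not AO data.

```python
import numpy as np

# Made-up table: columns are year, month, value
toy = np.array([[2000, 9, 0.1],
                [2000, 10, 0.2],
                [2001, 9, 0.3],
                [2001, 10, 0.4]])

october = toy[:, 1] == 10  # one True/False per row

print(october.dtype)   # bool
print(october.shape)   # (4,): same length as the number of rows
print(october.sum())   # 2: True counts as 1, so this is the number of matches
print(toy[october])    # only the two October rows
```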
Here we apply this mask and show only the first 5 rows of the result:
In [16]:
ao[mask][:5,:]
Out[16]:
You don't have to create a separate variable for the mask; you can apply the condition directly. Here, instead of the first five rows, I show the last five:
In [17]:
ao[ao[:,1]==10][-5:,:]
Out[17]:
You can combine conditions. In this case we select October-December data (only the first 10 rows are shown):
In [18]:
ao[(ao[:,1]>=10)&(ao[:,1]<=12)][0:10,:]
Out[18]:
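A few more ways to combine conditions, sketched on a toy array of month numbers. The parentheses around each comparison matter, because & has higher precedence than the comparison operators.

```python
import numpy as np

month = np.array([8, 9, 10, 11, 12, 1])  # toy month numbers

# Element-wise AND; each comparison needs its own parentheses
oct_to_dec = (month >= 10) & (month <= 12)

# Element-wise OR and NOT use | and ~
oct_or_dec = (month == 10) | (month == 12)
rest_of_year = ~oct_to_dec

# np.isin is an alternative when testing membership in a list of values
same_selection = np.isin(month, [10, 11, 12])

print(month[oct_to_dec])    # [10 11 12]
print(month[rest_of_year])  # [8 9 1]
```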
You can assign values to a subset of elements (this expression fixes the problem with a very small value at 2015-04):
In [19]:
ao[ao<-10]=0
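Replacing a bad value with 0 means the zero still enters any later mean or sum. One possible alternative, shown here only as a sketch on made-up numbers, is to mark such values as NaN and use the NaN-aware functions. The placeholder -999.0 is my assumption for illustration, not necessarily what the AO file contains.

```python
import numpy as np

# Made-up series; -999.0 stands in for a bad/missing value (assumption)
series = np.array([0.5, -1.2, -999.0, 0.8])

cleaned = series.copy()           # work on a copy so the original stays intact
cleaned[cleaned < -10] = np.nan   # mark the bad value as missing

print(cleaned)
print(np.nanmean(cleaned))        # mean of the three valid values only
print(np.isnan(cleaned).sum())    # 1: number of values marked as missing
```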
Create an example array from the first 12 values of the second column and perform some basic operations:
In [20]:
months = ao[0:12,1]
months
Out[20]:
In [21]:
months+10
Out[21]:
In [22]:
months*20
Out[22]:
In [23]:
months*months
Out[23]:
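All of these operations are element-wise, which differs from Matlab, where * means matrix multiplication. A minimal sketch on a toy vector:

```python
import numpy as np

m = np.array([1.0, 2.0, 3.0])  # toy vector

print(m + 10)  # [11. 12. 13.]: the scalar is added to every element
print(m * m)   # [1. 4. 9.]: element-wise product (Matlab's .*)
print(m @ m)   # 14.0: dot product, 1*1 + 2*2 + 3*3
```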
Create ao_values, which will contain only the data values:
In [24]:
ao_values = ao[:,2]
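One thing to keep in mind: a slice like this is a view on the original array, not a copy, so changing one changes the other. The sketch below shows the difference on a toy array; use .copy() when an independent array is needed.

```python
import numpy as np

# Toy 3x2 array: [[0. 1.], [2. 3.], [4. 5.]]
data = np.arange(6, dtype=float).reshape(3, 2)

column_view = data[:, 1]         # basic slicing returns a view
column_copy = data[:, 1].copy()  # explicit, independent copy

print(np.shares_memory(data, column_view))  # True
print(np.shares_memory(data, column_copy))  # False

# Writing through the view modifies the original array
column_view[0] = 100.0
print(data[0, 1])      # 100.0
print(column_copy[0])  # 1.0: the copy is unaffected
```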
Simple statistics:
In [25]:
ao_values.min()
Out[25]:
In [26]:
ao_values.max()
Out[26]:
In [27]:
ao_values.mean()
Out[27]:
In [28]:
ao_values.std()
Out[28]:
In [29]:
ao_values.sum()
Out[29]:
You can also use np.sum function:
In [30]:
np.sum(ao_values)
Out[30]:
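Two details of these functions are worth knowing. By default std() divides by N (population standard deviation); pass ddof=1 to divide by N-1. For 2D arrays, the axis argument selects the direction of the reduction. A sketch on toy numbers:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # toy data, mean is 5

print(x.std())        # 2.0: divides by N (ddof=0 is the default)
print(x.std(ddof=1))  # slightly larger: divides by N-1

grid = np.array([[1.0, 2.0],
                 [3.0, 4.0]])

print(grid.sum())        # 10.0: sum of all elements
print(grid.sum(axis=0))  # [4. 6.]: sum down each column
print(grid.sum(axis=1))  # [3. 7.]: sum along each row
```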
You can perform operations on subsets:
In [31]:
np.mean(ao[ao[:,1]==1,2]) # January monthly mean
Out[31]:
The result will be the same if we call the method on the selected data:
In [32]:
ao[ao[:,1]==1,2].mean()
Out[32]:
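The same selection can be repeated for every month to get a mean per calendar month. Here is a minimal sketch on a made-up table (year, month, value) with two years of three months each; the numbers are invented so that the result is easy to check.

```python
import numpy as np

# Made-up table: columns are year, month, value
toy = np.array([[2000, 1, 1.0],
                [2000, 2, 2.0],
                [2000, 3, 3.0],
                [2001, 1, 3.0],
                [2001, 2, 4.0],
                [2001, 3, 5.0]])

# Month numbers that occur in the table, sorted
months_present = np.unique(toy[:, 1])

# One mean per month: select the rows of that month, take the value column
monthly_mean = np.array([toy[toy[:, 1] == m, 2].mean() for m in months_present])

print(months_present)  # [1. 2. 3.]
print(monthly_mean)    # [2. 3. 4.]
```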
You can save your data as a text file:
In [10]:
np.savetxt('ao_only_values.csv',ao[:, 2], fmt='%.4f')
Head of the resulting file:
In [11]:
!head ao_only_values.csv
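Keep in mind that the fmt string controls how many digits are written, so a text file stores rounded values. The sketch below writes a toy array to a temporary file and reads it back; the file name values.csv is arbitrary.

```python
import os
import tempfile
import numpy as np

values = np.array([0.123456, -1.5, 2.0])  # toy data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'values.csv')
    np.savetxt(path, values, fmt='%.4f')  # four digits after the decimal point
    restored = np.loadtxt(path)

print(restored)                                 # first value is now 0.1235
print(np.allclose(restored, values, atol=1e-4)) # True: equal up to rounding
```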
You can also save it as binary (the file should be opened in binary mode, 'wb'):
In [35]:
with open('ao_only_values.bin', 'wb') as f:  # closed automatically at the end of the block
    ao[:,2].tofile(f)
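A file written with tofile contains only the raw bytes: the data type and shape are not stored, so you have to supply them when reading. The sketch below reads such a file back with np.fromfile and, as an alternative, uses NumPy's own .npy format (np.save / np.load), which keeps dtype and shape. Toy data and temporary file names are used for illustration.

```python
import os
import tempfile
import numpy as np

values = np.array([0.25, -1.5, 2.0])  # toy float64 data

with tempfile.TemporaryDirectory() as tmpdir:
    # Raw binary: no header, so the dtype must be given when reading
    raw_path = os.path.join(tmpdir, 'values.bin')
    values.tofile(raw_path)
    from_raw = np.fromfile(raw_path, dtype=np.float64)

    # .npy format: dtype and shape are stored in the file
    npy_path = os.path.join(tmpdir, 'values.npy')
    np.save(npy_path, values.reshape(3, 1))
    from_npy = np.load(npy_path)

print(from_raw)        # same three values, exact (no rounding)
print(from_npy.shape)  # (3, 1): the shape survived the round trip
```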