This notebook is based on existing and more elaborate NumPy lecture material:
For Matlab users, the following reference material might come in handy:
Official documentation:
Name spaces: keep them separate; it will make your life easier in the long run.
In [1]:
import numpy as np
Note that some like to import everything from NumPy with
from numpy import *
in order never to have to type np., but that can create problems (namespace clashes) down the road. For instance, consider the following case:
In [2]:
from numpy import *
print(type(cos(10)))
cos(array([1,2,3]))
Out[2]:
In [3]:
from math import *
print(type(cos(10)))
In [4]:
cos(array([1,2,3]))
The function numpy.cos (which accepts both numbers and arrays) clashes with math.cos (which only accepts numbers): whichever one was imported last prevails.
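To avoid such clashes, keep each module in its own namespace. A minimal sketch of the recommended pattern:

```python
import math
import numpy as np

# Each cos keeps its own namespace, so there is no ambiguity
x = math.cos(10)                  # math.cos: scalars only
y = np.cos(np.array([1, 2, 3]))   # np.cos: works element-wise on arrays

print(type(x))  # <class 'float'>
print(type(y))  # <class 'numpy.ndarray'>
```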
Python objects:
NumPy provides:
In [6]:
np.array([[11,12,13], [21,22,23], [31,32,33]])
Out[6]:
Can't we just use Python lists instead?
In [5]:
L = list(range(5)) # a Python list
a = np.arange(5) # a NumPy array
b = 2
Lists behave according to very simple yet clear rules:
In [ ]:
L1 = L*2
L2 = L+L
L1 == L2
In [ ]:
L1
We could apply element-wise operations to list elements, but they will always be slower than NumPy:
In [ ]:
L = list(range(5000))
a = np.arange(5000)
%timeit [i**2 for i in L]
%timeit a**2
325/5  # rough speedup factor from the timings above
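Independent of speed, the two element-wise computations give the same values. A minimal sketch verifying that the vectorized expression matches the list comprehension:

```python
import numpy as np

L = list(range(5000))
a = np.arange(5000)

squares_list = [i**2 for i in L]   # element-wise via a list comprehension
squares_arr = a**2                 # element-wise via NumPy vectorization

# Same values, computed very differently under the hood
print(np.array_equal(squares_arr, np.array(squares_list)))  # True
```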
Convenience functions for creating special arrays:
In [6]:
np.ones((5,4))
Out[6]:
In [7]:
np.diagflat(np.array([1,2,3,4]))
Out[7]:
Selecting the last element, slicing, views, copies, in-place operations, linspace:
In [11]:
a = np.arange(11)
a
a[-1]
a[2:9]
Out[11]:
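The remaining items from the list above (views, copies, in-place operations, linspace) can be sketched as follows:

```python
import numpy as np

a = np.arange(11)

v = a[2:9]         # basic slicing returns a *view*: it shares memory with a
v[0] = 99
print(a[2])        # 99: modifying the view changed the original

c = a[2:9].copy()  # an explicit copy does not share memory
c[0] = -1
print(a[2])        # still 99

a[...] = 0         # in-place assignment: fills the existing array
x = np.linspace(0, 1, 5)  # 5 evenly spaced points, endpoints included
print(x)           # [0.   0.25 0.5  0.75 1.  ]
```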
When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
Creating and printing arrays might work a little differently than expected:
In [12]:
a1 = np.array([[111,121],[211,221]])
print(a1)
a2 = np.array([[112, 122],[212,222]])
print(a2)
What if we create a 3D array, where a2 is stacked along a third dimension behind a1?
In [15]:
a3 = np.dstack((a1, a2))
a3
Out[15]:
In [17]:
a4 = np.array([ [[111,112],[121,122]], [[211,212],[221,222]] ])
a5 = np.array([ [np.array([111,112]),[121,122]], [[211,212],[221,222]] ])
In [18]:
print(a3 == a4)
print(a3[1,0,0])
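A quick way to check how dstack arranged the data is to inspect the shape and take slices along the last axis; a sketch:

```python
import numpy as np

a1 = np.array([[111, 121], [211, 221]])
a2 = np.array([[112, 122], [212, 222]])
a3 = np.dstack((a1, a2))

print(a3.shape)    # (2, 2, 2): a1 and a2 stacked along a new third axis
print(a3[..., 0])  # the first "depth" slice is a1
print(a3[..., 1])  # the second is a2
```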
In [7]:
import matplotlib.pyplot as plt
plt.plot(range(10))
Create a 3D array with random data
In [19]:
d1, d2, d3 = 1000, 2000, 50
i0, i1, istep = 10, 700, 10
j0, j1, jstep = 8, 1000, 7
aa = np.random.rand(d1,d2,d3)
i = np.arange(i0,i1,istep)
j = np.arange(j0,j1,jstep)
k = [1,2, 10]
aa.shape
Out[19]:
Slice that array using the index vectors i, j, k defined above:
In [20]:
aaijk1 = aa[i,j,k]  # raises IndexError: the index arrays cannot be broadcast together
So Matlab-style slicing doesn't work. In NumPy there are three possible methods:
In [21]:
# method 1: fancy indexing for each dimension separately
aai = aa[i,:,:]
aaij = aai[:,j,:]
aaijk = aaij[:,:,k]
# which is similar to
aaijk1 = aa[i,:,:][:,j,:][:,:,k]
# method 2: slicing on dimensions 1,2, and fancy indexing on the last dimension
# slicing: inclusive lower boundary, exclusive upper boundary
aaijk2 = aa[i0:i1:istep, j0:j1:jstep, k]
# method 3: fancy indexing
aaijk3 = aa[np.ix_(i,j,k)]
Verify the results by comparing the float arrays with np.allclose:
In [22]:
print(np.allclose(aaijk1, aaijk))
print(np.allclose(aaijk2, aaijk))
print(np.allclose(aaijk3, aaijk))
Simple benchmarking with the IPython magic %timeit: which method is faster?
In [23]:
# which method is faster?
print("method 1: fancy indexing for each dimension separately")
%timeit aa[i,:,:][:,j,:][:,:,k]
print("method 2: slicing")
%timeit aa[i0:i1:istep, j0:j1:jstep, k]
print("method 3: fancy indexing")
%timeit aa[np.ix_(i,j,k)]
Slicing is the clear winner, because basic slices create views instead of copies of the array data.
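The view-versus-copy distinction can be made visible with the `base` attribute: a view keeps a reference to the array that owns the memory, while fancy indexing allocates fresh data. A sketch:

```python
import numpy as np

aa = np.random.rand(100, 100)

s = aa[10:50:10, :]           # basic slicing: a view, no data copied
f = aa[[10, 20, 30, 40], :]   # fancy indexing: a fresh copy

print(s.base is aa)  # True: s shares aa's memory
print(f.base is aa)  # False: f owns (or derives from) its own data
print(np.array_equal(s, f))  # True: both select rows 10, 20, 30, 40
```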
In [24]:
a = np.random.rand(10, 5)
b = np.ones( (10,2) )
c = np.zeros( (5,5) )
np.savetxt('savetxt', a)
np.savez('savez', a, b, c) # NumPy binary
np.save('save', a) # NumPy binary
And to load the saved arrays, use np.loadtxt and np.load (the latter handles both .npy and .npz files):
In [ ]:
np.loadtxt('savetxt')
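np.savez produces a .npz archive; np.load returns a dict-like object whose arrays, since they were passed positionally above, get the default names arr_0, arr_1, and so on. A sketch of the round trip:

```python
import numpy as np

a = np.random.rand(10, 5)
b = np.ones((10, 2))
np.savez('savez.npz', a, b)   # positional arrays get default names
npz = np.load('savez.npz')

print(sorted(npz.files))             # ['arr_0', 'arr_1']
print(np.allclose(npz['arr_0'], a))  # True: the round trip is lossless
```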
For more complex csv or text files, use numpy.genfromtxt(). For instance, we may be dealing with a fixed-width file, where columns are defined as a given number of characters. In that case, we need to set delimiter to a single integer (if all the columns have the same size) or to a sequence of integers (if columns can have different sizes).
In [26]:
fname = 'fixedwidth'
data1 = np.genfromtxt(fname, dtype=None, delimiter=',', comments='#',
                      skip_header=0, skip_footer=0)
print(data1.shape)
data1[0]
Out[26]:
In [27]:
data2 = np.genfromtxt(fname, dtype=np.float32, delimiter=13, comments='#',
                      skip_header=0, skip_footer=0)
print(data2.shape)
data2[:,0]
Out[27]:
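Since the fixedwidth file itself is not included here, the delimiter-as-width mechanism can be demonstrated self-contained with io.StringIO; the column width of 5 characters is an assumption made up for this example:

```python
import io
import numpy as np

# Two rows, three columns of 5 characters each (no separator characters)
text = "  1.0  2.0  3.0\n  4.0  5.0  6.0\n"

# delimiter=5 tells genfromtxt to cut the line every 5 characters
data = np.genfromtxt(io.StringIO(text), dtype=np.float32, delimiter=5)
print(data.shape)   # (2, 3)
print(data[:, 0])   # [1. 4.]
```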
In [ ]:
data = np.fromstring(string, dtype=float, count=-1, sep='')
data = np.fromfile(file, dtype=float, count=-1, sep='') # also for binary data
data.tofile(fname)
See also official SciPy tutorial on IO
Reading and writing Matlab .mat files
In [28]:
import scipy.io as sio
constitutive = sio.loadmat('constitutive.mat')
utils = sio.loadmat('utils.mat')
In [29]:
utils.keys()
Out[29]:
In [30]:
utils['utils'].dtype
Out[30]:
In [31]:
utils['utils']['foldername'][0][0]
Out[31]:
In [32]:
utils['utils']['matprops'][0][0]
Out[32]:
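Writing a .mat file from Python works the same way with sio.savemat; a sketch of a round trip (the variable names and values here are made up for the example):

```python
import numpy as np
import scipy.io as sio

props = {'matprops': np.arange(6.0).reshape(2, 3),
         'foldername': 'results'}
sio.savemat('example.mat', {'utils': props})

back = sio.loadmat('example.mat')
# MATLAB structs come back as structured arrays, hence the [0][0] indexing
print(back['utils']['matprops'][0][0])
```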
Or, if you think that's too much of a hassle, you could try the Octave-to-Python bridge oct2py:
In [ ]:
from oct2py import octave
# shorthand notation
oc = octave.run
oc("load('utils.mat')")
oc("load('constitutive.mat')")
utils = octave.get('utils')
constitutive = octave.get('constitutive')
In [ ]:
utils['matprops']
In [ ]: