Presented by Karen Cranston; uses some materials by Katy Huff and Matthew Terry.
The NumPy library includes (among other things) ways of storing and manipulating data that are more efficient than standard Python lists. For numerical data, NumPy arrays are typically much faster than Python lists or tuples. The goals here are to understand some of the gotchas when using arrays vs. lists and to take a tour of NumPy's features.
We will start by importing the library and creating a regular Python list and a numpy array from that list.
In [1]:
import numpy
x = [1, 2, 3, 4, 5, 6 ]
np_arr = numpy.array(x)
Let's look at the difference between x (Python list) and np_arr (NumPy array).
In [2]:
x
Out[2]:
In [3]:
np_arr
Out[3]:
In [4]:
np_arr.ndim
Out[4]:
In [5]:
np_arr.shape
Out[5]:
We can compare the two data structures. Operations on NumPy arrays work element by element. Can you explain this result?
In [6]:
x == np_arr
Out[6]:
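A minimal sketch of what happens in the comparison above, assuming the same x and np_arr as in the first cell. The == is applied element by element, so the result is a boolean array rather than a single True or False. The names elementwise, all_equal, and same are introduced here for illustration only.

```python
import numpy

x = [1, 2, 3, 4, 5, 6]
np_arr = numpy.array(x)

# Comparing a list with an array compares element by element
# and gives back a boolean array with one entry per element.
elementwise = (x == np_arr)
print(elementwise)

# To get a single yes/no answer, reduce the boolean array explicitly.
all_equal = bool(elementwise.all())
print(all_equal)

# numpy.array_equal also checks that the two shapes match.
same = numpy.array_equal(x, np_arr)
print(same)
```

This is also why putting an array comparison directly in an if statement usually raises an error for arrays with more than one element: NumPy cannot decide which of the many True/False values you meant.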
Now, let's make a 2D array
In [7]:
x = [ [1, 2], [3, 4], [5, 6] ]
np_arr = numpy.array(x)
In [8]:
np_arr.shape
Out[8]:
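As a quick side-by-side of the attributes used so far, here is a small sketch comparing the 1D and 2D arrays. The names flat, grid, and reshaped are made up for this example; ndim, shape, size, and reshape are standard NumPy array members.

```python
import numpy

flat = numpy.array([1, 2, 3, 4, 5, 6])
grid = numpy.array([[1, 2], [3, 4], [5, 6]])

# ndim is the number of axes; shape is the length along each axis.
print(flat.ndim, flat.shape)
print(grid.ndim, grid.shape)

# size is the total number of elements, whatever the shape.
print(flat.size, grid.size)

# The same six numbers can be rearranged into 3 rows and 2 columns.
reshaped = flat.reshape(3, 2)
print(numpy.array_equal(reshaped, grid))
```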
We can slice the matrix to get the second column. Note that slices are a view of the same data. What happens when we change an element of the slice?
In [9]:
array_slice = np_arr[:,1]
array_slice
Out[9]:
In [10]:
array_slice[2]=7
In [11]:
np_arr
Out[11]:
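One way to check the "slices are views" behaviour for yourself is sketched below, assuming a fresh copy of the 2D array from above. numpy.shares_memory reports whether two arrays use the same underlying data; the Python list at the end is included only for contrast, since slicing a list makes a new list.

```python
import numpy

np_arr = numpy.array([[1, 2], [3, 4], [5, 6]])
array_slice = np_arr[:, 1]          # second column, as a view

# The slice and the original array share the same data.
print(numpy.shares_memory(np_arr, array_slice))

# Writing through the slice therefore changes the original array.
array_slice[2] = 7
print(np_arr[2, 1])

# Contrast: slicing a Python list copies the elements.
py_list = [2, 4, 6]
list_slice = py_list[:]
list_slice[2] = 7
print(py_list)                      # the original list is unchanged
```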
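Differences between shallow and deep copies: a slice is a view (a shallow copy) that shares data with the original, while copy() makes a deep copy with its own data.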
In [12]:
arr_copy = np_arr.copy()
arr_copy[0,0]=3
arr_copy
Out[12]:
In [13]:
np_arr
Out[13]:
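To make the distinction concrete, the sketch below puts plain assignment, a view, and a copy next to each other. The variable names (original, alias, shallow, deep) are illustrative and not part of the lesson's own code.

```python
import numpy

original = numpy.array([[1, 2], [3, 4], [5, 6]])

alias = original            # no copy at all: a second name for the same object
shallow = original.view()   # a new array object that shares the same data
deep = original.copy()      # a new array object with its own data

shallow[0, 0] = 10          # shows up in original (and alias)
deep[0, 1] = 20             # does not show up in original

print(alias is original)
print(numpy.shares_memory(original, shallow))
print(numpy.shares_memory(original, deep))
print(original)
```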
Operating on Python lists and numpy arrays is very different.
In [14]:
x*2
Out[14]:
In [15]:
np_arr * 3
Out[15]:
With NumPy arrays, operations are element by element: the multiplication multiplied each element individually. Compare to the Python list, where multiplication repeated the entire list as a single unit. Try adding the list to itself and compare to what happens when you add the array to itself.
In [16]:
np_arr + np_arr
Out[16]:
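The contrast can be seen more easily on a short 1D example. This is a sketch with made-up names (values, arr, and so on); the list comprehension at the end shows what you would have to write to get element-by-element behaviour from a plain list.

```python
import numpy

values = [1, 2, 3]
arr = numpy.array(values)

# Lists: * repeats the list and + concatenates.
doubled_list = values * 2
summed_list = values + values
print(doubled_list, summed_list)

# Arrays: * and + act on each element.
doubled_arr = arr * 2
summed_arr = arr + arr
print(doubled_arr, summed_arr)

# Element-by-element doubling of a list needs an explicit loop.
doubled_by_hand = [v * 2 for v in values]
print(doubled_by_hand)
```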
NumPy has functions for basic matrix operations and statistics.
T = transpose; dot = dot product
In [17]:
np_arr.T.dot(np_arr)
Out[17]:
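It can help to follow the shapes through the transpose and dot product. The sketch below assumes a fresh 3x2 array (without the earlier edit to the slice) and uses the illustrative names grid and product. It also contrasts dot with *, which stays element by element.

```python
import numpy

grid = numpy.array([[1, 2], [3, 4], [5, 6]])

# Transposing swaps the axes: (3, 2) becomes (2, 3).
print(grid.shape, grid.T.shape)

# (2, 3) dot (3, 2) gives a (2, 2) matrix product.
product = grid.T.dot(grid)
print(product)

# By contrast, * multiplies matching elements and keeps the shape.
squared = grid * grid
print(squared.shape)
```

On Python 3.5 or later with a recent NumPy, grid.T @ grid should give the same matrix product as grid.T.dot(grid).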
In [20]:
numpy.average(np_arr)
Out[20]:
Average of what? By default, it is the average of the whole array flattened into a single list. Find the average of the first column.
In [21]:
numpy.average(np_arr[:,0])
Out[21]:
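Instead of slicing out one column at a time, numpy.average also takes an axis argument. The sketch below assumes a fresh 3x2 array; the names grid, overall, per_column, and per_row are only for illustration.

```python
import numpy

grid = numpy.array([[1, 2], [3, 4], [5, 6]])

# No axis: average over every element in the array.
overall = numpy.average(grid)

# axis=0 averages down the rows, giving one value per column.
per_column = numpy.average(grid, axis=0)

# axis=1 averages across the columns, giving one value per row.
per_row = numpy.average(grid, axis=1)

print(overall)
print(per_column)
print(per_row)
```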
In [30]:
numpy.cov(np_arr)
Out[30]:
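A similar "covariance of what?" question applies here. By default numpy.cov treats each row as a variable and each column as an observation, which for a 3x2 array gives a 3x3 matrix. If the columns are meant to be the variables, rowvar=False is one way to say so. The sketch assumes a fresh 3x2 array, and the names by_rows and by_columns are illustrative.

```python
import numpy

grid = numpy.array([[1, 2], [3, 4], [5, 6]])

# Default: 3 rows are 3 variables, each with 2 observations.
by_rows = numpy.cov(grid)
print(by_rows.shape)

# rowvar=False: 2 columns are 2 variables, each with 3 observations.
by_columns = numpy.cov(grid, rowvar=False)
print(by_columns.shape)
print(by_columns)
```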
We can use NumPy functions to read data from a file into an array
In [24]:
%%file example-data.txt
0,0
1,2
2,4
3,8
4,16
5,32
6,64
In [25]:
data = numpy.loadtxt('example-data.txt', delimiter=',')
print(data)
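Here is a sketch of what loadtxt gives back and how to pull the columns apart. To keep it self-contained, the same seven lines are read from an in-memory io.StringIO object instead of example-data.txt (loadtxt accepts file-like objects as well as file names). The names text, x_col, y_col, x_unpacked, and y_unpacked are made up for this example.

```python
import io
import numpy

# Same contents as example-data.txt, held in memory.
text = io.StringIO(u"0,0\n1,2\n2,4\n3,8\n4,16\n5,32\n6,64\n")

data = numpy.loadtxt(text, delimiter=',')
print(data.shape)    # one row per line, one column per field
print(data.dtype)    # loadtxt reads values as floats by default

# Columns can be sliced out of the 2D array...
x_col = data[:, 0]
y_col = data[:, 1]

# ...or returned separately by asking loadtxt to unpack them.
text.seek(0)
x_unpacked, y_unpacked = numpy.loadtxt(text, delimiter=',', unpack=True)
print(numpy.array_equal(y_col, y_unpacked))
```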
In [37]:
x = [ 0, 1, 2, 3, 4, 5, 6 ]
y = [ 0, 2, 4, 8, 16, 32, 64 ]
import matplotlib.pyplot as plt
plt.plot(x, y)
Out[37]:
In [38]:
plt.plot(x, y, 'r--', label='my favorite line')
plt.legend()
Out[38]:
In [40]:
plt.plot(x, y, 'r-')
plt.axis(xmin=-10, xmax = 8, ymin=-10)
Out[40]:
In [27]:
# Without pylab mode, the plotting functions need the plt. prefix.
plt.plot(x, y, 'r-')
plt.axis(xmin=-10, xmax = 8, ymin=-10)
plt.xlabel('This is my X axis')
plt.ylabel('This is my Y axis')
plt.title('foo')
plt.savefig('/tmp/figure.pdf')
In [42]:
plt.plot(x, y, 'r-')
plt.axis(xmin=-10, xmax = 8, ymin=-10)
plt.xlabel('This is my X axis')
plt.ylabel('This is my Y axis')
plt.title('foo')
plt.savefig('/tmp/figure.png')
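The same figure can also be built with matplotlib's object-oriented interface, where the figure and axes are explicit objects rather than implicit state behind plt. This is a sketch, not the approach used in the lesson: the Agg backend is chosen so it runs without a display, and the output file name figure_example.png in the system temp directory is an arbitrary choice for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # non-interactive backend: draw to files only
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 2, 4, 8, 16, 32, 64]

# One figure containing one set of axes.
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', label='example data')

# Same limits as the plt.axis call: fix x, fix only the lower y limit.
ax.set_xlim(-10, 8)
ax.set_ylim(-10, None)

ax.set_xlabel('This is my X axis')
ax.set_ylabel('This is my Y axis')
ax.set_title('foo')
ax.legend()

# Save to a temporary location, then release the figure.
out_path = os.path.join(tempfile.gettempdir(), 'figure_example.png')
fig.savefig(out_path)
plt.close(fig)
```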
In [43]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
bignum = 100
mat = np.random.random((bignum, bignum))
X, Y = np.mgrid[:bignum, :bignum]
fig = plt.figure()
ax = fig.add_subplot(1,1,1, projection='3d')
surf = ax.plot_surface(X,Y,mat)
plt.show()