Numpy: "number" + "python"

Numpy is a Python package that is commonly used by scientists. You can already do some math in Python by itself, but Numpy makes things even easier.

We've already seen that Python has data structures such as lists, tuples, and dictionaries; Numpy has arrays. Arrays are just matrices, which you might have seen in math classes. With Numpy, we can do math on entire matrices at once. To start with, here's what happens if we make a list of numbers and try to multiply each number in that list by 3:


In [1]:
a = [1,2,3]
b = 3*a
print b


[1, 2, 3, 1, 2, 3, 1, 2, 3]

If we want to multiply each number in that list by 3, we can certainly do it by looping through and multiplying each individual number by 3, but that seems like way too much work. Let's see what Numpy can do instead.

First, we need to import the Numpy package, which we commonly rename as "np."


In [2]:
import numpy as np

In [3]:
a = np.array([1,2,3])
b = 3*a
print b


[3 6 9]

Here, we created a numpy array from scratch. You can also convert a list or tuple into an array:


In [4]:
c = [2,5,8]
print c


[2, 5, 8]

In [5]:
c = np.array(c)
print c


[2 5 8]

There are a couple of ways to check if something is an array (as opposed to a list), but here's a really straight-forward way:


In [6]:
c


Out[6]:
array([2, 5, 8])

Let's say you want to know what the first element in the array is. You can select elements of arrays the same way you do for lists:


In [7]:
c[0]


Out[7]:
2

In [8]:
c[1]


Out[8]:
5

In [10]:
c[-2]


Out[10]:
5

You can perform slices in the same way as well:


In [11]:
c[0:2]


Out[11]:
array([2, 5])

All the elements of an array must be the same type. For instance, you can't have both integers and floats in a single array. If you already have an array of integers, and you try to put a float in there, Numpy will automatically convert it:


In [21]:
d = np.array([0.0,1,2,3,4],dtype=np.float32)
print d,type(d)
d.dtype


[ 0.  1.  2.  3.  4.] <type 'numpy.ndarray'>
Out[21]:
dtype('float32')

In [22]:
d[0] = 35.21
print d


[ 35.20999908   1.           2.           3.           4.        ]

Array arithmetic is always done element-wise. That is, the arithmetic is performed on each individual element, one at a time.


In [23]:
array1 = np.array((10, 20, 30))
array2 = np.array((1, 5, 10))
print array1 + array2


[11 25 40]

In [24]:
print array1 - array2


[ 9 15 20]

In [25]:
print 3*array1


[30 60 90]

In [26]:
print array1 * array2


[ 10 100 300]

What if you want an array, but you don't know what you want to put into it yet? If you know what size it needs to be, you can create the whole thing at once and make every element a 1 or a 0:


In [ ]:
ones_array = np.ones(5)
print ones_array

In [ ]:
print 5*ones_array

In [27]:
zeros_array = np.zeros(5, int)
print zeros_array


[0 0 0 0 0]

Notice that you can specify whether you want the numbers to be floats (the default) or integers. You can do complex numbers too.

If you don't know what size your array will be, you can create an empty one and append elements to it:


In [33]:
a=[1,2,3]
a.append(4)
a


Out[33]:
[1, 2, 3, 4]

In [34]:
f = np.array(())
print f
f = np.append(f, 3)
f = np.append(f, 5)
# f.append(5)
print f

# Question: what if you want that 3 to be an integer?


[]
[ 3.  5.]

In [ ]:
g = np.append(f, (2,1,0))
print g

Extra: Figure out how to insert or delete elements

If you want an array of numbers in chronological order, Numpy has a very handy function called "arange" that we saw on the first day.


In [35]:
print np.arange(10)


[0 1 2 3 4 5 6 7 8 9]

In [36]:
print np.arange(10.0)


[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]

In [37]:
print np.arange(1, 10)


[1 2 3 4 5 6 7 8 9]

In [38]:
print np.arange(1, 10, 2)


[1 3 5 7 9]

In [39]:
print np.arange(10, 1, -1)


[10  9  8  7  6  5  4  3  2]

There are some advantages of using np.arange() over range(); one of the most important ones is that np.arange() can take floats, not just ints.


In [40]:
print range(5, 10, 0.1)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-22391170f9b6> in <module>()
----> 1 print range(5, 10, 0.1)

TypeError: range() integer step argument expected, got float.

In [41]:
print np.arange(5, 10, 0.1)


[ 5.   5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9  6.   6.1  6.2  6.3  6.4
  6.5  6.6  6.7  6.8  6.9  7.   7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9
  8.   8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9  9.   9.1  9.2  9.3  9.4
  9.5  9.6  9.7  9.8  9.9]

Numpy has some functions that make statistical calculations easy:


In [42]:
redshifts =  np.array((0.2, 1.56, 6.3, 0.003, 0.9, 4.54, 1.1))

In [43]:
print redshifts.min(), redshifts.max()


0.003 6.3

In [44]:
print redshifts.sum(), redshifts.prod()


14.603 0.02650375728

In [45]:
print redshifts.mean(), redshifts.std()


2.08614285714 2.21460299109

In [46]:
print redshifts.argmax()
print redshifts[redshifts.argmax()]


2
6.3

We can use Numpy to select individual elements with certain properties. There are two ways to do this:


In [47]:
close = np.where(redshifts < 1)
print close
print redshifts[close]


(array([0, 3, 4]),)
[ 0.2    0.003  0.9  ]

In [48]:
middle = np.where( (redshifts>1) & (redshifts<2))
print redshifts[middle]


[ 1.56  1.1 ]

In [49]:
far = redshifts > 2
print far
print redshifts[far]


[False False  True False False  True False]
[ 6.3   4.54]

Numpy is a great way to read in data from ASCII files, like the data files you got from the 20 m robotic telescope. Let's start by creating a data file and then reading it in, both using Numpy.


In [51]:
saveThisData = np.random.rand(20,2)    # This prints out random numbers 
#                                         between 0 and 1 in the shape we tell it
print saveThisData
print saveThisData.shape


[[ 0.84144601  0.7645398 ]
 [ 0.87306055  0.42566518]
 [ 0.83700917  0.96262714]
 [ 0.59064435  0.3813825 ]
 [ 0.57445625  0.44642485]
 [ 0.72473348  0.25948261]
 [ 0.84260119  0.46230482]
 [ 0.75974566  0.96972777]
 [ 0.05999346  0.61083109]
 [ 0.56883506  0.71948526]
 [ 0.70304877  0.93399379]
 [ 0.68014293  0.88449072]
 [ 0.97984232  0.79670309]
 [ 0.82832295  0.08482301]
 [ 0.15451258  0.98817942]
 [ 0.37609868  0.92653347]
 [ 0.56785954  0.38391981]
 [ 0.69002105  0.56186978]
 [ 0.90604193  0.92424859]
 [ 0.47395247  0.91004998]]
(20, 2)

In [52]:
saveThisData = np.reshape(saveThisData, (2,20))   # If we don't like the shape we can reshape it
print saveThisData


[[ 0.84144601  0.7645398   0.87306055  0.42566518  0.83700917  0.96262714
   0.59064435  0.3813825   0.57445625  0.44642485  0.72473348  0.25948261
   0.84260119  0.46230482  0.75974566  0.96972777  0.05999346  0.61083109
   0.56883506  0.71948526]
 [ 0.70304877  0.93399379  0.68014293  0.88449072  0.97984232  0.79670309
   0.82832295  0.08482301  0.15451258  0.98817942  0.37609868  0.92653347
   0.56785954  0.38391981  0.69002105  0.56186978  0.90604193  0.92424859
   0.47395247  0.91004998]]

Save the array to a file:


In [53]:
np.savetxt('myData.txt', saveThisData)

Now we can use Numpy to read the file into an array:


In [55]:
readThisData = np.genfromtxt('myData.txt')
print readThisData


[[ 0.84144601  0.7645398   0.87306055  0.42566518  0.83700917  0.96262714
   0.59064435  0.3813825   0.57445625  0.44642485  0.72473348  0.25948261
   0.84260119  0.46230482  0.75974566  0.96972777  0.05999346  0.61083109
   0.56883506  0.71948526]
 [ 0.70304877  0.93399379  0.68014293  0.88449072  0.97984232  0.79670309
   0.82832295  0.08482301  0.15451258  0.98817942  0.37609868  0.92653347
   0.56785954  0.38391981  0.69002105  0.56186978  0.90604193  0.92424859
   0.47395247  0.91004998]]

In [56]:
print readThisData[0]


[ 0.84144601  0.7645398   0.87306055  0.42566518  0.83700917  0.96262714
  0.59064435  0.3813825   0.57445625  0.44642485  0.72473348  0.25948261
  0.84260119  0.46230482  0.75974566  0.96972777  0.05999346  0.61083109
  0.56883506  0.71948526]

In [57]:
print readThisData[:,0]


[ 0.84144601  0.70304877]

Matplotlib: "math" + "plotting" + "library" (I think)

Matplotlib is a Python plotting package which can do anything from simple plots to complicated ones that are used in scientific papers.

We need to import the matplotlib.pyplot library, which is generally written as "plt" in shorthand. The second line below, ("%matplotlib inline") tells ipython to make the plots in this notebook; otherwise, the plots will appear in new windows.


In [59]:
import matplotlib.pyplot as plt
%matplotlib inline

Matplotlib can create all the plots you need. For instance, we can plot our readThisData array from the Numpy session, which was two columns with 5 numbers each. Let's plot those numbers. It only takes a single line of code:


In [60]:
print readThisData


[[ 0.84144601  0.7645398   0.87306055  0.42566518  0.83700917  0.96262714
   0.59064435  0.3813825   0.57445625  0.44642485  0.72473348  0.25948261
   0.84260119  0.46230482  0.75974566  0.96972777  0.05999346  0.61083109
   0.56883506  0.71948526]
 [ 0.70304877  0.93399379  0.68014293  0.88449072  0.97984232  0.79670309
   0.82832295  0.08482301  0.15451258  0.98817942  0.37609868  0.92653347
   0.56785954  0.38391981  0.69002105  0.56186978  0.90604193  0.92424859
   0.47395247  0.91004998]]

In [63]:
plt.scatter(readThisData[0], readThisData[1])


Out[63]:
<matplotlib.collections.PathCollection at 0x7f0e6c52e390>

If we want lines to connect the points:


In [64]:
plt.plot(readThisData[0], readThisData[1])


Out[64]:
[<matplotlib.lines.Line2D at 0x7f0e6c4e3c10>]

In [65]:
plt.plot(readThisData[0], readThisData[1], 'o')  # another way to make a scatter plot


Out[65]:
[<matplotlib.lines.Line2D at 0x7f0e6c350d10>]

Here's another example of what Numpy and Matplotlib can do:


In [66]:
x = np.linspace(0, 2*np.pi, 50)
y = np.sin(x)
plt.plot(x, y)


Out[66]:
[<matplotlib.lines.Line2D at 0x7f0e6c27afd0>]

Let's add some labels and titles to our plot:


In [67]:
plt.plot(x, y, label='values')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.title("trigonometry!")          # you can use double or single quotes


Out[67]:
<matplotlib.text.Text at 0x7f0e6c1fa9d0>

Let's change how this looks a little:


In [68]:
plt.plot(x, y, 'o-', color='red', markersize=6, linewidth=2, label='values')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title("trigonometry!")


Out[68]:
<matplotlib.text.Text at 0x7f0e6c13be10>

Here's another kind of plot we can do:


In [69]:
plt.hist(readThisData[0], bins=20)


Out[69]:
(array([ 1.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  3.,  0.,  0.,  3.,  1.,
         0.,  2.,  2.,  0.,  4.,  0.,  2.]),
 array([ 0.05999346,  0.10548018,  0.15096689,  0.19645361,  0.24194032,
         0.28742704,  0.33291375,  0.37840047,  0.42388719,  0.4693739 ,
         0.51486062,  0.56034733,  0.60583405,  0.65132076,  0.69680748,
         0.7422942 ,  0.78778091,  0.83326763,  0.87875434,  0.92424106,
         0.96972777]),
 <a list of 20 Patch objects>)

What if we want logarithmic axes? The easiest way is to use semilogx(), semilogy(), or loglog():


In [70]:
x = np.linspace(-5, 5)
y = np.exp(-x**2)
plt.semilogy(x, y)


Out[70]:
[<matplotlib.lines.Line2D at 0x7f0e6c2b2150>]

When you're in the IPython Notebook, plots show up as soon as you make them. If, on the other hand, you're writing a separate script and running from the command line (by typing in "python script.py" where script.py is the name of the script you just wrote) OR typing everything into the command line, you'll need to explicitly tell python to plot it. The same three lines above would instead be:

x = np.linspace(-5,5)
y = np.exp(-x**2)
plt.semilogy(x, y)
plt.show()

There are a lot of other plots you can make with matplotlib. The best way to find out how to do something is to look at the gallery of examples: http://matplotlib.org/gallery.html