We've already seen that Python has data structures such as lists, tuples, and dictionaries; Numpy adds arrays. A two-dimensional array behaves much like a matrix, which you might have seen in math classes, and an array can have any number of dimensions. With Numpy, we can do math on entire arrays at once. To start with, here's what happens if we make a list of numbers and try to multiply it by 3:
In [1]:
a = [1,2,3]
b = 3*a
print(b)
Multiplying a list by 3 repeats the whole list three times; it does not multiply each number. If we want to multiply each number in that list by 3, we can certainly do it by looping through and multiplying each individual number by 3, but that seems like way too much work. Let's see what Numpy can do instead.
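For reference, the loop-based approach mentioned above might look like the sketch below. The variable names are illustrative only; the list comprehension is just a compact way of writing the same loop.

```python
numbers = [1, 2, 3]

# Explicit loop: build a new list one element at a time
tripled = []
for n in numbers:
    tripled.append(3 * n)
print(tripled)

# The same operation written as a list comprehension
tripled_again = [3 * n for n in numbers]
print(tripled_again)
```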
First, we need to import the Numpy package, which we commonly rename as "np."
In [2]:
import numpy as np
In [3]:
a = np.array([1,2,3])
b = 3*a
print(b)
Here, we created a numpy array from scratch. You can also convert a list or tuple into an array:
In [4]:
c = [2,5,8]
print(c)
In [5]:
c = np.array(c)
print(c)
There are a couple of ways to check if something is an array (as opposed to a list), but here's a really straightforward way: just display it, and an array will show up wrapped in "array(...)".
In [6]:
c
Out[6]:
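If you would rather check in code than by eye, one possible approach is to ask Python for the object's type. This is a small sketch; the names c_list and c_array are made up for the example.

```python
import numpy as np

c_list = [2, 5, 8]
c_array = np.array(c_list)

print(type(c_list))                      # the built-in list type
print(type(c_array))                     # Numpy's array type, numpy.ndarray
print(isinstance(c_array, np.ndarray))   # a True/False answer you can use in an if statement
```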
Let's say you want to know what the first element in the array is. You can select elements of arrays the same way you do for lists:
In [7]:
c[0]
Out[7]:
In [8]:
c[1]
Out[8]:
In [10]:
c[-2]
Out[10]:
You can perform slices in the same way as well:
In [11]:
c[0:2]
Out[11]:
All the elements of an array must be the same type. For instance, you can't have both integers and floats in a single array. If you mix types when you create an array, or assign a value of a different type into an existing array, Numpy will automatically convert it to the array's type:
In [21]:
d = np.array([0.0,1,2,3,4],dtype=np.float32)
print(d, type(d))
d.dtype
Out[21]:
In [22]:
d[0] = 35.21
print(d)
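The conversion also goes the other way, which can be surprising. As a sketch (the array names here are illustrative), assigning a float into an integer array drops the fractional part, so you may want to convert the array to floats first:

```python
import numpy as np

int_array = np.array([1, 2, 3])         # all integers, so the array gets an integer dtype
int_array[0] = 35.21                    # the float is truncated to fit the integer dtype
print(int_array, int_array.dtype)

float_array = int_array.astype(float)   # astype() returns a converted copy
float_array[0] = 35.21                  # now the fractional part is kept
print(float_array, float_array.dtype)
```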
Array arithmetic is always done element-wise. That is, the arithmetic is performed on each individual element, one at a time.
In [23]:
array1 = np.array((10, 20, 30))
array2 = np.array((1, 5, 10))
print(array1 + array2)
In [24]:
print(array1 - array2)
In [25]:
print(3*array1)
In [26]:
print(array1 * array2)
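Because the arithmetic is element-wise, the * operator is not the dot product you may know from math classes. The sketch below contrasts the two using the same arrays; np.dot is the Numpy function for the dot product.

```python
import numpy as np

array1 = np.array((10, 20, 30))
array2 = np.array((1, 5, 10))

print(array1 * array2)         # element-wise: 10*1, 20*5, 30*10
print(array1 / array2)         # division is element-wise too
print(array1 ** 2)             # so is raising to a power
print(np.dot(array1, array2))  # dot product: 10*1 + 20*5 + 30*10
```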
What if you want an array, but you don't know what you want to put into it yet? If you know what size it needs to be, you can create the whole thing at once and make every element a 1 or a 0:
In [ ]:
ones_array = np.ones(5)
print(ones_array)
In [ ]:
print(5*ones_array)
In [27]:
zeros_array = np.zeros(5, int)
print(zeros_array)
Notice that you can specify whether you want the numbers to be floats (the default) or integers. You can do complex numbers too.
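As a quick sketch of the other types, the dtype keyword can be passed to both functions. The array names below are illustrative.

```python
import numpy as np

complex_zeros = np.zeros(3, dtype=complex)   # every element is 0+0j
int_ones = np.ones(4, dtype=int)             # integer ones instead of the default floats

print(complex_zeros, complex_zeros.dtype)
print(int_ones, int_ones.dtype)
```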
If you don't know what size your array will be, you can create an empty one and append elements to it:
In [33]:
a=[1,2,3]
a.append(4)
a
Out[33]:
In [34]:
f = np.array(())
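print(f)
f = np.append(f, 3)   # np.append returns a new array, so we reassign f
f = np.append(f, 5)
# f.append(5) would raise an AttributeError: arrays have no append() method
print(f)
# Question: what if you want that 3 to be an integer?
In [ ]:
g = np.append(f, (2,1,0))
print(g)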
Extra: Figure out how to insert or delete elements
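If you get stuck on the question or the extra exercise, here is one possible approach (not the only one). It assumes you start from an empty integer array, and it uses np.insert and np.delete, which, like np.append, return new arrays rather than changing the original.

```python
import numpy as np

f = np.array((), dtype=int)   # empty array that holds integers instead of floats
f = np.append(f, 3)
f = np.append(f, 5)
print(f, f.dtype)

g = np.insert(f, 1, 4)        # new array with 4 placed before index 1
print(g)

h = np.delete(g, 0)           # new array with the element at index 0 removed
print(h)
```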
If you want an array of evenly spaced numbers in sequential order, Numpy has a very handy function called "arange" that we saw on the first day.
In [35]:
print(np.arange(10))
In [36]:
print(np.arange(10.0))
In [37]:
print(np.arange(1, 10))
In [38]:
print(np.arange(1, 10, 2))
In [39]:
print(np.arange(10, 1, -1))
There are some advantages of using np.arange() over range(); one of the most important ones is that np.arange() can take floats, not just ints.
In [40]:
print(range(5, 10, 0.1))  # this raises a TypeError: range() only accepts integers
In [41]:
print(np.arange(5, 10, 0.1))
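One caveat with float steps: rounding error can make it hard to predict whether the last value is included. When you care about the exact number of points, np.linspace lets you ask for that number directly. A small sketch, with illustrative names:

```python
import numpy as np

# 50 evenly spaced points starting at 5 and stopping short of 10
points = np.linspace(5, 10, 50, endpoint=False)
print(len(points))
print(points[0], points[-1])

# With the default endpoint=True, the stop value is included
with_end = np.linspace(5, 10, 51)
print(with_end[-1])
```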
Numpy has some functions that make statistical calculations easy:
In [42]:
redshifts = np.array((0.2, 1.56, 6.3, 0.003, 0.9, 4.54, 1.1))
In [43]:
print(redshifts.min(), redshifts.max())
In [44]:
print(redshifts.sum(), redshifts.prod())
In [45]:
print(redshifts.mean(), redshifts.std())
In [46]:
print(redshifts.argmax())   # the index of the largest value
print(redshifts[redshifts.argmax()])
We can use Numpy to select individual elements with certain properties. There are two ways to do this:
In [47]:
close = np.where(redshifts < 1)
print(close)
print(redshifts[close])
In [48]:
middle = np.where( (redshifts>1) & (redshifts<2))
print(redshifts[middle])
In [49]:
far = redshifts > 2
print(far)
print(redshifts[far])
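The difference between the two ways is what you get back: np.where gives the positions of the matching elements, while a comparison on its own gives an array of True/False values. Either one can be used as an index. A short sketch comparing them on the same data (the names indices and mask are illustrative):

```python
import numpy as np

redshifts = np.array((0.2, 1.56, 6.3, 0.003, 0.9, 4.54, 1.1))

indices = np.where(redshifts < 1)[0]   # np.where returns a tuple; [0] is the index array
mask = redshifts < 1                   # boolean array, one True/False per element

print(indices)
print(mask)
print(np.array_equal(redshifts[indices], redshifts[mask]))   # both select the same values
```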
Numpy is a great way to read in data from ASCII files, like the data files you got from the 20 m robotic telescope. Let's start by creating a data file and then reading it in, both using Numpy.
In [51]:
saveThisData = np.random.rand(20,2) # This generates random numbers
# between 0 and 1 in the shape we tell it
print(saveThisData)
print(saveThisData.shape)
In [52]:
saveThisData = np.reshape(saveThisData, (2,20)) # If we don't like the shape we can reshape it
print(saveThisData)
Save the array to a file:
In [53]:
np.savetxt('myData.txt', saveThisData)
Now we can use Numpy to read the file into an array:
In [55]:
readThisData = np.genfromtxt('myData.txt')
print(readThisData)
In [56]:
print(readThisData[0])     # the first row
In [57]:
print(readThisData[:,0])   # the first column
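To convince yourself that nothing is lost on the way to disk and back, you can compare what you read with what you saved. The sketch below writes to a temporary folder so it does not overwrite myData.txt; the file name roundtrip.txt is made up for the example.

```python
import os
import tempfile

import numpy as np

data = np.random.rand(2, 20)

# Write into a temporary directory so nothing in the working folder is touched
path = os.path.join(tempfile.mkdtemp(), "roundtrip.txt")
np.savetxt(path, data)

loaded = np.genfromtxt(path)
print(loaded.shape)                # same shape as the array we saved
print(np.allclose(data, loaded))   # True if the values match
```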
We need to import the matplotlib.pyplot library, which is generally written as "plt" in shorthand. The second line below ("%matplotlib inline") tells IPython to make the plots in this notebook; otherwise, the plots will appear in new windows.
In [59]:
import matplotlib.pyplot as plt
%matplotlib inline
Matplotlib can create all the plots you need. For instance, we can plot our readThisData array from the Numpy session, which has two rows of 20 numbers each. Let's plot the first row against the second. It only takes a single line of code:
In [60]:
print(readThisData)
In [63]:
plt.scatter(readThisData[0], readThisData[1])
Out[63]:
If we want lines to connect the points:
In [64]:
plt.plot(readThisData[0], readThisData[1])
Out[64]:
In [65]:
plt.plot(readThisData[0], readThisData[1], 'o') # another way to make a scatter plot
Out[65]:
Here's another example of what Numpy and Matplotlib can do:
In [66]:
x = np.linspace(0, 2*np.pi, 50)
y = np.sin(x)
plt.plot(x, y)
Out[66]:
Let's add some labels and titles to our plot:
In [67]:
plt.plot(x, y, label='values')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()
plt.title("trigonometry!") # you can use double or single quotes
Out[67]:
Let's change how this looks a little:
In [68]:
plt.plot(x, y, 'o-', color='red', markersize=6, linewidth=2, label='values')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.legend()   # without this, the label given to plt.plot() is never shown
plt.title("trigonometry!")
Out[68]:
Here's another kind of plot we can make, a histogram:
In [69]:
plt.hist(readThisData[0], bins=20)
Out[69]:
What if we want logarithmic axes? The easiest way is to use semilogx(), semilogy(), or loglog():
In [70]:
x = np.linspace(-5, 5)
y = np.exp(-x**2)
plt.semilogy(x, y)
Out[70]:
When you're in the IPython Notebook, plots show up as soon as you make them. If, on the other hand, you're writing a separate script and running it from the command line (by typing in "python script.py" where script.py is the name of the script you just wrote) OR typing everything into the command line, you'll need to explicitly tell Python to show the plot. The same three lines above would instead be:
x = np.linspace(-5,5)
y = np.exp(-x**2)
plt.semilogy(x, y)
plt.show()
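A script also needs its own import lines, and sometimes you want a file instead of a window. The sketch below is one way to write a complete script that saves the plot with plt.savefig. The "Agg" backend line is an assumption for machines without a display, and the output file name is made up for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # assumption: no display available, so draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5)
y = np.exp(-x**2)
plt.semilogy(x, y)

# Hypothetical output location; any path ending in .png would do
output_path = os.path.join(tempfile.mkdtemp(), "semilogy_example.png")
plt.savefig(output_path)
print(output_path)
```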
There are a lot of other plots you can make with matplotlib. The best way to find out how to do something is to look at the gallery of examples: http://matplotlib.org/gallery.html