While the Python language is an excellent tool for general-purpose programming, it was not designed specifically for mathematical and scientific computing. Numpy allows for:
The critical thing to know is that Python for loops are very slow! One should try to use array-operations as much as possible.
First, let's import the numpy module. There are multiple ways we can do this. In the following examples, zeros
is a numpy routine which we will see later, and depending on how we import numpy we call it in different ways:
'import numpy'
imports the entire numpy module --> 'numpy.zeros()'
'import numpy as np'
imports the entire numpy module and renames it --> 'np.zeros()'
'from numpy import *'
imports the entire numpy module (more or less) --> 'zeros()'
'from numpy import zeros'
imports only the zeros()
routineAfter all that preamble, let's get started:
In [1]:
import numpy as np
You can create an array from scratch, similar to how you might create a list / tuple / whatever:
In [3]:
a = np.array((1,2,3)) # or np.array((1,2,3))
print a
You can also convert a list or tuple of elements into an array:
In [4]:
myList = [0.2, 0.4, 0.6]
myArray = np.array(myList)
print myList
print myArray
In [5]:
print type(myArray)
In [6]:
print myArray.dtype
Arrays can be created with non-numbers as well, but all the elements of an array have to have the same type. i.e., arrays are homogeneous. Once an array has been created, its dtype is fixed and it can only store elements of the same type. However, the dtype can explicitly be changed (we'll see this later).
In [7]:
# an array of booleans
print np.array([True, False, True])
In [8]:
# an array of characters/strings
print np.array(['a', 'b', 'c'])
In [9]:
print np.array([2, 3, 'c'])
In [10]:
print np.array([2, 3, 0.4])
In [12]:
print np.array([2, 3, 'C'], dtype=int)
You can access elements of an array in the same way you access elements of a list:
In [17]:
print myArray
In [19]:
print myArray[-2]
In [23]:
print myArray[1]
print myArray[1:2]
Multidimensional arrays work like you might expect:
In [26]:
newArray = np.array([ [3, 8, 0, 1],
[4, 0, 0, 9],
[2, 2, 7, 1],
[5, 1, 0, 8]] )
print newArray
print newArray.shape
In [34]:
print newArray[:,-1]
In [35]:
print newArray[1,3]
In [36]:
print newArray[3,0:2]
In [37]:
print newArray[:,0] # print out an individual column
If you know the size of the array you want, you can create an array of ones or zeros or an empty array:
In [39]:
b = np.ones(3)
print b
print b.shape
In [40]:
c = np.zeros((1,3), int)
print c
print type(c)
print c.dtype
In [42]:
print [3, 5, 6]
print (3, 5, 6)
In [45]:
a = [3,5,6]
b = (3,5,6)
b[0] = 1
print b
In [41]:
d = np.zeros(3, complex)
print d
print d.dtype
In [47]:
# slightly faster, but be careful to fill it with actual values!
f = np.empty(4)
f.fill(3.14)
f[-1] = 0
print f
Create an identity array:
In [48]:
print np.eye(5, dtype=int) # default data type is 'float'
In [49]:
print np.arange(-5, 5, 0.5) # excludes upper endpoint
In [52]:
print np.linspace(-3, 3, 21) # includes both endpoints
In [51]:
print np.logspace(1, 4, 4)
Some examples of random number generation using numpy
In [53]:
print np.random.rand(10)
print np.random.rand(2,2)
In [55]:
print np.()
In [56]:
print np.random.randint(2,100,5)
In [57]:
print np.random.normal(10, 3, (2,4))
In [58]:
print np.random.randn(5) # samples 5 times from the standard normal distribution
In [59]:
print np.random.normal(3, 1, 5) # samples 5 times from a Gaussian with mean 3 and std dev 1
In [60]:
print newArray
In [63]:
print newArray.reshape(2,7)
In [62]:
print newArray.reshape(-1,2)
None of these manipulations have modified the original newArray:
In [64]:
print newArray
In [65]:
reshapedArray = newArray.reshape(2,8)
print reshapedArray
print newArray
Above we saw how to index arrays with single numbers and slices, just like Python lists. But arrays allow for a more sophisticated kind of indexing which is very powerful: you can index an array with another array, and in particular with an array of boolean values. This is particluarly useful to extract information from an array that matches a certain condition.
In [66]:
redshifts = np.array((0.2, 1.56, 6.3, 0.003, 0.9, 4.54, 1.1))
In [67]:
print redshifts
In [68]:
close = redshifts < 1
print close
print redshifts[close]
In [72]:
far = np.where(redshifts > 2)
print far[0][0]
print redshifts[far]
In [73]:
middle = np.where( (redshifts >= 1) & (redshifts <= 2) )
print middle
print redshifts[middle]
In [75]:
print (redshifts >= 1) & (redshifts <= 2)
In [76]:
myList = [3, 6, 7, 2]
print 2*myList
In [77]:
myArray = np.array([3, 6, 7, 2])
print 2*myArray
In general, mathematical operations on arrays are done element-by-element:
In [78]:
arr1 = np.arange(4)
arr2 = np.arange(10,14)
print arr1
print arr2
In [79]:
print arr1 + arr2
print arr1 - arr2
print arr1 * arr2
print arr1 / arr2
Notice in particular that multiplication is element-wise and is NOT a dot product or regular matrix multiplication.
In [80]:
print 3.5 + arr1
In the last example, numpy understood the command to be "add 3.5 to every element in arr1." That is, it converts the scalar 3.5 into an array of the appropriate size. Since the new array is filled with floats, and arr1 is filled with ints, the summed array is an array of floats.
The broadcasting rules allow numpy to:
So in the above example, the scalar 1.5 is effectively:
arr1
.After these two operations are complete, the addition can proceed as now both operands are one-dimensional arrays of length 4.
This broadcasting behavior is in practice enormously powerful, especially because when numpy broadcasts to create new dimensions or to 'stretch' existing ones, it doesn't actually replicate the data. In the example above the operation is carried as if the 1.5 was a 1-d array with 1.5 in all of its entries, but no actual array was ever created. This can save lots of memory in cases when the arrays in question are large and can have significant performance implications.
In [81]:
arr1 += 10
print arr1
In [82]:
arr1.fill(0)
print arr1
In [83]:
print arr2
print np.mean(arr2), arr2.mean()
print np.sum(arr2), arr2.sum()
print np.min(arr2), arr2.min()
Numpy makes it easy to write arrays to files and read them. It can write both text and binary files. In a text file, the number $\pi$ could be written as "3.141592653589793", for example: a string of digits that a human can read, with in this case 15 decimal digits. In contrast, that same number written to a binary file would be encoded as 8 characters (bytes) that are not readable by a human but which contain the exact same data that the variable pi
had in the computer's memory.
Text mode: occupies more space, precision can be lost (if not all digits are written to disk), but is readable and editable by hand with a text editor. Can only be used for one- and two-dimensional arrays.
Binary mode: compact and exact representation of the data in memory, can't be read or edited by hand. Arrays of any size and dimensionality can be saved and read without loss of information.
In the following examples, we'll only be talking about reading and writing arrays in text mode.
In [85]:
arr3 = np.random.rand(6,5)
print arr3
In [97]:
np.savetxt('arrayFile.txt', arr3, fmt='%.2e', header='')
In [98]:
arr4 = np.loadtxt('arrayFile.txt')
In [99]:
print arr4
In [100]:
## what if we want to skip the first and last rows? Or we just want columns 2 and 3?
arr5 = np.genfromtxt('arrayFile.txt', skip_header=1, skip_footer=1, usecols=(2,3))
In [101]:
print arr5
The matplotlib library is a powerful tool capable that can quickly produce simple plots for data visualization as well as complex publication-quality figures with fine layout control. Here, we will only provide a minimal introduction, but Google is your friend when it comes to creating the perfect plots. The pyplot tutorial (http://matplotlib.org/users/pyplot_tutorial.html) is a great place to get started, and the matplotlib gallery (http://matplotlib.org/gallery.html) has tons of information as well.
Just as we typically use the shorthand np
for Numpy, we will use plt
for the matplotlib.pyplot
module where the plotting functions reside.
In [102]:
import matplotlib.pyplot as plt
%matplotlib inline
To plot a collection of x- and y-values:
In [103]:
x = np.linspace(0, 2*np.pi, 50)
y = np.sin(x)
plt.plot(x, y)
Out[103]:
In [104]:
### If you don't give it x values:
plt.plot(np.random.rand(100))
Out[104]:
In [108]:
### Multiple data sets on the same plot
# the semicolon at the end suppresses the display of some usually unnecessary information
plt.plot(x, np.sin(x), label=r'$\sin(x)$')
plt.plot(x, np.cos(x), label='cos(x)')
plt.xlabel('x values', size=24)
plt.ylabel('y values')
plt.title('two functions')
plt.legend();
In [ ]:
In [ ]:
Notice how matplotlib automatically assigned different colors to the two data sets. We can finetune what these look like:
In [109]:
plt.plot(x, np.sin(x), linewidth=3, color='red', linestyle='--')
Out[109]:
In [112]:
plt.plot(x, np.sin(x), 'o--', markersize=5, color='k')
Out[112]:
In [118]:
### logarithmic plots
x = np.linspace(-5, 5)
y = np.exp(-x**2)
plt.semilogy(x,y, lw=2, color='0.1')
Out[118]:
In [119]:
plt.loglog(x,y)
plt.xlim(1e-1, 1e1)
plt.ylim(1e-2, 1e1)
Out[119]:
In [121]:
# A more complicated example
mu, sigma = 5.9, 0.2
measurements = mu + sigma * np.random.randn(10000)
# the histogram of the data
plt.hist(measurements, 50, normed=False, facecolor='g', alpha=0.75, label='hist')
plt.xlabel('Height (feet)')
plt.ylabel('Number of people')
plt.title('Height distribution')
# This will put a text fragment at the position given:
plt.text(5.1, 600, r'$\mu=100$, $\sigma=15$', fontsize=14)
plt.axis([5,6.7,0,750])
plt.grid(True)
plt.legend()
Out[121]:
In [122]:
### Error bars and subplots
# generate some fake data, with errors in y
x = np.arange(0.2, 4, 0.5)
y = np.exp(-x)
errors = 0.1 + 0.1*np.sqrt(x)
fig = plt.figure()
noErrs = fig.add_subplot(121)
plt.plot(x, y)
plt.xticks((0,1,2,3,4))
noErrs.set_title('No errors')
noErrs.set_ylim(-0.3,1)
withErrs = fig.add_subplot(122)
withErrs.errorbar(x, y, yerr=errors)
withErrs.set_title('With errors')
plt.ylim(-0.3,1)
plt.xticks(np.arange(0,4.5,0.5))
fig.suptitle('fake data', size=16, weight='bold')
Out[122]:
In [123]:
# normal distribution center at x=0 and y=5
x = np.random.randn(100000)
y = np.random.randn(100000)+5
# let's make this plot bigger
plt.figure(figsize=(12,8))
plt.hist2d(x, y, bins=40)
plt.colorbar()
Out[123]:
We want to reproduce the Global Land-Ocean Temperature index plot below (available at this GISS website). Download the data this link and recreate the plot as closely as you can (don't worry about the two green points or matching the font). We've broken up this problem into steps to get you started. Feel free to follow them as you wish, or be a lone wolf and solve this on your own.
BONUS TASKS:
In [ ]: