The Jupyter Notebook has two primary types of cells: "Markdown" cells for text (like this one) and "Code" cells for running Python code. The code cell below loads the plotting functions into the `plt` namespace and imports several functions from the numpy library. The last line requests that all plots appear inline in the notebook (instead of in separate windows or as files on your computer).
In [1]:
import matplotlib.pyplot as plt
from numpy import array, sin, sqrt, dot, outer
%matplotlib inline
Next, we create an array of data and compute its sum and average, using the `array` function imported from numpy. Notice that these operations are applied to the array as methods, attached with the "." operator (for example, `x.sum()`).
In [2]:
x = array([1,2,3])
In [3]:
x.sum()
Out[3]:
In [4]:
x.mean()
Out[4]:
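numpy also provides these operations as plain functions that take the array as an argument. The short sketch below shows that both spellings give the same answer; which one you use is mostly a matter of taste.

```python
import numpy

x = numpy.array([1, 2, 3])

# Method syntax (attached with ".") and function syntax compute the same thing
print(x.sum(), numpy.sum(x))    # both 6
print(x.mean(), numpy.mean(x))  # both 2.0
```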
To check that this matches the formal definition in the book, take the sum and divide by the number of items in the sample:
In [5]:
x.sum()/len(x)
Out[5]:
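Written out, the mean is (x_1 + x_2 + ... + x_N)/N. The sketch below spells that sum out with an ordinary Python loop, purely for illustration; `x.sum()/len(x)` does the same work in one step and is much faster on large arrays.

```python
from numpy import array

x = array([1, 2, 3])

# Accumulate the sum one element at a time, then divide by the number of items N
total = 0
for value in x:
    total += value
mean_loop = total / len(x)

print(mean_loop)  # 2.0
```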
In [6]:
x**2
Out[6]:
In [7]:
(x**2).sum() # Note the parentheses: square first, then sum
Out[7]:
In [8]:
(x**2).sum()/len(x)
Out[8]:
In [9]:
(x**2).mean()
Out[9]:
In [10]:
sin(x)
Out[10]:
In [11]:
sin(x).sum()/len(x)
Out[11]:
In [12]:
sin(x).mean()
Out[12]:
In [13]:
x.var() # Variance
Out[13]:
In [14]:
x.std() # Standard Deviation
Out[14]:
We can test whether two quantities are equal with the `==` operator. This is not the same as `=`, which is the assignment operator. This will trip you up if you are new to programming, but you'll get over it.
In [15]:
x.std()**2 == x.var() # The standard deviation is the square root of the variance
Out[15]:
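Be careful using `==` on floating-point results: the square root and the squaring each round to the nearest representable number, so `x.std()**2` can differ from `x.var()` in the last digit even though the two are mathematically equal. numpy's `isclose` compares within a tolerance instead (by default `rtol=1e-05`, `atol=1e-08`). A minimal sketch:

```python
from numpy import array, isclose

x = array([1, 2, 3])

# Exact comparison: may come out True or False depending on rounding in the last bit
exact = x.std()**2 == x.var()

# Tolerance-based comparison: True when the values agree within the default tolerances
close = isclose(x.std()**2, x.var())
print(exact, close)

# The classic example of rounding error
print(0.1 + 0.2 == 0.3)           # False
print(isclose(0.1 + 0.2, 0.3))    # True
```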
In [16]:
x_m = array([9,5,25,23,10,22,8,8,21,20])
In [17]:
x_m.mean()
Out[17]:
In [18]:
(x_m**2).sum()/len(x_m)
Out[18]:
In [19]:
sqrt(281.3 - (15.1)**2)
Out[19]:
In [20]:
x_m.std()
Out[20]:
Close enough!
This is an illustration of how to implement the histogram from Example 1.2 in the text. Note the `bins` argument, which sets the number of bins. If you leave it out, `hist` picks a number for you; try other values to see the impact. There is no single correct value, but too many bins hides clusters in the data, and too few bins tends to oversimplify it.
In [21]:
n, bins, patches = plt.hist(x_m,bins=7)
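The `hist` function accepts several optional arguments; we use `bins=7` to match the example. It returns the counts in each bin (`n`), the bin edges (`bins`), and the drawn rectangles (`patches`).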
In [28]:
# an array of the counts in each bin:
n
Out[28]:
In [29]:
n/len(x_m)*array([6,9,12,15,18,21,24]) # fraction of the data in each bin times the bin-center value
Out[29]:
In [30]:
# `_` holds the previous output; its sum approximates the mean:
sum(_)
Out[30]:
In [32]:
n/len(x_m)*array([6,9,12,15,18,21,24])**2 # fraction of the data in each bin times the bin-center value squared
Out[32]:
In [33]:
# `_` holds the previous output; its sum approximates the second moment:
sum(_)
Out[33]:
Both of these results are close to the values computed directly from the data, but not exact. Remember, the histogram is only a representation of the data: every point in a bin is treated as if it sat at the bin center. The agreement will improve for larger data sets.
In [27]:
rvec = array([1,2]) # A 1-D array; it behaves like a row vector here
In [28]:
rvec
Out[28]:
In [29]:
cvec = array([[1],[2]]) # A column vector
In [30]:
cvec
Out[30]:
In [31]:
cvec*rvec # Element-wise product with broadcasting, which gives the outer product:
Out[31]:
In [32]:
rvec*cvec # Still the outer product, so plain `*` doesn't follow the rules of matrix multiplication!
Out[32]:
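What `*` actually does is element-by-element multiplication with broadcasting: numpy treats the 1-D `rvec` (shape `(2,)`) as shape `(1, 2)`, then stretches every length-1 axis so both operands become `(2, 2)` before multiplying matching entries. The result happens to be the outer product, in either order. Checking the shapes makes this visible:

```python
from numpy import array, outer, array_equal

rvec = array([1, 2])        # shape (2,): a 1-D array, neither row nor column
cvec = array([[1], [2]])    # shape (2, 1): a genuine column

print(rvec.shape, cvec.shape)    # (2,) (2, 1)

# Broadcasting pads rvec to shape (1, 2), then stretches both operands to (2, 2)
product = cvec * rvec
print(product.shape)             # (2, 2)

# Entry [i, j] is cvec[i, 0] * rvec[j], which is exactly the outer product
print(array_equal(product, outer(cvec, rvec)))   # True
```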
The dot function properly computes the dot product that we know and love from workshop physics:
In [33]:
dot(rvec,cvec)
Out[33]:
In [34]:
outer(cvec,rvec)
Out[34]:
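numpy also has the `@` operator for matrix multiplication, and `reshape` turns the 1-D `rvec` into an explicit `(1, 2)` row. Once the shapes are explicit, the usual linear-algebra rules apply: a (1, 2) row times a (2, 1) column gives a (1, 1) result, while a (2, 1) column times a (1, 2) row gives the 2×2 outer product. A sketch:

```python
from numpy import array, dot, outer, array_equal

rvec = array([1, 2])
cvec = array([[1], [2]])

row = rvec.reshape(1, 2)     # explicit row vector, shape (1, 2)

inner = row @ cvec           # (1, 2) @ (2, 1) -> (1, 1): the dot product 1*1 + 2*2
outer_prod = cvec @ row      # (2, 1) @ (1, 2) -> (2, 2): the outer product

print(inner)                 # [[5]]
print(outer_prod)
print(array_equal(outer_prod, outer(cvec, rvec)))   # True

# 1-D times (2, 1) gives a length-1 array rather than a plain number
print(dot(rvec, cvec))       # [5]
```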
In [35]:
dot(cvec,rvec) # This fails: the last axis of cvec (length 1) doesn't match the length of rvec (2)
This probably isn't your first error message, but it's important to read what Python is telling you. First, it names the type of error (`ValueError`). Then it shows where the error occurred (inside the `dot` function), which is helpful when a cell has many lines; here we already know which line caused the error, since there is only one. Finally, it explains the error: the shapes are not aligned, meaning the arrays don't have compatible dimensions for a dot product. Some error messages are more helpful than others, but they all follow this pattern: an error type, a traceback showing where it happened, and a message.
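When you are unsure whether two arrays will multiply, you can compare their `.shape` values before calling `dot`, or catch the `ValueError` and read its message. A sketch of both; reshaping `rvec` into a `(1, 2)` row is one way (an assumption here, not the only option) to make the pair compatible:

```python
from numpy import array, dot

rvec = array([1, 2])
cvec = array([[1], [2]])

# dot(A, B) needs A's last axis to match B's second-to-last axis (or B's only axis if B is 1-D)
print(cvec.shape, rvec.shape)    # (2, 1) (2,): 1 != 2, so the shapes are not aligned

error_message = None
try:
    dot(cvec, rvec)
except ValueError as err:
    error_message = str(err)     # numpy's explanation of the mismatch
    print("ValueError:", error_message)

# A (2, 1) column times a (1, 2) row is aligned (1 == 1) and gives a (2, 2) matrix
fixed = dot(cvec, rvec.reshape(1, 2))
print(fixed)
```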