This is a live IPython notebook. You can write and test code, annotate it with text (including equations $\hat x = \frac{1}{n}\sum_{i=0}^n x_i$), plot graphs and start external processes. You can even edit this introduction (try double-clicking on this text).
A large number of scientific and data processing libraries are available in this environment; some of these are demoed below.
Code appears on a gray background; text annotations on a white background. The currently selected cell (if any) has an outline drawn around it. Code blocks can be run by clicking on them to select, and then pressing SHIFT-ENTER to run the cell. Any output will be printed below:
In[1]: print "Hello, world!"
Out[1]: Hello, world!
These notes are avaiable at GitHub
The text inside these boxes is Markdown. You can create your own Markdown cells by creating a new cell (Insert/Below Current) and then selection Cell/Cell Type/Markdown (or you can hit Shift-Enter (create new cell) and ESC-M (convert to markdown) to get the same effect). This allows formatted rich text, such as italics, bolding,
bulleted lists (note the space before the * in the source!)
And also verbatim code; to enter code, insert a blank line, and then indent the code with TAB:
print "hello world"
Inline verbatim can be created using backquotes around text, such as x = x + 1
IPython will also recognise LaTeX formulae if you surround them with dollar symbols \$ \$: $ x = x + 1 $ And you can make display equations using double dollar signs: $$ \sum_{i=0}^{N} i^{\alpha_i} + \beta_i $$
IPython extends Python with "magic commands". These are command beginning with %: for example changing the current directory:
%cd /some/path
A useful magic command enables figures to be drawn inline in the notebook (rather than opening as separate windows). To do this use
%matplotlib inline
In [1]:
# note that this is a code cell -- you can execute it with Shift-ENTER
%matplotlib inline
You can also get interactive help on functions, objects in IPython -- just put a question mark after the object in question.
sum?
In [2]:
# try running this cell -- note the pop up at the bottom of the screen
sum?
In [3]:
# if that's not enough, you can get full details on an object with ??
%matplotlib??
You can run python scripts using %run
%run myscript.py
and edit scripts with %edit
%edit myscript.py
If you invoke %edit with no argument, it will open an editor with a blank temporary file -- if you save and exit, IPython will load the contents into a cell and execute it
You can get previous cell inputs using i (the last input) and for the last output Previous values can be accessed via the In[] and Out[] arrays (see the In[n] at the left of each code cell -- you can use this directly)
print In[1]
Let's compute the length of a path in the Collatz sequence. That is, we take a number $n$ and halve it if $n \equiv 0 \mod 2$, otherwise replace it with $3n+1$. We count the number of steps until we reach 1.
Remember that to execute the cell, and thus to define the function, select the cell and press SHIFT-ENTER.
In [4]:
def collatz(n, steps=0):
"""Compute the number of steps to reach 1 in the Collatz sequence.
Note the use of triple quotes to specify a docstring.
Also note the use of a default parameter (steps) to count the number
of recursive calls."""
if n==1:
return steps
if n%2==0:
return collatz(n/2, steps+1)
else:
return collatz(3*n+1, steps+1)
Now try it in the cell below (e.g. entering collatz(331) and pressing SHIFT-ENTER should print 24).
In [5]:
collatz(331)
Out[5]:
OK, let's plot the graph of this function for various $n$. We'll use numpy to manipulate vectors of values, and matplotlib to plot the graph. First we must import them. In future workbooks this will already be done at the start, but we do it explicitly here for clarity.
In [6]:
import numpy as np # np is the conventional short name for numpy
import matplotlib.pyplot as plt # and plt is the conventional name for matplotlib
import seaborn # all this does (in this case) is restyle matplotlib to use better layouts
Now we create an array of integers 1:n and plot it. Note the use of arange to create an array of integers, and the list comprehension [collatz(n) for n in ns], which applies the collatz function to each element of ns.
In [23]:
ns = np.arange(1,500) # generate n = [1,2,3,4,...]
collatzed = np.array([collatz(n) for n in ns]) # apply collatz(n) to each value and put it in a numpy array
plt.plot(ns, collatzed) # plot the result
Out[23]:
If you hit SHIFT-ENTER on the above, you should see a plot.
It looks pretty noisy; perhaps there is some periodic structure. We can use the FFT to look at this. np.fft.fft() computes the http://en.wikipedia.org/wiki/Fourier_transform; we can compute the magnitude spectrum $|f(x)|$ by taking the absolute value, and discarding the symmetric half:
In [24]:
fftd = np.fft.fft(collatzed)
# take absolute value
real_magnitude = np.abs(fftd)
# trim off symmetric part (note the slice syntax)
real_magnitude = real_magnitude[1:len(fftd)/2] # note that we drop the 0th element (DC)
fig = plt.figure() # make a new figure
ax = fig.add_subplot(111) # this just creates a new single blank axis
ax.plot(real_magnitude) # and plots onto it (we can create multi-panel plots using add_subplot)
Out[24]:
That looks pretty unstructured. Let's try more numbers, and make the fft plot a function we can reuse later.
In [18]:
ns = np.arange(1,10000)
collatzed = [collatz(n) for n in ns]
In [25]:
def fft_plot(x):
"""Plot the magnitude spectrum of x, showing only the real, positive-frequency
portion, and excluding component 0 (DC). """
fftd = np.fft.fft(x)
# get absolute (magnitude spectrum)
real_magnitude = np.abs(fftd)
# chop off symmetric part
real_magnitude = real_magnitude[1:len(fftd)/2]
fig = plt.figure() # make a new figure
ax = fig.add_subplot(111)
ax.plot(real_magnitude)
In [26]:
fft_plot(collatzed)
Some interesting structure, with big spikes, but the frequency approach isn't revealing much. Let's investigate the distribution of the variable. We can get a histogram with plt.hist().
In [27]:
# normed forces the frequency axis to sum to 1
plt.hist(collatzed, bins=50, normed=True);
OK, this is more interesting. Let's show a normal fit to the distribution, using maximum likelihood estimation. scipy.stats has the tools we need to do this.
In [28]:
mean, std = np.mean(collatzed), np.std(collatzed)
import scipy.stats as stats # we must import scipy.stats, as we've not used it yet
# np.linspace() linearly spaces points on a range: here 200 points spanning the distribution
pdf_range = np.linspace(np.min(collatzed), np.max(collatzed), 200)
# scipy.stats has many distribution functions, including normal (norm)
pdf = stats.norm.pdf(pdf_range, mean, std)
plt.hist(collatzed, bins=50, normed=True)
plt.plot(pdf_range, pdf, 'g', linewidth=3) # plot using thick green line
Out[28]:
Obviously, this distribution is non-Gaussian, but let's test to make sure. scipy.stats provides many statistical tools, including normality testing. scipy.stats.normaltest gives us a combination of D’Agostino and Pearson’s test.
Note: to see the documentation for normaltest, try clicking at the end of normaltest and press SHIFT-TAB to see the tooltip. Hit the ^ symbol to bring up the full help in a pane below. This works for any function.
In [29]:
import scipy.stats as stats # we must import scipy.stats as we've not used it yet
k2, p = stats.normaltest(collatzed)
print p # p-value, testing if the distribution differs from the normal. p<0.05 suggests it is
We can safely assume this distribution is non-Gaussian.
As an additional measure, we can plot a Q-Q plot, showing the quantiles of the collatzed distribution against the quantiles of a normal distribution. If the distribution is normal, the plot would be show as a straight line.
The Q-Q plot is a very useful way of eyeballing distribution fits.
scipy.stats.probplot() does the job easily:
In [35]:
plt.figure() # don't plot on the same axis as the previous plot
qq = stats.probplot(collatzed, dist="norm", plot=plt) # note the use of "norm" to specify the test distribution
The "wobbliness" of the line indicates that this is not a good distribution match.
In [ ]: