Jupyter notebook tutorial introduction

The goal of this short tutorial is to introduce some of the basics of using a Jupyter notebook. Looking at the official Jupyter documentation can also be helpful.

Some materials in this tutorial were modified from materials of the Caltech course Data Analysis in the Biological Sciences taught by Justin Bois http://bebi103.caltech.edu/2015/tutorials.html

What is Jupyter?

Jupyter is a way to combine text and code (which runs and can display graphic output) in an easy-to-read document that renders in a web browser. The notebook itself is stored as a text file in JSON format. The notebook can be shared as a static document or as a document that can be run with interactive code.
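To make the JSON storage concrete, here is a minimal sketch of what a notebook file contains, built with Python's standard json module. The top-level keys follow the version 4 notebook format; the cell contents are made up for illustration, and a notebook saved by Jupyter will usually carry more metadata than this.

```python
import json

# A minimal notebook expressed as a plain Python dictionary.
# The cell contents are hypothetical and only serve as an illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A title written in markdown"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello, world.')"],
        },
    ],
}

# Serialize to the JSON text that would be saved in a .ipynb file.
notebook_text = json.dumps(notebook, indent=1)

# Reading the text back recovers the cell types in document order.
cell_types = [cell["cell_type"] for cell in json.loads(notebook_text)["cells"]]
print(cell_types)
```

Because the file is plain text, it can be kept under version control or inspected in any text editor.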

Many different types of programming languages can be run within a Jupyter notebook. We will be using the language Python given that it provides flexible and powerful tools for data analysis and plotting and is the language used in PmagPy.

Launching a Jupyter notebook

To launch a Jupyter notebook, you can do the following.

  • Mac: Use the Anaconda launcher and select Jupyter notebook or open a notebook in Canopy.
  • Windows: Under "Search programs and files" from the Start menu, type jupyter notebook and select "Jupyter notebook."

A Jupyter notebook will then launch in your default web browser.

You can also launch Jupyter from the command line. To do this, type:

jupyter notebook

on the command line and hit enter. This approach also allows for greater flexibility, as you can launch Jupyter with command line flags. For example, Jupyter could be launched specifying the browser to be Safari:

jupyter notebook --browser=safari

This command fires up Jupyter with Safari as the browser. If you launch Jupyter from the command line, your shell will be occupied with Jupyter and will occasionally print information to the screen.

When you launch Jupyter, you will be presented with a menu of files in your current working directory to choose to edit. You can also navigate around the files on your computer to find a file you wish to edit by clicking the "Upload" button in the upper right corner. You can also click "New" in the upper right corner to get a new Jupyter notebook. After selecting the file you wish to edit, it will appear in a new window in your browser, beautifully formatted and ready to edit.

Cells

A Jupyter notebook consists of cells. The two main types of cells you will use are code cells and markdown cells, and we will go into their properties in depth momentarily. First, an overview.

A code cell contains actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar in your Jupyter notebook. Otherwise, you can hit esc and then y (denoted "esc, y") while a cell is selected to specify that it is a code cell. Note that you will have to hit enter after doing this to start editing it.

If you want to execute the code in a code cell, hit "shift + enter." Note that code cells are executed in the order in which you run them, not in the order in which they appear in the notebook. That is to say, the ordering of the cells for which you hit "shift + enter" is the order in which the code is executed. If you did not explicitly execute a cell early in the document, its results are not known to the Python interpreter.
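The effect of execution order can be imitated outside a notebook. The sketch below is only an analogy, not how Jupyter is implemented: two strings stand in for hypothetical code cells, and a dictionary plays the role of the interpreter's shared namespace.

```python
# Each string stands in for the contents of one code cell (hypothetical cells).
cell_one = "a = 5 + 6"
cell_two = "b = a * 2"

# A dictionary plays the role of the interpreter's shared namespace.
namespace = {}

# Running the second cell first fails because `a` has never been defined.
try:
    exec(cell_two, namespace)
    ran_out_of_order = True
except NameError:
    ran_out_of_order = False

# Running the cells in dependency order works.
exec(cell_one, namespace)
exec(cell_two, namespace)

print(ran_out_of_order, namespace["b"])
```

In a notebook, the same NameError appears if you run a cell that depends on a variable defined in a cell you have not yet executed.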

Markdown cells contain text. The text is written in markdown, a lightweight markup language. You can read about its syntax here. Note that you can also insert HTML or $\LaTeX$ expressions into markdown cells, and they will be rendered properly.

As you are typing the contents of these cells, the results appear as text. Hitting "shift + enter" renders the text in the formatting you specify. You can specify a cell as being a markdown cell in the Jupyter toolbar, or by hitting "esc, m" in the cell. Again, you have to hit enter after using the quick keys to bring the cell into edit mode.

In general, when you want to add a new cell, you can use the "Insert" pulldown menu from the Jupyter toolbar. The shortcut to insert a cell below is "esc, b" and to insert a cell above is "esc, a." Alternatively, you can execute a cell and automatically add a new one below it by hitting "alt + enter."

Code cells

Below is an example of a code cell printing hello, world. Notice that the output of the print statement appears in the same cell, though separate from the code block.


In [1]:
print('hello, world.')


hello, world.

If you evaluate a Python expression that returns a value, that value is displayed as output of the code cell. This only happens, however, for the last line of the code cell.


In [2]:
# Would show 9 if this were the last line, but it is not, so shows nothing
4 + 5

# I hope we see 11.
5 + 6


Out[2]:
11
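If you want to see intermediate values as well, one simple option is to print them explicitly. The variable names below are chosen only for this illustration.

```python
# Store each result so that it can be printed explicitly.
first = 4 + 5
second = 5 + 6

# print() writes to the cell output regardless of where the line sits in the cell.
print(first)
print(second)
```

Printed values appear as plain output rather than under an Out[ ] prompt.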

Note, however, that if the last line does not return a value (for example, when it assigns a variable), there is no visible output from the code cell.


In [3]:
a = 5 + 6

However, now if we type in the variable, its value will be displayed.


In [4]:
a


Out[4]:
11

Import packages for scientific computing

One of the things that makes Python so powerful for science is the plethora of packages for scientific computing. However, the need to import these packages and understand what they are can also be confusing to new users.

It is good practice to put a code cell near the top of a notebook that imports all the modules you will need. For this notebook, we will import numpy so that we can use its numerical structures, the scipy.integrate module for numerical integration, and matplotlib.pyplot for plotting.


In [5]:
import numpy as np
import scipy.integrate
import matplotlib.pyplot as plt
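If an import fails, it can help to check whether the package is installed at all. The following is a small sketch using the standard library's importlib.metadata module (available in Python 3.8 and later); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this notebook relies on.
for name in ("numpy", "scipy", "matplotlib"):
    print(name, installed_version(name))
```

A value of None means the package still needs to be installed in the environment that runs the notebook.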

Display of graphics

When displaying graphics, you should have them inline, meaning that they are displayed directly in the Jupyter notebook and not in a separate window. You can specify that, as in the cell below, using the %matplotlib inline magic function.

Generally, I prefer presenting graphics as scalable vector graphics (SVG). Vector graphics are infinitely zoomable; i.e., the graphics are represented as points, lines, curves, etc., in space, not as a set of pixel values as is the case with raster graphics (such as PNG). By default, graphics are displayed as PNGs, but you can specify SVG with the following configuration line:

%config InlineBackend.figure_formats = {'svg',}

If SVG graphics aren't working in your browser, PNG graphics at a high resolution can be used instead:

%config InlineBackend.figure_formats = {'png', 'retina'}

In [6]:
%matplotlib inline

%config InlineBackend.figure_formats = {'svg',}
#%config InlineBackend.figure_formats = {'png', 'retina'}

Example plot

We can generate some data to plot by creating evenly spaced x values with the np.linspace function and then passing them through the np.sin and np.exp functions.


In [18]:
x = np.linspace(0, 2 * np.pi, 200)
y = np.exp(np.sin(np.sin(x)))
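As a quick check of what np.linspace produces, the sketch below inspects the array it returns. It assumes only that NumPy is installed; the variable spacing is introduced just for this illustration.

```python
import numpy as np

# 200 evenly spaced samples from 0 to 2*pi, with both endpoints included.
x = np.linspace(0, 2 * np.pi, 200)

# The difference between neighboring samples is constant.
spacing = np.diff(x)

print(x.shape, x[0], x[-1])
```

With 200 points there are 199 intervals, so each step has length 2π/199.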

These data can then be plotted as below with axes then being labeled.


In [19]:
plt.plot(x, y)
plt.xlim((0, 2 * np.pi))
plt.xlabel('$x$')
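# Raw string so the LaTeX backslashes are not treated as escape sequences;
# the label matches the function actually plotted, exp(sin(sin(x))).
plt.ylabel(r'$\mathrm{e}^{\sin(\sin{x})}$')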
plt.title('Example plot')
plt.show()


An example function

Below is an example of developing a function and then using it to generate a plot of the Lorenz attractor (which is chosen as an example just because it is fun). When you define a function, you should make sure it is properly documented with a descriptive docstring.

Generally, it is a good idea to keep cells simple. You can define one function, or maybe two or three closely related functions, in a single cell, and that's about it.

We will use scipy.integrate.odeint to numerically integrate the Lorenz attractor. We therefore first define a function that returns the right hand side of the system of ODEs that define the Lorenz attractor.


In [20]:
def lorenz_attractor(r, t, p):
    """
    Compute the right hand side of system of ODEs for Lorenz attractor.
    
    Parameters
    ----------
    r : array_like, shape (3,)
        (x, y, z) position of trajectory.
    t : dummy_argument
        Dummy argument, necessary to pass function into 
        scipy.integrate.odeint
    p : array_like, shape (3,)
        Parameters (s, k, b) for the attractor.
        
    Returns
    -------
    output : ndarray, shape (3,)
        Time derivatives of Lorenz attractor.
        
    Notes
    -----
    .. Returns the right hand side of the system of ODEs describing
       the Lorenz attractor.
        x' = s * (y - x)
        y' = x * (k - z) - y
        z' = x * y - b * z
    """
    # Unpack variables and parameters
    x, y, z = r
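    # Name the second parameter k, as in the docstring, so that the
    # parameter array p is not overwritten.
    s, k, b = p
    
    return np.array([s * (y - x), 
                     x * (k - z) - y, 
                     x * y - b * z])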
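One way to sanity-check a right hand side function like this is to evaluate it at the known fixed points of the system, where all three derivatives should vanish. The sketch below does this with a standalone pure-Python version of the equations from the docstring; the name lorenz_rhs is made up for this example, and the time argument is dropped because the equations do not depend on time.

```python
import math


def lorenz_rhs(r, p):
    """Right hand side of the Lorenz system, written without NumPy."""
    x, y, z = r
    s, k, b = p
    return (s * (y - x), x * (k - z) - y, x * y - b * z)


# Parameter values used later in this tutorial.
s, k, b = 10.0, 28.0, 8.0 / 3.0

# For k > 1 the system has three fixed points: the origin and a symmetric pair.
c = math.sqrt(b * (k - 1.0))
fixed_points = [(0.0, 0.0, 0.0), (c, c, k - 1.0), (-c, -c, k - 1.0)]

for point in fixed_points:
    derivs = lorenz_rhs(point, (s, k, b))
    # Each derivative should be zero to within floating-point round-off.
    print(point, all(abs(d) < 1e-9 for d in derivs))
```

A nonzero derivative at any of these points would indicate a typo in the equations.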

With this function in hand, we just have to pick our initial conditions and time points, run the numerical integration, and then plot the result.


In [21]:
# Parameters to use
p = np.array([10.0, 28.0, 8.0 / 3.0])

# Initial condition
r0 = np.array([0.1, 0.0, 0.0])

# Time points to sample
t = np.linspace(0.0, 80.0, 10000)

# Use scipy.integrate.odeint to integrate Lorenz attractor
r = scipy.integrate.odeint(lorenz_attractor, r0, t, args=(p,))

# Unpack results into x, y, z.
x, y, z = r.transpose()

# Plot the result
plt.plot(x, z, '-', linewidth=0.5)
plt.xlabel(r'$x(t)$', fontsize=18)
plt.ylabel(r'$z(t)$', fontsize=18)
plt.title(r'$x$-$z$ proj. of Lorenz attractor traj.')
plt.show()
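To give a feel for what a numerical integrator does, here is a minimal pure-Python sketch that uses the classical fixed-step fourth-order Runge-Kutta method. This is not the algorithm behind scipy.integrate.odeint, which relies on an adaptive solver (LSODA); the function names and the step size are assumptions made for this illustration.

```python
import math


def lorenz_rhs(r, p):
    """Right hand side of the Lorenz system (no explicit time dependence)."""
    x, y, z = r
    s, k, b = p
    return (s * (y - x), x * (k - z) - y, x * y - b * z)


def rk4_step(f, r, p, dt):
    """Advance the state r by one time step dt with classical RK4."""
    k1 = f(r, p)
    k2 = f(tuple(ri + 0.5 * dt * ki for ri, ki in zip(r, k1)), p)
    k3 = f(tuple(ri + 0.5 * dt * ki for ri, ki in zip(r, k2)), p)
    k4 = f(tuple(ri + dt * ki for ri, ki in zip(r, k3)), p)
    return tuple(
        ri + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for ri, a, b, c, d in zip(r, k1, k2, k3, k4)
    )


def integrate(f, r0, p, dt, n_steps):
    """Return the list of states visited, starting from r0."""
    trajectory = [tuple(r0)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(f, trajectory[-1], p, dt))
    return trajectory


# Same parameters and initial condition as above; dt is chosen by hand.
params = (10.0, 28.0, 8.0 / 3.0)
trajectory = integrate(lorenz_rhs, (0.1, 0.0, 0.0), params, dt=0.005, n_steps=2000)
print(len(trajectory))
```

A fixed step is easy to follow but has to be chosen by hand, whereas an adaptive solver adjusts its step size to meet an error tolerance.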


