Tutorial: Jupyter notebooks


In [ ]:
__author__ = "Lucy Li"
__version__ = "CS224u, Stanford, Spring 2020"

Starting up

This tutorial assumes that you have followed the course setup instructions, which means Jupyter was installed using Conda.

  1. Open up Terminal (Mac/Linux) or Command Prompt (Windows).
  2. Navigate to a directory that you'd like to use as your Home, e.g., the one where your cloned cs224u GitHub repo resides.
  3. Type jupyter notebook and press Enter. After a few moments, a new browser window should open, listing the contents of your Home directory.
    • In your terminal, you'll see something like [I 17:23:47.479 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/. This is the address of your notebook server. If you accidentally close the browser window, you can open it again as long as the server is still running. In this example, navigating to http://localhost:8888/ in your favorite web browser would bring it back.
    • You may also specify a port number, e.g., jupyter notebook --port 5656. In that case, the server runs at http://localhost:5656/.
  4. Click on a notebook with the .ipynb extension to open it. To create a new notebook, click New in the top right corner and, under Notebooks, click Python. If you have multiple environments, choose the one you want, e.g., Python [nlu].
    • You can rename your notebook by clicking on its name (originally "Untitled") at the top of the notebook and modifying it.
    • Files with the .ipynb extension are stored as JSON. If you open them in vim, emacs, or another text editor, they are much harder to read and edit.
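To see why, here is a minimal sketch of what a notebook file looks like on disk. It assumes the nbformat version 4 layout: a top-level "cells" list plus "metadata", "nbformat", and "nbformat_minor" keys. The code builds a tiny notebook as a Python dictionary, writes it out with the standard-library json module, and reads it back. The file name demo.ipynb is just an example.

```python
import json

# A minimal notebook in the nbformat v4 JSON layout (illustrative only).
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the number that appears in In [ ]
            "metadata": {},
            "outputs": [],
            "source": ["print(\"cats\")"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Write it to disk as JSON text, which is how .ipynb files are stored.
with open("demo.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Reading it back gives ordinary Python lists and dictionaries.
with open("demo.ipynb", encoding="utf-8") as f:
    loaded = json.load(f)

for cell in loaded["cells"]:
    print(cell["cell_type"], "".join(cell["source"]))
```

Opening demo.ipynb from the Jupyter home page should show one Markdown cell and one code cell with no output. Opening it in a text editor shows the raw JSON instead.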

Jupyter Notebooks allow for interactive computing.

Cells

Cells help you organize your work into manageable chunks.

The top of your notebook contains a row of buttons. If you hover over them, the tooltips explain what each one is for: saving, inserting a new cell, cutting/copying/pasting cells, moving cells up/down, running/stopping a cell, choosing cell types, etc. The Edit, Insert, and Cell menus in the menu bar contain more cell-related options.

Notice how the bar on the left of the cell changes color depending on whether you're in edit mode (green) or command mode (blue). This tells you which keyboard shortcuts currently apply (discussed later).

There are three main types of cells: code, markdown, and raw.

Raw cells are less common than the other two, and you don't need to understand them to get going in this course. Content in a raw cell is never executed. Raw cells are useful when you convert your notebook to a format other than .ipynb, such as HTML or LaTeX, with the nbconvert tool or File -> Download as. nbconvert copies their contents into the output without rendering or running them. Read more about raw cells here if you're curious.

Code

Use the following code cells to explore various operations.

Typically it's good practice to put import statements in the first cell or at least in their own cell.

The number in the square brackets next to a cell (e.g., In [3]:) shows the order in which cells were run. An asterisk (In [*]:) means the cell is currently running.

A cell's output is usually whatever the cell prints, followed by the value of its last line if that line is an expression.
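The "printed output plus the value of the last expression" rule can be sketched with Python's standard ast module. This is a simplified illustration under that assumption, not how IPython is actually implemented. The hypothetical run_cell function executes every statement in a cell. If the final statement is a bare expression, it evaluates that expression separately and returns its value as the cell's output.

```python
import ast


def run_cell(source, namespace):
    """Execute `source` roughly the way a notebook runs a cell (simplified).

    All statements run in order; if the last statement is a bare expression,
    its value is returned so it can be shown as the cell's output.
    """
    tree = ast.parse(source)
    last_expr = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so it can be evaluated on its own.
        last_expr = ast.Expression(body=tree.body.pop().value)
    exec(compile(tree, "<cell>", "exec"), namespace)
    if last_expr is not None:
        return eval(compile(last_expr, "<cell>", "eval"), namespace)
    return None


ns = {}
out = run_cell('print("cats")\n"cheese"', ns)  # print() writes "cats" right away
if out is not None:  # notebooks also skip displaying a None result
    print(repr(out))  # the Out[ ] part: 'cheese'
```

This also explains why a last line such as a.append('cake') shows nothing: append returns None.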


In [ ]:
import time
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [ ]:
print("cats")
# run this cell and notice how both strings appear as outputs
"cheese"

In [ ]:
# cut/copy and paste this cell
# move this cell up and down
# run this cell
# toggle the output
# toggle scrolling to make long output smaller
# clear the output
for i in range(50): 
    print("cats")

In [ ]:
# run this cell and stop before it finishes
# stop acts like a KeyboardInterrupt
for i in range(50): 
    time.sleep(1) # make loop run slowly
    print("cats")
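Stopping a cell raises KeyboardInterrupt inside the running code. You can therefore catch it like any other exception, for example to keep the partial results of a long loop. Here is a minimal sketch using a hypothetical slow_count function.

```python
import time


def slow_count(n, delay=1.0):
    """Slowly collect 0..n-1; if interrupted, keep what was collected so far."""
    done = []
    try:
        for i in range(n):
            time.sleep(delay)  # stand-in for slow work
            done.append(i)
    except KeyboardInterrupt:
        # Raised when you press the stop button (or use Kernel -> Interrupt).
        print(f"interrupted after {len(done)} iterations")
    return done


print(slow_count(3, delay=0))  # [0, 1, 2]
```

In a notebook, try results = slow_count(50) and press stop partway through. Because the exception is caught, the cell finishes normally and results holds the finished items. Re-raise the exception inside the except block if you would rather have the cell end with a traceback.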

In [ ]:
# running this cell leads to no output
def function1(): 
    print("dogs")

# put cursor in front of this comment and split and merge this cell.
def function2(): 
    print("cheese")

In [ ]:
function1()
function2()

One difference between writing a Python script and writing a notebook is that in a notebook you can run code "out of order." This means you should be careful about reusing variables. It is good practice to arrange cells in the order in which you expect someone to run them, and to organize code in ways that prevent such problems.

Clearing a cell's output doesn't remove the values of its variables. In the example below, we need to rerun cell A to reset a to an empty list. If we lose track of how many times we've run cell B or cell C, we might encounter unexpected bugs.
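One way to keep a cell safe to rerun is to build a value from scratch in a single step, instead of mutating a variable defined in another cell. Here is a minimal sketch using a hypothetical make_order function.

```python
def make_order():
    """Build the list from scratch, so every run starts from the same state."""
    order = []
    order.append("pineapple")
    order.append("cake")
    return order


a = make_order()
a = make_order()  # rerunning does not pile up extra items
print(a)  # ['pineapple', 'cake']
```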


In [ ]:
# Cell A
a = []

In [ ]:
# Cell B
# try running this cell multiple times to add more pineapple
a.append('pineapple')

In [ ]:
# Cell C
# try running this cell multiple times to add more cake
a.append('cake')

In [ ]:
# depending on the number of times you ran 
# cells B and C, the output of this cell will 
# be different.
a

Even deleting cell D's code after running it doesn't remove list b from the notebook's memory. This means that if you modify code, variables created by the old code may still linger in the background of your notebook.


In [ ]:
# Cell D
# run this cell, delete/erase it, and run the empty cell
b = ['apple pie']

In [ ]:
# b still exists even though cell D's code is gone
b

Restart the kernel (Kernel -> Restart & Clear Output) to start anew. To check that things run okay in the intended order, restart and run everything (Kernel -> Restart & Run All). This is especially good to do before sharing your notebook with someone else.
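If you only need to get rid of one leftover variable, you can delete it with Python's del statement instead of restarting the whole kernel. The sketch below recreates b and checks whether the name still exists. At the top level of a cell, globals() is the notebook's namespace. IPython's %who magic, not used here, lists the variables currently defined.

```python
b = ['apple pie']
print('b' in globals())  # True

del b  # remove the name from the namespace
print('b' in globals())  # False
```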

Jupyter notebooks are handy for telling stories with your code. You can view pandas DataFrames and plots directly under each code cell.


In [ ]:
# dataframe example
d = {'ingredient': ['flour', 'sugar'], '# of cups': [3, 4], 'purchase date': ['April 1', 'April 4']}
df = pd.DataFrame(data=d)
df

In [ ]:
# plot example
plt.title("pineapple locations")
plt.ylabel('latitude')
plt.xlabel('longitude')
_ = plt.scatter(np.random.randn(5), np.random.randn(5))

Markdown

The other type of cell is Markdown, which allows you to write blocks of formatted text in your notebook. Double-click on any Markdown cell to view or edit its source. Don't worry if you don't remember all of these things right away. You'll write more code than Markdown essays for this course, but the following are handy things to be aware of.

Headers

You may notice that this cell's header is prefixed with ###. The fewer hash signs (#), the larger the header. You can use up to six hash signs for the smallest header level.

Here is a table. You can emphasize text using underscores or asterisks. You can also include links.

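Markdown | Outcome
_italics_ or *italics* | italics or italics (both render as italic text)
__bold__ or **bold** | bold or bold (both render as bold text)
[link](http://web.stanford.edu/class/cs224u/) | link (a hyperlink to the course website)
[jump to Cells section](#cells) | jump to Cells section (a link to the Cells heading in this notebook)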

Displaying code

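In this cell's Markdown source, the code below sits between a pair of triple backquotes, with the word python right after the opening backquotes. Try removing or adding python to toggle syntax highlighting.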

if text == code: 
    print("You can write code between a pair of triple backquotes, e.g. ```long text``` or `short text`")

LaTeX

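LaTeX also works, both inline between single dollar signs, e.g., $y = \int_0^1 2x \, dx$, and as a displayed equation between double dollar signs: $$y = x^2 + x^3$$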

Quotations

You can also format quotes by putting a ">" in front of each line.

To separate quoted paragraphs, use a line containing only ">".

Lists

There are three different ways to write a bullet list (asterisk, dash, plus):

  • sugar
  • tea
    • Earl Grey
    • English Breakfast
  • cats
    • Persian
  • dogs
  • pineapple
  • apple
    • Granny Smith

Example of a numbered list:

  1. tokens
  2. vectors
  3. relations

Images

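You can also insert images with the following syntax:

![alt-text](./fig/nli-rnn-chained.png "Title")

(In this cell's Markdown source, the line above is wrapped in backquotes so that it displays as text. Try removing the backquotes and see what happens.)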

Dividers

A line of three or more dashes, e.g., ----------------, becomes a horizontal divider.


Kernels

A kernel is the process that executes the code in a notebook.

You may have multiple conda environments on your computer. You can change which environment your notebook is using by going to Kernel -> Change kernel.

When you open a notebook, you may get a message that looks something like "Kernel not found. I couldn't find a kernel matching __. Please select a kernel." This just means you need to choose the version of Python or environment that you want to have for your notebook.

If you have difficulty getting your conda environment to show up as a kernel, this may help.

In our class we will be using IPython notebooks, which means the code cells run Python.

Fun fact: there are also kernels for other languages, e.g., Julia. This means you can create notebooks in those languages as well, if the corresponding kernel is installed on your computer.
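To check which Python installation, and therefore which conda environment, the current kernel is using, you can print a few values from the standard sys module in a code cell. Here is a minimal sketch.

```python
import sys

# Path of the Python interpreter running this kernel; for a conda
# environment it typically lies inside that environment's directory.
print(sys.executable)
# Root directory of the active Python environment.
print(sys.prefix)
# Python version number, e.g. 3.7.6.
print(sys.version.split()[0])
```

If the printed path is not inside the environment you expected, switch kernels via Kernel -> Change kernel.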

Shortcuts

Go to Help -> Keyboard Shortcuts to view the shortcuts you may use in Jupyter Notebook.

Here are a few that I find useful on a regular basis:

  • run a cell, select below: shift + enter
  • save and checkpoint: command + S on Mac, Ctrl + S on Windows/Linux (just like other file types)
  • enter edit mode from command mode: press enter
  • enter command mode from edit mode: esc
  • delete a cell (command mode): select a cell and press D twice
  • dedent while editing: command + [ (Ctrl + [ on Windows/Linux)
  • indent while editing: command + ] (Ctrl + ] on Windows/Linux)

In [ ]:
# play around with this cell with shortcuts
# delete this cell 
# Edit -> Undo Delete Cells
for i in range(10): 
    print("jelly beans")

Shutdown

Notice that when you are done working and close this notebook's browser tab, the icon next to this notebook in the home directory listing is still green. This means its kernel is still running. To shut it down, check the box next to the notebook in the directory listing and click "Shutdown."

To shut down the Jupyter Notebook app as a whole, press Control-C in the terminal where you started it, and confirm if prompted. This stops the server and shuts down all kernels.

Extras

These are some extra things that aren't top priority to know but may be interesting.

Checkpoints

When you create a notebook, a checkpoint file is also saved in a hidden directory called .ipynb_checkpoints. Every time you manually save the notebook (e.g., File -> Save and Checkpoint), the checkpoint file is updated. Jupyter also autosaves your work periodically, but autosaving only updates the .ipynb file, not the checkpoint. You can revert to the latest checkpoint using File -> Revert to Checkpoint.
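Checkpoints are ordinary files, so you can inspect them from Python. The sketch below assumes Jupyter's default file-based checkpoints, which are stored as .ipynb_checkpoints/<name>-checkpoint.ipynb next to each notebook. The helper list_checkpoints is hypothetical.

```python
from pathlib import Path


def list_checkpoints(notebook_dir="."):
    """Return sorted paths of notebook checkpoint files under notebook_dir."""
    ckpt_dir = Path(notebook_dir) / ".ipynb_checkpoints"
    if not ckpt_dir.is_dir():  # no checkpoints saved here yet
        return []
    return sorted(ckpt_dir.glob("*-checkpoint.ipynb"))


for path in list_checkpoints():
    print(path.name)
```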

NbViewer

We use nbviewer in this class to display Jupyter notebooks on our course website. It renders notebooks that are available on the Internet as static web pages. Check it out here.

View -> Cell toolbar

  • Edit Metadata: Modify the metadata of a cell by editing its JSON representation. Examples of metadata: whether the cell's output should be collapsed or scrolled, whether the cell can be deleted, and the cell's name and tags.
  • Slideshow: Turns your notebook into a presentation by assigning each cell a slide type, e.g., Slide, Sub-Slide, Fragment, Skip, or Notes.
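Both options work by editing the cell's metadata in the .ipynb file. The sketch below assumes the Slideshow toolbar stores its choice as {"slideshow": {"slide_type": ...}} in each cell's metadata, which is how the classic notebook records it. The hypothetical helper reads a notebook with the standard json module and reports each cell's slide type.

```python
import json


def slide_types(notebook_path):
    """Return each cell's slide type, or "-" when none has been set."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    return [
        cell.get("metadata", {}).get("slideshow", {}).get("slide_type", "-")
        for cell in nb["cells"]
    ]


# Usage (the file name is just an example):
# print(slide_types("my_talk.ipynb"))
```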

More resources

If you click on "Help" in the menu bar, there is a list of references for common Python tools, e.g., NumPy and pandas.

IPython website

Markdown basics

Jupyter Notebook Documentation

Real Python Jupyter Tutorial

Dataquest Jupyter Notebook Tutorial

Stack Overflow