Week 1 - Jupyter basics

Tuesday, 2016-08-30

Jupyter is a web-based notebook system with a rich ecosystem of related tools around it. You are looking at the core tool, the notebook, right now.

A Jupyter notebook is a document you access primarily through the web. It comprises cells which may include executable code or markup text. This paragraph is marked up text, and we'll look at that more closely in a moment.

Let's look at the code first.

Executable code cells

A quick example:


In [2]:
print('Hello, world!')


Hello, world!

In the cell above you can see a brief snippet of Python code, followed by the output it produced when it was executed. This is an executable code cell. You execute the code by hitting the "play one step" button above. The output appears only after you execute the cell, and afterward you can still change the code and execute it again.

Look at the information to the left of the cell: it says In [1]: or something similar. A number inside the brackets indicates that the cell has been executed, and the numbers count up with each execution, so they show the order in which cells were executed.

To demonstrate, here comes another code cell. We'll execute one, and then the other. Watch the numbers appear and change as we do.


In [3]:
print('Hi back!')


Hi back!

See how that works?

It's important to keep the order of execution in mind. It is typical to execute cells in order, but when you work in a notebook, sometimes you go back and make changes. Just like in a paper notebook! If you do that, and execute the cells out of order, you might get different results.

For example:


In [21]:
x = 1

In [22]:
x


Out[22]:
1

In [23]:
x += 1

In [24]:
x


Out[24]:
2
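The cells above ran in order. As a rough sketch of what changes when they don't, the following models each cell as a small function sharing state. The names state, cell_assign, and cell_increment are made up for illustration; a real notebook shares variables through the kernel, not through a dict.

```python
# Shared state stands in for the variables the kernel keeps between cells.
state = {}

def cell_assign():
    # Plays the role of the cell containing: x = 1
    state['x'] = 1

def cell_increment():
    # Plays the role of the cell containing: x += 1
    state['x'] += 1

# Run the cells in order: assign, then increment once.
cell_assign()
cell_increment()
in_order = state['x']

# Now go back and execute the increment cell a second time
# without re-running the assignment first.
cell_increment()
rerun = state['x']

print(in_order, rerun)
```

The same code gives 2 the first time and 3 after the extra run, which is why it pays to watch the execution numbers when you jump around in a notebook.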

How Jupyter talks to Python

Jupyter doesn't run Python inside your web page. Instead, it talks to a Python process running on the backend, called a kernel. Here's a detailed explanation, with diagrams, of how it works.

Because of this component architecture, it's easy to swap in different kernels. For example, you can use either Python 2 or Python 3 kernels. (In this class, we'll use Python 3.) You can also use an R kernel and other languages. It's pretty nifty.
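If you're ever unsure which Python your kernel is running, you can ask it from a code cell. Here is a minimal sketch using only the standard sys module (the variable names are just for illustration):

```python
import sys

# Path of the Python interpreter that the kernel process is running.
interpreter = sys.executable

# Major version number: 3 for a Python 3 kernel, 2 for a Python 2 kernel.
major = sys.version_info.major

print(interpreter)
print(major)
```

The path you see depends on where the kernel is installed, so it will differ from one server or laptop to the next.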

Text markup w/Markdown cells

These headlines and paragraphs that include text and formatting but not executable code are called Markdown cells. Markdown (follow links to John Gruber's original site if you're interested) is an easy to use system of formatting text (or "marking it up") to be rendered into HTML. You can create things like:

Titles

At

Various

Levels

Of

Depth
  • And
  • Bullet
  • Lists

Just for fun, here's the Markdown that created that marked up text above.

## Text markup w/Markdown cells

These headlines and paragraphs that include text and formatting but not executable code are called Markdown cells.  [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) (follow links to John Gruber's original site if you're interested) is an easy to use system of formatting text (or "marking it up") to be rendered into HTML.  You can create things like:

# Titles
## At
### Various
#### Levels
##### Of
###### Depth

* And
* Bullet
* Lists

See how easy it is?

To create a Markdown cell, just toggle the toolbar up above from Code to Markdown. When you've marked up your text to your liking, execute it just like you would execute a code cell.

If you want to go back and change a cell you've already executed, just click on it.

Keyboard shortcuts

I'm more of a typer than a clicker, so if you're like me and you don't want to click around all the time, you'll want to learn these handy shortcuts.

  • up-arrow - move up a cell
  • down-arrow - move down a cell
  • return - select the current cell for editing
  • ctrl-return - execute the current cell
  • shift-return - execute the current cell, and move to the next cell (create a new one, too, if needed)
  • esc, a - add a new cell above the currently selected cell
  • esc, b - add a new cell below the currently selected cell
  • esc, m - convert the current cell type to Markdown

That's the short list of what I use all the time. There's a full list under Help -> Keyboard shortcuts.

Another handy tool is the command palette, which you can activate by clicking the little keyboard button above. Do that, then type in "command palette" to get a feel for how it works - it's a quick tool for accessing all Jupyter functions by name, and it shows their keyboard shortcuts, too.

More interface details

Some additional points to note:

  • The title of the current notebook is at the top. You can click it to change it.
  • The notebook will be saved automatically to disk on the server from time to time, and you can save it yourself using the menu or the save button.
  • At the top right it should say Python 3. This indicates that you are connected to a Python 3 kernel.
  • Sometimes you will want to "clean up" a notebook to run its cells again from the top. Experiment with the Cell menu items to see your options.
  • Sometimes a connection with the kernel will go bad. This can happen for a number of reasons, and while it shouldn't happen often, it happens enough that you should experiment with the Kernel menu as well to see your options.

Access to other tools

Editing a notebook is not the only thing you can do in Jupyter. You can also open a terminal or edit a text file.

(Go back to the first Jupyter tab and start a terminal, and open a text file.)

When you're done with a terminal or a text file, you can just close those tabs. The terminal and text files you opened will close when you do.

Shelling out

For your first exercise and the next few weeks, you'll mostly want to communicate with the shell using shell commands, not using Python. To do that, just place an exclamation point (!) in front of the command, like this:


In [25]:
!pwd


/Users/dchud/Documents/Dropbox/teach/2016-fall-data-management/lectures

In [31]:
!ls -a


.                      .ipynb_checkpoints
..                     week-01-20160830.ipynb

In [27]:
!whoami


dchud

In [28]:
!date


Tue Aug 30 15:27:50 EDT 2016

Each of the cells above ran its command in the shell, just as in the terminal, rather than as Python code. It can be a little confusing at first, but you'll get the hang of it. Just to be clear, you can also get the same information with Python, e.g.:


In [30]:
import os
os.listdir()


Out[30]:
['.ipynb_checkpoints', 'week-01-20160830.ipynb']
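Along the same lines, here is a sketch of rough Python counterparts to the !pwd, !ls -a, and !date cells above, using only the standard library. The variable names are illustrative, and the date format string is an approximation of the shell's output, not an exact match.

```python
import datetime
import os

# Rough equivalent of !pwd: the kernel's current working directory.
cwd = os.getcwd()

# Rough equivalent of !ls -a, minus the '.' and '..' entries,
# which os.listdir() never returns.
entries = sorted(os.listdir(cwd))

# Rough equivalent of !date: the current local date and time.
now = datetime.datetime.now()

print(cwd)
print(entries)
print(now.strftime('%a %b %d %H:%M:%S %Y'))
```

Note that the Python results are real objects (a string, a list, a datetime) that you can keep working with, whereas the shell commands just print text.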

Export and upload

An important function in Jupyter - or at least one we'll use a lot this semester - is the ability to upload and export notebooks and other files. You can upload a notebook to Jupyter using the Upload button at the top right of the file listing, just next to the New button where you can start a new notebook.

You can download or export a notebook from within the notebook by using File -> Download as -> Notebook or any of the other options.
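The .ipynb file you get from Download as -> Notebook is a plain JSON document, so you can peek inside it with ordinary Python. Below is a minimal sketch, assuming the version 4 notebook format; the tiny notebook and the file name example.ipynb are made-up stand-ins for a file you downloaded.

```python
import json
import os
import tempfile

# A hand-written, minimal stand-in for a downloaded notebook (format version 4).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Week 1"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('Hello, world!')"]},
    ],
}

# Write it to a temporary location, the way a download would land on disk.
path = os.path.join(tempfile.mkdtemp(), "example.ipynb")  # hypothetical file name
with open(path, "w") as f:
    json.dump(notebook, f)

# Read it back and list the type of each cell.
with open(path) as f:
    loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

You won't usually need to read notebooks this way, but it helps to know that an exported notebook is just a text file you can store, email, or put on GitHub.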

In our class, you'll often download a notebook from GitHub, upload it to a Jupyter instance, run the notebook, and export it again to store a local copy or send it back to GitHub. You'll get used to this, too!

Our class notebook server

I have set up a Jupyter server for you to use at datanotebook.org. This is a bank of servers running on Amazon Web Services (AWS) just for you. They are configured to host multiple notebooks at once, and to start more servers to host more notebooks when many of you are using notebooks at once, such as during class.

Important: when you use datanotebook.org, your Jupyter instance is always temporary. It will go away. It is designed this way to handle 50 students using notebooks at once without costing your instructor a fortune. Because of this, when you are working on an assignment, be sure to save your work regularly by downloading it to your local machine. A good habit is to save a copy every 5-10 minutes.

If you leave your notebook open for more than a half hour without making any changes, the server may expire, and you can lose your work.

Similarly, if you use a notebook for more than three hours, the server may expire.

The lesson is: always save your work. Save a copy locally, and when you're ready, upload it to GitHub.

You are the first people to test this out. It might not work! Please be patient if it doesn't. Let me know what you see going wrong and I'll do what I can to adjust the configuration so it doesn't happen again.