00 - Introduction to Jupyter Notebook (25min)

Table of Contents

  1. Introduction
  2. Notebook Structure
    1. Notebook Toolbar
    2. Notebook Space
  3. Cells and Cell Types
  4. Navigating a Notebook
    1. Selecting a Cell
    2. Creating a Cell
    3. Help and Keyboard Shortcuts
  5. Markdown
  6. Python
  7. Saving a Notebook
  8. Creating a New Notebook

1. Introduction

This notebook will guide you through some of the basic operations of a Jupyter Python notebook, such as:

  • Notebook structure
  • Cells and cell types
  • Navigating a notebook
  • What Markdown is
  • Entering Markdown text
  • Entering Python code
  • Saving a notebook

2. Notebook structure

A Jupyter notebook can be considered as being divided into two main parts - the toolbar, and the notebook space.

Notebook Toolbar

The notebook toolbar is found at the top of the notebook:

At the top left of the toolbar is a logo for the Jupyter project, and the filename of the current notebook (clicking on this allows you to change the notebook's file name). There is also some information about the last time the document was saved or checkpointed.

At the top right of the toolbar is a logo for Python (the current language of the notebook), and a button which, if clicked, will log you out.

Below this is a menu bar which should remind you of other programs, such as Microsoft Word, or Apple Pages.

This provides operations such as saving or loading files, modifying components of the notebook, or even changing the programming language that is to be used by the notebook (the Kernel option). Some of these menu bar options are repeated as shortcuts in the icons of the bottom row of the toolbar.

On the right, there is some information about the notebook.

In the image above, this tells us first that the notebook is trusted, then the pencil icon indicates that we are in editing mode. Finally, the current kernel (Python 3 (SfAM)) is named.

Notebook Space

Below the toolbar is all the text and information, and Python code, of the notebook itself. This is the notebook space.

3. Cells and Cell Types

Jupyter notebooks are built from cells. A cell is an individual part of the notebook that can be run, or executed, in isolation.

Cells can be one of several types - the two main types are Code and Markdown. These are treated differently in the notebook:

  • Markdown cells contain text to be read by a human, such as this description of a Markdown cell. They can be formatted nicely, like a word-processing document.
  • Code cells contain programming code (for this workshop it will be Python), which will be run by the computer. The code can also be nicely-formatted, as in the cells below.

In [1]:
# This is some example Python code, so we can see a code cell

def hello_world():
    """Say 'hello world'"""
    print("Hello World!")

In [2]:
# This is more example Python code, to have a code cell with output

count = 0
for i in range(10):
    count = count + i
    
# Show count as output
count


Out[2]:
45

Markdown cells look just like regularly-formatted text, but code cells have markers at the left-hand side, indicating whether they are input (In []:) or output (Out []:) cells.

Input cells are where code can be typed and executed. Output cells show the corresponding output from that code. You can edit input cells, but not output cells.

Code cell markers also come with a number, e.g. In [2]: and Out [2]:. This number indicates the order in which cells were run/executed.

It is possible for code cells to be run in a different order than top-to-bottom in the notebook!
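As a rough illustration of why run order matters, the sketch below stands in for two code cells with two hypothetical functions, cell_a() and cell_b(). These names, and the namespace dictionary, are not part of Jupyter; they are assumptions made here so that the order of execution is explicit.

```python
# Hypothetical stand-ins for two code cells; the names are illustrative only
namespace = {}  # plays the role of the variables shared by all cells


def cell_a():
    """First cell: set a starting value"""
    namespace["total"] = 10


def cell_b():
    """Second cell: update the value set by the first cell"""
    namespace["total"] = namespace["total"] + 5


# Run the cells top-to-bottom, once each
cell_a()
cell_b()
top_to_bottom = namespace["total"]

# Run the second cell again without re-running the first
cell_b()
after_rerun = namespace["total"]

print(top_to_bottom, after_rerun)  # 15 20
```

The same cell gives a different result the second time, because it builds on whatever was left behind by earlier runs. If a notebook behaves unexpectedly, running all cells again from the top is often a good first check.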

4. Navigating a notebook

A Jupyter notebook can be read like a webpage - this makes it very useful for sharing annotated code and analyses.

However, notebooks are also interactive: you can edit and execute individual cells in any order you like - this makes them very useful for exploring data and experimenting with code.

Selecting a cell

To select a cell, click on it.

  • When you click on a cell, it will be surrounded by a green border if it is in edit mode (i.e. you can change the contents), or a blue border if you cannot currently edit it.

  • You can move from cell to cell using the up- and down-arrow keys.

  • When you first select a Markdown cell, the border will be blue. To put a Markdown cell into edit mode, double-click on it.

Exercise: Edit a Markdown cell (3min)

Edit some text in the `Markdown` cell, below.
When you are finished editing, execute the cell by pressing `Shift-Enter`

This is a Markdown cell to be edited as part of an exercise

"Please edit me. I exist only for you to modify my text until it is in the shape you want. You can experiment with lists of things, like:

  • this thing
  • that thing
  • the other thing

Or you can italicise me. Or make me bold."

Creating a cell

A new cell can be created in two main ways:

  1. Using the + toolbar icon. This will insert a new cell below the currently selected cell.
  2. Using the Insert -> Insert Cell Above or Insert -> Insert Cell Below menu option; this can create a cell below or above the currently selected cell.

When you first create a cell, Jupyter will assume that you want a Code cell. This will be shown in the toolbar at the top of the page.

To change the cell type to Markdown, you can click on the dropdown box marked Code and select Markdown.

Exercise: Create a new cell (2min)

Using any of the methods described above, create a new `Markdown` cell below this cell, and enter some text.

Help and Keyboard Shortcuts

When you are not editing a cell, you can use many keyboard shortcuts.

  • To bring up the keyboard shortcuts/help menu, hit the h key.

5. Markdown

`Markdown` is a way of formatting text, just like a word processor, but using special symbols to indicate formatting.

Jupyter notebooks are a very good way to write literate code, to enable reproducible research, because they make it easier to write a combination of explanatory text and working code in the same document.

In Jupyter notebooks, the `Markdown` formatting syntax is used to make text more presentable for reading.

  • Markdown is widely-used in bioinformatics/computing communities
  • Markdown comes in many 'flavours', but has a shared common core set of formatting instructions
  • Markdown is plain text, and can be read by humans, even if it is not interpreted/formatted by a computer
  • A single document written in Markdown can be interpreted and rendered as an HTML webpage, an MS Word document, a PDF file, and so on - write once, render in many formats

Markdown formatting examples

1. Headers

Headers start with one or more hash symbols (`#`) on a new line:

# Header 1
## Header 2
### Header 3
#### Header 4

which renders as:

Header 1

Header 2

Header 3

Header 4

2. Bold and Italic

**Bold** and *italic* (or ***bold-italic***) are obtained by wrapping text in asterisks:

*italic*
**bold**
***bold-italic***

3. Lists

Lists are indicated with asterisks - one for each list entry. Indent an entry to nest it under the entry above:

* Item 1
* Item 2
  * Nested item 1
  * Nested item 2
* Item 3

renders as

  • Item 1
  • Item 2
    • Nested item 1
    • Nested item 2
  • Item 3

4. Hyperlinks

Hyperlinks (e.g. to data or other useful information) are indicated by square-bracketing the text of the link, and putting the URL in parentheses:

* [Search on Google](http://www.google.com)
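Because Markdown is plain text, it can also be produced by code. The sketch below builds a small Markdown snippet from Python strings, using the list and hyperlink syntax shown above. The helper names markdown_list() and markdown_link() are hypothetical, written only for this illustration.

```python
# Hypothetical helpers for illustration; they are not part of Jupyter or Python
def markdown_list(items):
    """Return the items as a Markdown bullet list, one asterisk per entry"""
    return "\n".join(f"* {item}" for item in items)


def markdown_link(text, url):
    """Return a Markdown hyperlink: [text](url)"""
    return f"[{text}]({url})"


# Assemble a header, a plain list, and a list entry that holds a link
snippet = "\n".join([
    "## Useful links",
    markdown_list(["Item 1", "Item 2"]),
    markdown_list([markdown_link("Search on Google", "http://www.google.com")]),
])

print(snippet)
```

The printed text can be pasted into a Markdown cell and rendered with Shift-Enter.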

Exercise: Practise Some Markdown (3min)

Create a new `Markdown` cell below, and try out some formatting.

6. Python

`Python` is a popular and relatively easy-to-learn programming language, widely-used in bioinformatics.

One of the main principles of bioinformatics is automation - letting the computer do all the repetitive, finicky work, so that you - the scientist - can concentrate on the science.

To instruct the computer to do all that tedious, repetitive work for you, it can be useful to develop some programming skills. Python is widely-used, relatively easy to learn, and comes with a number of highly-useful libraries that are designed for bioinformatics work.

In this workshop, we will use Python to demonstrate how useful it can be to automate bioinformatics tasks.

You can get a remarkably long way in bioinformatics with only a little programming knowledge.

Running Python Code

In a Jupyter notebook, `Python` code is written in a `Code` cell, and run by executing that cell with `Shift-Enter`.

  • You can use the code cell like a calculator
  • Comments (plain text that is not executed as code) are written with a hash (#) at the start of the line
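As a small sketch of the two points above, the code below mixes comments with some common arithmetic operators. The variable names are chosen only for this illustration.

```python
# Comments like this one are ignored when the cell is run
total = 7 + 3        # addition: 10
power = 2 ** 10      # exponentiation: 1024
quotient = 17 // 5   # whole-number (floor) division: 3
remainder = 17 % 5   # remainder after division: 2
ratio = 10 / 4       # ordinary division always gives a float: 2.5

# In a notebook, the value of the last line of a cell is shown as its output.
# print() is used here so the sketch also works outside a notebook.
print(total, power, quotient, remainder, ratio)
```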

An example of using the cell like a calculator is given below.

Exercise: Python as a Calculator (2min)

Edit the cell below to perform a new calculation.


In [ ]:
# This cell is being used like a calculator
# You can edit this cell, and run the new calculation with Shift-Enter

(1 + 95) * 1e-2 / 0.65

  • Code cells can be used to write and execute more complex Python code

In the example below, a function named calculate_gc() is written. This calculates the GC content of a passed nucleotide sequence. The function is then called on a nucleotide sequence, and the GC content calculated.

NOTE: The code in this function is not very good - it doesn't do the necessary checks for a passed sequence being a plausible nucleotide sequence, for instance.


In [ ]:
# Define a function that calculates the GC content of a passed sequence
def calculate_gc(sequence):
    """Return the GC content of the passed sequence as a fraction between 0 and 1"""
    sequence = sequence.upper()    # convert sequence to upper-case
    g_plus_c = sequence.count('G') + sequence.count('C')    # count G and C bases
    gc = g_plus_c / len(sequence)    # divide by the total sequence length
    return gc

In [ ]:
# Define a nucleotide sequence
my_sequence = "ctagtcgacgatcatgcagcagctacatcgtagctagcatgctagctagca"

# Calculate the GC content of the sequence
calculate_gc(my_sequence)

Exercise: Editing Python Code (5min)

Create a cell below, and define a new nucleotide sequence, then calculate its GC content.
BONUS QUESTION: What happens if your sequence has characters other than `{a, c, g, t}`?
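Once you have tried the bonus question, one possible way to add the missing checks is sketched below. The name calculate_gc_checked() is hypothetical, and treating only A, C, G and T as valid is an assumption made for this sketch; real sequence data may also contain ambiguity codes such as N.

```python
def calculate_gc_checked(sequence):
    """Return GC content as a fraction between 0 and 1, after basic checks"""
    sequence = sequence.upper()    # convert sequence to upper-case

    # An empty sequence would otherwise cause a division by zero
    if not sequence:
        raise ValueError("sequence is empty")

    # Assumption for this sketch: only A, C, G and T are acceptable
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")

    g_plus_c = sequence.count("G") + sequence.count("C")
    return g_plus_c / len(sequence)


print(calculate_gc_checked("atgc"))  # 0.5
```

Raising an error is only one design choice; another would be to skip unexpected characters and report how many were ignored.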

7. Saving a Notebook

It is important to save your work frequently. You can also save your notebooks in formats that are easy and convenient to share.

  • The simplest way to save a notebook is to use Ctrl-S to checkpoint the existing notebook.

Downloading a Notebook

The Jupyter notebook system allows you to download notebooks in a number of formats, by using the File -> Download as menu bar option.

Alternative formats can have particular advantages, such as:

  • Notebook (.ipynb): makes an additional copy of the current notebook
  • Python (.py): creates a Python script out of the Code cells in the notebook
  • HTML (.html): creates a read-only HTML version of the notebook that can be shared with others (or placed on a website) and opened in any web browser
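A .ipynb file is stored as JSON text, with a list of cells that each record a cell type and source. The sketch below uses a tiny, hand-written notebook (an assumption for illustration, not a real workshop file) to show roughly how the Code cells can be picked out, which is the idea behind the Python (.py) option. The function name code_cells() is hypothetical, and this is not how Jupyter itself performs the export.

```python
import json

# A minimal, hand-written notebook for illustration only
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["count = 0\n", "count"],
        },
    ],
})


def code_cells(text):
    """Return the source of each Code cell in the notebook text"""
    notebook = json.loads(text)
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # The source may be stored as one string or as a list of lines
            sources.append(source if isinstance(source, str) else "".join(source))
    return sources


print(code_cells(notebook_text))  # ['count = 0\ncount']
```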

8. Creating a New Notebook

There are two main ways to create a new Jupyter notebook.

  • From the Jupyter home page
  • Via File -> New Notebook in the menu bar

In either case, selecting Python 3 will give you a new notebook, ready to take input as Markdown or Python code.

In the Jupyter Home Page

In the top right of the Jupyter home page there is a button labelled New. Clicking on this will give you options to create a new file of several types (what is available will depend on your own setup).

In the Current Notebook

In the File -> New Notebook menu option, you will be presented with a (shorter) list of notebook creation options.