The notebook toolbar is found at the top of the notebook:
At the top left of the toolbar is a logo for the Jupyter project, and the filename of the current notebook (clicking on this allows you to change the notebook's file name). There is also some information about the last time the document was saved or checkpointed.
At the top right of the toolbar is a logo for Python (the current language of the notebook), and a button which, if clicked, will log you out.
Below this is a menu bar which should remind you of other programs, such as Microsoft Word, or Apple Pages.
This provides operations such as saving or loading files, modifying components of the notebook, or even changing the programming language that is to be used by the notebook (the Kernel option). Some of these menu bar options are repeated as shortcuts in the icons of the bottom row of the toolbar.
On the right, there is some information about the notebook.
In the image above, this tells us first that the security connection to the notebook is trusted, then the pencil icon indicates that we are in editing mode. Finally, the current kernel (Python 3 (SfAM)) is named.
Cells can be one of several types; the two main types are Code and Markdown. These are treated differently in the notebook:
* Markdown cells contain text to be read by a human, such as this description of a Markdown cell. They can be formatted nicely, like a word-processing document.
* Code cells contain programming code (for this workshop it will be Python), which will be run by the computer. The code can also be nicely formatted, as in the cells below.
In [1]:
# This is some example Python code, so we can see a code cell
def hello_world():
    """Say 'hello world'"""
    print("Hello World!")
In [2]:
# This is more example Python code, to have a code cell with output
count = 0
for i in range(10):
    count = count + i
# Show count as output
count
Out[2]:
45
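The first example cell defines hello_world() but never calls it, so running that cell produces no output. The sketch below shows the difference between defining and calling a function, and checks the loop's result using Python's built-in sum(). The variable name total is only for illustration.

```python
def hello_world():
    """Say 'hello world'"""
    print("Hello World!")

# Defining the function prints nothing; calling it prints the message
hello_world()

# The same total as the loop in the second cell, using the built-in sum()
total = sum(range(10))

# In a notebook, a bare expression on the last line is shown as the Out value
total
```

Text printed by print() appears below the cell without an Out marker; only the value of the final expression is labelled as output.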
Markdown cells look just like regularly-formatted text, but code cells have markers at the left-hand side, indicating whether they are input (In []:) or output (Out []:) cells.
Input cells are where code can be typed and executed. Output cells show the corresponding output from that code. You can edit input cells, but not output cells.
Code cell markers also come with a number, e.g. In [2]: and Out [2]:. This number indicates the order in which cells were run/executed.
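For the curious, a notebook file is stored as JSON text, and the cell type and execution number described above are recorded for each cell. The sketch below builds a minimal two-cell structure by hand, assuming the version 4 notebook layout (cell_type, source, execution_count and outputs fields). The cell contents and the name cell_summary are made up for illustration, and a real notebook file holds more metadata than this.

```python
import json

# A minimal notebook structure, assuming the nbformat version 4 layout
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "This text is meant to be read by a human.",
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 2,  # displayed as In [2]: and Out[2]:
            "outputs": [],  # left empty in this sketch
            "source": "count = sum(range(10))\ncount",
        },
    ],
}

# Collect the type and execution number of each cell
cell_summary = [
    (cell["cell_type"], cell.get("execution_count"))
    for cell in notebook["cells"]
]
print(cell_summary)

# Serialise the structure to the kind of text stored in an .ipynb file
notebook_text = json.dumps(notebook, indent=1)
```

Markdown cells have no execution number, which is why they show no In or Out marker.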
A Jupyter notebook can be read like a webpage - this makes it very useful for sharing annotated code and analyses.
However, notebooks are also interactive: you can edit and execute individual cells in any order you like - this makes them very useful for exploring data and experimenting with code.
When you click on a cell, it will be surrounded by a green border if it is in edit mode (i.e. you can change the contents), or a blue border if you cannot currently edit it.
You can move from cell to cell using the up- and down-arrow keys.
When you first select a Markdown cell, the border will be blue. To put a Markdown cell into edit mode, double-click on it.
Markdown cell (3min)
A new cell can be created in two main ways:
* The + toolbar icon: this inserts a new cell below the currently selected cell.
* The Insert -> Insert Cell Above or Insert -> Insert Cell Below menu option: this creates a cell above or below the currently selected cell.
When you first create a cell, Jupyter will assume that you want a Code cell. This will be shown in the toolbar at the top of the page.
To change the cell type to Markdown, you can click on the dropdown box marked Code and select Markdown.
Jupyter notebooks are a very good way to write literate code, to enable reproducible research, because they make it easier to write a combination of explanatory text and working code in the same document.
* Markdown is widely used in bioinformatics/computing communities
* Markdown comes in many 'flavours', but has a shared common core set of formatting instructions
* Markdown is plain text, and can be read by humans, even if it is not interpreted/formatted by a computer
* Markdown can be interpreted and rendered as an HTML webpage, an MS Word document, a PDF file, and so on: write once, render in many formats
# Header 1
## Header 2
### Header 3
#### Header 4
which renders as:
*italic*
**bold**
***bold-italic***
* Item 1
* Item 2
    * Nested item 1
    * Nested item 2
* Item 3
renders as
* [Search on Google](http://www.google.com)
Python
One of the main principles of bioinformatics is automation - letting the computer do all the repetitive, finicky work, so that you - the scientist - can concentrate on the science.
To instruct the computer to do all that tedious, repetitive work for you, it can be useful to develop some programming skills. Python is widely-used, relatively easy to learn, and comes with a number of highly-useful libraries that are designed for bioinformatics work.
In this workshop, we will use Python to demonstrate how useful it can be to automate bioinformatics tasks.
Python code is entered in Code cells.
Comments are marked with a hash (#) at the start of the line.
An example of using the cell like a calculator is given below.
In [ ]:
# This cell is being used like a calculator
# You can edit this cell, and run the new calculation with Shift-Enter
(1 + 95) * 1e-2 / 0.65
Code cells can be used to write and execute more complex Python code.
In the example below, a function named calculate_gc() is written. This calculates the GC content of a passed nucleotide sequence. The function is then called on a nucleotide sequence, and the GC content calculated.
In [ ]:
# Define a function that calculates the GC content of a passed sequence
def calculate_gc(sequence):
    """Return the GC content of the passed sequence, as a fraction between 0 and 1"""
    sequence = sequence.upper()  # convert sequence to upper-case
    g_plus_c = sequence.count('G') + sequence.count('C')
    gc = g_plus_c / len(sequence)
    return gc
In [ ]:
# Define a nucleotide sequence
my_sequence = "ctagtcgacgatcatgcagcagctacatcgtagctagcatgctagctagca"
# Calculate the GC content of the sequence
calculate_gc(my_sequence)
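Once a calculation is wrapped in a function, it can be applied to many sequences without retyping it, which is the kind of automation described earlier. The sketch below repeats calculate_gc() so that it runs on its own, and applies it to three short sequences. The sequence names and contents are made up for illustration. Note that the function returns a fraction between 0 and 1, not a percentage.

```python
def calculate_gc(sequence):
    """Return the GC content of the passed sequence, as a fraction"""
    sequence = sequence.upper()  # convert sequence to upper-case
    g_plus_c = sequence.count('G') + sequence.count('C')
    return g_plus_c / len(sequence)

# Hypothetical example sequences, keyed by made-up names
sequences = {
    "seq_all_gc": "GGCC",
    "seq_no_gc": "atat",
    "seq_half_gc": "ATgc",
}

# Calculate the GC content of every sequence in one step
gc_by_name = {name: calculate_gc(seq) for name, seq in sequences.items()}

# Report each result to two decimal places
for name, gc in gc_by_name.items():
    print(f"{name}: {gc:.2f}")
```

An empty sequence would cause a division by zero here, so a more careful version would check the sequence length first.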
Ctrl-S to checkpoint the existing notebook
The Jupyter notebook system allows you to download notebooks in a number of formats, by using the File -> Download as menu bar option.
Alternative formats can have particular advantages, such as:
* Notebook (.ipynb): makes an additional copy of the current notebook
* Python (.py): creates a Python script out of the Code cells in the notebook
* HTML (.html): creates a read-only HTML version of the notebook that can be shared with others (or placed on a website) and opened in any web browser
File -> New Notebook in the menu bar
In either case, selecting Python 3 will give you a new notebook, ready to take input as Markdown or Python code.
In the top right of the Jupyter home page there is a button labelled New. Clicking on this will give you options to create a new file of several types (what is available will depend on your own setup).
In the File -> New Notebook menu option, you will be presented with a (shorter) list of notebook creation options.