The notebook toolbar is found at the top of the notebook:
At the top left of the toolbar is a logo for the Jupyter project, and the filename of the current notebook (clicking on this allows you to change the notebook's file name). There is also some information about the last time the document was saved or checkpointed.
At the top right of the toolbar is a logo for Python (the current language of the notebook), and a button which, if clicked, will log you out.
Below this is a menu bar which should remind you of other programs, such as Microsoft Word, or Apple Pages.
This provides operations such as saving or loading files, modifying components of the notebook, or even changing the programming language that is to be used by the notebook (the Kernel option). Some of these menu bar options are repeated as shortcuts in the icons of the bottom row of the toolbar.
On the right, there is some information about the notebook.
In the image above, this tells us first that the security connection to the notebook is trusted, then the pencil icon indicates that we are in editing mode. Finally, the current kernel (Python 3 (SfAM)) is named.
Cells can be one of several types - the two main types are Code and Markdown. These are treated differently in the notebook:
Markdown cells contain text to be read by a human, such as this description of a Markdown cell. They can be formatted nicely, like a word-processing document.
Code cells contain programming code (for this workshop it will be Python), which will be run by the computer. The code can also be nicely-formatted, as in the cells below.
In [1]:
# This is some example Python code, so we can see a code cell
def hello_world():
    """Say 'hello world'"""
    print("Hello World!")
In [2]:
# This is more example Python code, to have a code cell with output
count = 0
for i in range(10):
    count = count + i
# Show count as output
count
Out[2]:
45
Markdown cells look just like regularly-formatted text, but code cells have markers at the left-hand side, indicating whether they are input (In []:) or output (Out []:) cells.
Input cells are where code can be typed and executed. Output cells show the corresponding output from that code. You can edit input cells, but not output cells.
Code cell markers also come with a number, e.g. In [2]: and Out [2]:. This number indicates the order in which cells were run/executed.
A Jupyter notebook can be read like a webpage - this makes it very useful for sharing annotated code and analyses.
However, notebooks are also interactive: you can edit and execute individual cells in any order you like - this makes them very useful for exploring data and experimenting with code.
When you click on a cell, it will be surrounded by a green border if it is in edit mode (i.e. you can change the contents), or a blue border if you cannot currently edit it.
You can move from cell to cell using the up- and down-arrow keys.
When you first select a Markdown cell, the border will be blue. To put a Markdown cell into edit mode, double-click on it.
Markdown cell (3min)
A new cell can be created in two main ways:
The + toolbar icon. This will insert a new cell below the currently selected cell.
The Insert -> Insert Cell Above or Insert -> Insert Cell Below menu option; this can create a cell below or above the currently selected cell.
When you first create a cell, Jupyter will assume that you want a Code cell. This will be shown in the toolbar at the top of the page.
To change the cell type to Markdown, you can click on the dropdown box marked Code and select Markdown.
Jupyter notebooks are a very good way to write literate code, to enable reproducible research, because they make it easier to write a combination of explanatory text and working code in the same document.
Markdown is widely-used in bioinformatics/computing communities
Markdown comes in many 'flavours', but has a shared common core set of formatting instructions
Markdown is plain text, and can be read by humans, even if it is not interpreted/formatted by a computer
Markdown can be interpreted and rendered as an HTML webpage, an MS Word document, a PDF file, and so on - write once, render in many formats
# Header 1
## Header 2
### Header 3
#### Header 4
which renders as:
*italic*
**bold**
***bold-italic***
* Item 1
* Item 2
    * Nested item 1
    * Nested item 2
* Item 3
renders as
* [Search on Google](http://www.google.com)
Python
One of the main principles of bioinformatics is automation - letting the computer do all the repetitive, finicky work, so that you - the scientist - can concentrate on the science.
To instruct the computer to do all that tedious, repetitive work for you, it can be useful to develop some programming skills. Python is widely-used, relatively easy to learn, and comes with a number of highly-useful libraries that are designed for bioinformatics work.
In this workshop, we will use Python to demonstrate how useful it can be to automate bioinformatics tasks.
Python Code
Comments are marked with a hash character (#) at the start of the line
An example of using the cell like a calculator is given below.
In [ ]:
# This cell is being used like a calculator
# You can edit this cell, and run the new calculation with Shift-Enter
(1 + 95) * 1e-2 / 0.65
Code cells can be used to write and execute more complex Python code
In the example below, a function named calculate_gc() is written. This calculates the GC content of a passed nucleotide sequence. The function is then called on a nucleotide sequence, and the GC content calculated.
In [ ]:
# Define a function that calculates the GC content of a passed sequence
def calculate_gc(sequence):
    """Return the GC content of the passed sequence as a fraction (0 to 1)"""
    sequence = sequence.upper()  # convert sequence to upper-case
    g_plus_c = sequence.count('G') + sequence.count('C')
    gc = g_plus_c / len(sequence)
    return gc
In [ ]:
# Define a nucleotide sequence
my_sequence = "ctagtcgacgatcatgcagcagctacatcgtagctagcatgctagctagca"
# Calculate the GC content of the sequence
calculate_gc(my_sequence)
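Because calculate_gc() is an ordinary Python function, the same calculation can be repeated over many sequences without retyping it. The cell below is a minimal sketch of that idea, not part of the original workshop material: the sequence names and the `sequences` dictionary are invented for illustration, and the function is repeated here (with an added guard for empty input, which is an assumption) so that the cell runs on its own.

```python
# Repeat of the GC content function so this cell is self-contained.
# The empty-sequence guard is an added assumption, not part of the cell above.
def calculate_gc(sequence):
    """Return the GC content of the passed sequence as a fraction (0 to 1)"""
    if not sequence:
        return 0.0  # avoid dividing by zero for an empty sequence
    sequence = sequence.upper()  # convert sequence to upper-case
    g_plus_c = sequence.count("G") + sequence.count("C")
    return g_plus_c / len(sequence)


# Hypothetical example sequences, invented for this illustration
sequences = {
    "seq_a": "ATGC",
    "seq_b": "ggccggcc",
    "seq_c": "atatatat",
}

# Calculate the GC content of every sequence in a single step
gc_contents = {name: calculate_gc(seq) for name, seq in sequences.items()}

# Report each result to two decimal places
for name, gc in gc_contents.items():
    print(f"{name}: {gc:.2f}")
```

Keeping the results in a dictionary keyed by sequence name means each value can be looked up later, for example with gc_contents["seq_a"].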
Ctrl-S to checkpoint the existing notebook
The Jupyter notebook system allows you to download notebooks in a number of formats, by using the File -> Download as menu bar option.
Alternative formats can have particular advantages, such as:
Notebook (.ipynb): makes an additional copy of the current notebook
Python (.py): creates a Python script out of the Code cells in the notebook
HTML (.html): creates a read-only HTML version of the notebook that can be shared with others (or placed on a website) and opened in any web browser
File -> New Notebook in the menu bar
In either case, selecting Python 3 will give you a new notebook, ready to take input as Markdown or Python code.
In the top right of the Jupyter home page there is a button labelled New. Clicking on this will give you options to create a new file of several types (what is available will depend on your own setup).
In the File -> New Notebook menu option, you will be presented with a (shorter) list of notebook creation options.