This tutorial was built in a Jupyter notebook! Various formats of this tutorial can be accessed at https://github.com/oxpeter/library_bioinformatics_service/tree/master/Jupyter
Created by Peter Oxley for the library bioinformatics service, May 2017
Installing Jupyter notebooks via Anaconda is recommended.
In [1]:
# this is a code cell with no output
a=120
In [2]:
# this is a code cell with output
# all output to stdout / stderr will be displayed below the cell.
print(a)
This text is in a markdown cell, which is formatted using markdown syntax.
(To edit a markdown cell, just double-click on the text. Don't forget to 'execute' the cell afterwards to render the formatting.)
Markdown cells within a notebook have a number of advantages:
Cells are switched between code and markdown by using the menu
Cell > Cell Type > Markdown
Or by using the dropdown box in the icon bar.
In [3]:
import numpy as np
import pandas as pd
# notice that this cell doesn't execute when you press enter.
# It executes only when you press shift-enter or alt-enter, or click the 'run' icon.
In [4]:
# this cell does not generate any output to stdout or stderr,
# so nothing is shown after executing the cell.
s1 = np.random.normal(0,1,1000) # generate a random sample with normal distribution (mean 0, sd 1, 1000 samples)
s2 = np.random.normal(2,4,1000)
df = pd.DataFrame({"s1":s1, "s2":s2})
In [5]:
# this cell outputs to stdout,
# which is printed immediately following the cell:
df.info()
In [6]:
# table output is formatted to make it easy to view:
df.T.head()
Out[6]:
The code in this notebook is executed by the designated "kernel" loaded at creation. In this case, the IPython kernel was loaded, so all code entered is interpreted by this kernel and run as Python code. However, when the kernel is IPython, you also have access to "magic" commands (using the `%` syntax), which among other things make it possible to have cells run by a different interpreter. A single `%` will run the magic on that line only. A double `%%` will run the magic on the entire cell.
In [7]:
%%bash
# this cell is run in a bash shell created specially for the following code.
echo "Hello, world"
In [8]:
# it is also possible to invoke bash commands using the ! syntax:
!ls -al | head -n 8 | tail -n 2
In [9]:
%%html
<body>
<h2>This is an html interpreted header</h2>
<a href="https://library.med.cornell.edu">This is an html link</a>
</body>
In [10]:
# capturing the output of the bash ls command:
directory_contents = !ls -la
directory_contents
Out[10]:
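Capturing shell output with `!` only works inside IPython. As a rough sketch of the same idea in plain Python, the standard library's `subprocess` module can run a command and return its output as a list of lines. The helper name `capture_lines` is made up for this illustration and is not part of IPython or Jupyter.

```python
import subprocess


def capture_lines(command):
    """Run a command (given as a list of arguments) and return stdout as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Roughly what `directory_contents = !ls -la` does in a notebook cell
directory_contents = capture_lines(["ls", "-la"])
print(directory_contents[:3])
```

Because the command is passed as a list, no shell is involved, so pipes such as `| head -n 8` would have to be replaced by ordinary Python slicing of the returned list.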
In [11]:
%%bash -s "$a"
# The above line puts the variable a into the bash shell as a positional parameter.
# Be aware of any characters (e.g. quotation marks) in the python variable:
# these will need to be escaped before being passed to the bash cell.
echo $1
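The positional-parameter idea can also be tried outside the notebook. The sketch below is an illustration only, not how the `%%bash` magic is implemented: it assumes a hypothetical helper `echo_via_shell`, uses `bash -c` with an inline script rather than `-s`, and falls back to `sh` if bash is not found. Because the value is handed over as its own list element, awkward characters such as quotation marks arrive in `$1` without manual escaping.

```python
import shutil
import subprocess


def echo_via_shell(value):
    """Pass value to a shell script as positional parameter $1 and return what it prints."""
    shell = shutil.which("bash") or "sh"  # assumption: fall back to sh if bash is missing
    script = 'printf "%s\\n" "$1"'        # quoting "$1" keeps the value in one piece
    # Arguments after the script are $0 (a label, hypothetical here) and then $1
    result = subprocess.run(
        [shell, "-c", script, "sketch", str(value)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


a = 120
print(echo_via_shell(a))
print(echo_via_shell('say "hi" & $HOME'))
```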
In [12]:
# an alternative to send variables into bash:
!echo {a * 2}
In [13]:
# R requires a few extra steps to access
# rpy2 provides access to R from within Python
# (you can read more here: http://rpy2.readthedocs.io)
# after installing rpy2 - we load the extension into the kernel:
%load_ext rpy2.ipython
# now we can access the installed version of R
iris_dataset = %R iris
In [14]:
iris_dataset.describe()
Out[14]:
In [15]:
%%R -i df
# the above line sets R as the interpreter for this cell,
# and imports the variable df (it will be referenced in this cell using the same name)
# Now we can manipulate and graph the dataframe using R functions:
require(ggplot2)
ggplot(data=df) + geom_point(aes(x=s1, y=s2))
In [16]:
# change the current working directory
%cd jupyterhub/
In [17]:
# list the variables currently available to the kernel
%who
In [18]:
# list the variables and their string representation
%whos
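To get a feel for what `%who` and `%whos` report, here is a minimal sketch that builds a similar listing from a namespace dictionary. It is an approximation under stated assumptions, not IPython's implementation: the function name `list_variables` is invented for this example, and the only filtering it does is to skip names that start with an underscore.

```python
def list_variables(namespace):
    """Return (name, type name, string representation) for each public name."""
    rows = []
    for name, value in sorted(namespace.items()):
        if name.startswith("_"):
            continue  # skip private and internal names
        rows.append((name, type(value).__name__, repr(value)))
    return rows


# A small stand-in namespace; in a notebook you could pass globals() instead
example_namespace = {"a": 120, "label": "x", "_hidden": 1}
for name, type_name, text in list_variables(example_namespace):
    print(f"{name:10s} {type_name:10s} {text}")
```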
In [19]:
%%time
for i in range(10):
    !sleep 1
In [20]:
%%timeit
np.random.normal(0,1,1000).sum()
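The `%%time` and `%%timeit` magics above are only available under IPython. As a hedged sketch of the same two ideas in plain Python, the standard library offers `time.perf_counter` for a single measurement and `timeit.repeat` for repeated runs. To keep the example free of third-party imports, it swaps `np.random.normal` for the standard library's `random.gauss`, so the absolute timings will not match the numpy cell, and the magic's own report format may differ from what is printed here.

```python
import random
import time
import timeit


def sum_normal_sample(n=1000):
    """Sum n draws from a normal distribution with mean 0 and sd 1."""
    return sum(random.gauss(0, 1) for _ in range(n))


# Rough analogue of %%time: one wall-clock measurement of a single run
start = time.perf_counter()
sum_normal_sample()
elapsed = time.perf_counter() - start
print(f"single run: {elapsed:.6f} s")

# Rough analogue of %%timeit: 5 repeats of 100 calls each, reporting the fastest repeat
calls_per_repeat = 100
timings = timeit.repeat(sum_normal_sample, repeat=5, number=calls_per_repeat)
best_per_call = min(timings) / calls_per_repeat
print(f"best of 5 repeats: {best_per_call:.6f} s per call")
```

Reporting the fastest repeat is a choice made for this sketch; it tends to be less affected by other activity on the machine than a single measurement.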
In [21]:
# to capture plot output and display it inline:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
In [22]:
df['s2'].hist();
plt.show()
In [23]:
# using the question mark will bring up any help documentation
?pd.DataFrame
The question mark works for variables, modules, functions, function parameters, and cell magics.
The markdown box is MathJax aware, so you can do cool things such as: \begin{equation*} \left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right) \end{equation*}
LaTeX can also be leveraged to export notebooks to PDF.
Notebooks can be exported to several formats, including .pdf (using LaTeX), .html, .py, .rst, and .md. Use File > Download as > ...