Open source, interactive data science and scientific computing across over 40 programming languages. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. Read more about Jupyter.
We are concerned, among other things (it is to be hoped), with increasing the capability of our NBIS staff, collaborators, and wider public to reproduce/resell/recycle/hack our work. Like many others, I used Jupyter and similar platforms for several reasons:
Other common uses of Jupyter, which I was less concerned with before, include:
We shall take those points one by one with some examples.
Read here about the future of the Jupyter Notebook: https://blog.jupyter.org/2016/07/14/jupyter-lab-alpha/. Some of my key remarks:
I don't know the early beginnings, but I first saw notebooks that can compute in an astronomy lab in the nineties, where they were using Mathematica notebooks for symbolic mathematics. Among mathematicians there is a tribe that grew to hate too much symbolic writing, and they were the first enthusiasts. Two major developments have happened since: the ability to write code cells, and the web.
The name Jupyter is an agglutination of Julia, Python and R, and is pronounced in reference to a pie rather than a planet. It originates at Berkeley (also home of Apache Spark) from a project called IPython, which is now independent. IPython is the best interactive shell for scientific computing in Python, and it now provides one of many kernels for running Python code in Jupyter.
Projects similar to Jupyter:
Don't expect Jupyter to always work. It depends heavily on your ability to keep the latest release installed on your computer, wait for bugs to be fixed, and work around problems. If you stay on the beaten track the user experience is fabulous, although limited.
Some known limitations:
In [12]:
# Something Lena asked
from IPython.display import Image
Image('https://blog.jupyter.org/content/images/2016/07/jlab-screenshot-nb-con-term-2.png')
Out[12]:
To start using Jupyter please follow this notebook that I use during my Python classes. Or help yourself with Dr Google. THE point about Jupyter is that you can do this thing (called a code cell):
In [1]:
m = "Hello World!"
def f(message):
    print(message)
Each code cell will communicate with the others because underneath a notebook is the same shell/kernel/language (IPython/Python in this particular case, but you can just as well use IRkernel/R). If you created a notebook with a kernel other than IPython you will have* to type the code in the appropriate language. Some languages (kernels), however, are very xenophilic, most notably Python and Julia, so you can call many other languages from them.
Note: *A different kernel can be set for a certain code cell.
In [2]:
m
Out[2]:
In [3]:
f(m)
The text cells are commonly typed in Markdown. They can also include HTML and LaTeX.
Static | |
HTML | TABLE |
Static latex: $$c = \sqrt{a^2 + b^2}$$
Dynamic latex and html:
In [13]:
%%latex
\begin{align}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{align}
In [4]:
from IPython.display import HTML
HTML('<iframe src=http://nbis.se/?useformat=mobile width=700 height=350></iframe>')
Out[4]:
In [5]:
%%HTML
<div style="background-color:cyan; border:solid black; width:300px; padding:20px;">
Value for 'foo': <input type="text" id="foo" value="bar"><br>
<button onclick="set_value()">Set Value</button>
</div>
<script type="text/Javascript">
function set_value(){
var var_value = document.getElementById('foo').value;
var command = "foo = '" + var_value + "'";
console.log("Executing Command: " + command);
var kernel = IPython.notebook.kernel;
kernel.execute(command);
}
</script>
Here is how JavaScript communicates with Python:
In [6]:
foo
Out[6]:
The examples above use %, which is the way to invoke kernel commands known as "magics". Next is a plot example that uses a magic to load the matplotlib.pylab module and to specify that we want an inline figure rather than a standalone GUI window for our plot.
In [8]:
%pylab inline
x = linspace(0, 3*pi, 500)
plot(x, sin(x**2))
title('adjustment to day-night cycle in northern sweden');
In [9]:
%load_ext rpy2.ipython
In [10]:
%%R
plot_r <- function(x) {
p <- plot(x);
print(p);
}
In [11]:
print("Yellow R this is Python can you please plot this array for me?")
import numpy as np
x = np.random.rand(10)
print(x)
%Rpush x
%R plot_r(x)
A notebook can be displayed on the web provided you use a notebook-aware server. The default notebook service is called nbviewer and can be configured to display public notebooks on a web server. It is widely used with GitHub, and a large part of Jupyter's popularity probably came from having the notebooks hosted on GitHub publicly viewable.
FAQ: What is this Notebook Viewer? IPython Notebook Viewer is a free webservice that allows you to share static html versions of hosted notebook files. If a notebook is publicly available, by giving its url to the Viewer, you should be able to view it. You can also directly browse collections of notebooks in public GitHub repositories, for example the IPython examples.
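To make "giving its url to the Viewer" concrete, here is a minimal sketch that rewrites a public GitHub notebook URL into a viewer URL. It assumes the viewer lives at `https://nbviewer.org` and serves GitHub files under `/github/<user>/<repo>/blob/<branch>/<path>`; check this against the viewer's own front page before relying on it. The helper name `github_to_nbviewer` is made up for illustration.

```python
from urllib.parse import urlparse

# Assumption: nbviewer serves GitHub-hosted notebooks under
# https://nbviewer.org/github/<user>/<repo>/blob/<branch>/<path>
NBVIEWER_BASE = "https://nbviewer.org/github"


def github_to_nbviewer(github_url):
    """Map a public GitHub notebook URL to the presumed nbviewer URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    if not parsed.path.endswith(".ipynb"):
        raise ValueError("expected a path ending in .ipynb")
    # The GitHub path already has the /<user>/<repo>/blob/<branch>/<path> shape.
    return NBVIEWER_BASE + parsed.path


print(github_to_nbviewer(
    "https://github.com/grokkaine/biopycourse/blob/master/Syllabus.ipynb"
))
```

The same result can be had by pasting the GitHub URL into the form on the viewer's front page; the function only saves a step when you have many links to share.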
Here is a lengthy blog post about deploying the notebook on other clouds:
https://blog.ouseful.info/2014/12/12/seven-ways-of-running-ipython-notebooks/
Most often, a collaborator who is not computer savvy will need a PDF or HTML conversion, which can be done in batch mode or interactively using the notebook's File menu.
For text processing, batch conversion into Markdown is also possible. Apart from this, notebooks can also be saved as a script in the native kernel language, such as Python, Julia, R (untested). The native format of a notebook file, .ipynb, is JSON.
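Batch conversion usually goes through the `jupyter nbconvert` command line tool. Below is a minimal sketch that builds and runs those commands from Python, assuming Jupyter is installed and on the PATH. The helper names and the file `analysis.ipynb` are hypothetical, and PDF output additionally needs a working LaTeX installation.

```python
import subprocess

# Standard `--to` targets of nbconvert that this sketch allows.
SUPPORTED_FORMATS = {"html", "pdf", "markdown", "script"}


def nbconvert_command(notebook_path, to="html"):
    """Build the argument list for one `jupyter nbconvert` call."""
    if to not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: {}".format(to))
    return ["jupyter", "nbconvert", "--to", to, notebook_path]


def convert_all(notebook_paths, to="html"):
    """Convert each notebook in turn; stops at the first failure."""
    for path in notebook_paths:
        subprocess.run(nbconvert_command(path, to), check=True)


# Only show the command here; call convert_all([...]) to actually run it.
print(" ".join(nbconvert_command("analysis.ipynb", to="pdf")))
```

Building the argument list separately from running it makes it easy to inspect the command first, or to hand it to a job scheduler instead of `subprocess`.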
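Because the file is JSON, the standard library is enough to peek inside one. The sketch below hand-writes a tiny notebook in what I understand to be the version 4 layout (a top-level `cells` list, each cell with a `cell_type` and a `source`) and pulls out the code cells. The helper `code_sources` is made up for illustration; for real work the `nbformat` package is the safer route.

```python
import json

# A hand-written, minimal notebook; real files carry more metadata.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["m = \"Hello World!\"\n", "print(m)"]},
    ],
})


def code_sources(ipynb_text):
    """Return the source text of every code cell, in document order."""
    notebook = json.loads(ipynb_text)
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # The source may be stored as one string or as a list of lines.
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources


for source in code_sources(notebook_text):
    print(source)
```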
TODO: will link the future Andersson handouts here.
This is how I use Jupyter most of the time. Taking log notes is not fun, so how about taking log notes while you document your ideas and while you program? It is possible to add a time log to it, although I personally don't like it and prefer to add a date when I feel like it.
It is easier to keep them private because you want the ability to make annotations that would otherwise be difficult to explain. (For example, making a note about what a direct repeat is. You might have heard about it before, but now you just wanted to put it there and clarify it. A collaborator or a client, however, may think you are unqualified.)
While I do have a few projects where an integrated code editor is essential, I do a lot of the work in these notebook logs, assembling the code only when needed. They are especially good for exploratory studies. At the end I assemble a few presentation notebooks from the logs. Is this a good idea for reproducible research? In my opinion, publicly documenting every little detail is detrimental; I would not try to reproduce a piece of research that is detailed in too many files. But having the record, even if private, allows someone to get to the details when needed, and that too is reproducible research. I am guessing there is a trade-off somewhere between actual work and administration. Are all of Einstein's mind-mapping scribbles on bits of paper being kept? Maybe they are, maybe they aren't public. He had a weird take on what desk order means. What do you think?
Advice:
Examples:
Well, if you have heard about IPython or Jupyter, it is probably because you have seen courses. My most recent is here: https://github.com/grokkaine/biopycourse/blob/master/Syllabus.ipynb
Here are some links:
My experience:
It is best if you use Docker and a cloud to provide a controlled experience. It is also good to use a single Python distribution. As in any interactive course, it is good if the class is not too big. I found that I can't take more than ten students without a sharp drop in the quality of the teaching.
Most clouds today can be configured to run Jupyter, or run it natively. Some Jupyter kernels are especially great in clouds, having parallel computing capabilities. For some, this has collaboration benefits too.
My use of this feature was mostly to access clouds from my mobile. Having a small kid, it is hard to open a laptop to do any work in the evenings or during weekends, so my Nexus 4 became a work tool.
Because of GitHub integration, this became a popular feature. Notebooks can be converted to reStructuredText markup (.rst), which feeds into a popular documentation generator called Sphinx, which in turn powers a popular hosting service named Read the Docs, able to rebuild the documentation after every commit.
I liked for example how cobrapy structured documentation. Send me mail if you have a special mention. Here is another, a metagenomic framework http://pythonhosted.org/mgkit/index.html.
coLaboratory is hosted on the Jupyter site and currently offers collaborative support through Google Drive. There are other minor developments; if you have had some experience with one or another, please let me know.
You can always co-edit a git repo hosting notebooks, but hardcore collaborative edits (as in concurrent editing of the same code cell) will break the internal JSON structure of your document. Unless you have a git genius up your sleeve, better find alternatives. Content management is not in the plans either.
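To see what "breaking the internal JSON structure" looks like in practice, here is a small sketch that checks a notebook file's text after a merge. It looks for the conflict markers git leaves behind and then tries to parse the text as JSON. The function name and both sample strings are invented for illustration, and this only detects the damage, it does not repair it.

```python
import json

# Line prefixes git writes into a file when a merge conflicts.
CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def notebook_problems(ipynb_text):
    """Return a list of human-readable problems; empty means it parses."""
    problems = []
    for number, line in enumerate(ipynb_text.splitlines(), start=1):
        if line.startswith(CONFLICT_MARKERS):
            problems.append("line {}: merge conflict marker".format(number))
    try:
        json.loads(ipynb_text)
    except json.JSONDecodeError as error:
        problems.append("invalid JSON: {}".format(error.msg))
    return problems


clean = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 0}'
conflicted = (
    '{\n<<<<<<< HEAD\n"cells": [],\n=======\n"cells": [1],\n'
    '>>>>>>> other\n"metadata": {}}'
)

print(notebook_problems(clean))
for problem in notebook_problems(conflicted):
    print(problem)
```

A check like this could run before a commit, so a notebook that no longer opens is caught early rather than by the next collaborator.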
Worth mentioning:
Because the emergent big data architectures have their own notebook implementations, sometimes natively (examples were given), this will be a hard test for Jupyter. The most recent success story for Jupyter was the integration of Scala and Spark among its kernels. Ultimately, Jupyter is powered by Berkeley, "love" and open source.
The main website has a small example of Jupyter-Spark integration, and other examples that you can test-run at will. https://try.jupyter.org/
Let's say you made a detailed notebook making a number of plots and you need to rerun it because you got some new data. Notebooks can be re-run at any moment from a terminal or from any job scheduler. An example is here: https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/
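One common way to re-run a notebook without opening it is `jupyter nbconvert` with the `--execute` flag. The sketch below only builds that command, assuming Jupyter is installed and on the PATH. The file names, the helper names and the ten-minute per-cell timeout are placeholders of mine, not something taken from the linked post.

```python
import subprocess


def execute_command(notebook_path, output_name, timeout_seconds=600):
    """Build a `jupyter nbconvert` call that runs every cell and saves a copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        # Per-cell time limit in seconds; raise it for long computations.
        "--ExecutePreprocessor.timeout={}".format(timeout_seconds),
        "--output", output_name,
        notebook_path,
    ]


def rerun(notebook_path, output_name):
    """Run the notebook top to bottom; raises if any cell fails."""
    subprocess.run(execute_command(notebook_path, output_name), check=True)


# Print the command instead of running it, so it can be pasted into a scheduler.
print(" ".join(execute_command("plots.ipynb", "plots_rerun.ipynb")))
```

Writing the executed copy to a new file keeps the original notebook untouched, which is handy when the new data turns out to be broken.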