Introduction

What's so special about Jupyter notebooks?

  • Jupyter's power is in having code, data, and text all accessible from one file viewed in a browser, living locally or in the cloud.
  • The Jupyter Project used to be part of the IPython Project and has since spun off.
  • It supports over 40 different languages (see this article), and that number is actively growing.

Text bits

The text in a notebook is written in the simple Markdown language, a rich format capable of "rendering" LaTeX, links, images, and even HTML (since we're in a browser and the R kernel lets us).

However, the philosophy of markdown is, above all else, to make it readable in either its rendered or plain text format.

The "new" wave on the block (at least R-flavored notebooks)

Amazing fact: in 2014, there were 80,000 Jupyter notebooks on GitHub. In 2015 the number almost tripled to 230,000, which shows how quickly usage is growing in the community.

  • GitHub numbers from article by Alex Perrier here
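
As a quick check on the "almost tripled" figure, the growth can be worked out from the two counts quoted above. This is only a small arithmetic sketch; the variable names are illustrative.

```python
# Notebook counts on GitHub as quoted above (from Alex Perrier's article).
notebooks_2014 = 80_000
notebooks_2015 = 230_000

# Growth factor: how many times larger the 2015 count is than the 2014 count.
growth_factor = notebooks_2015 / notebooks_2014

# Percentage increase from 2014 to 2015.
percent_increase = (notebooks_2015 - notebooks_2014) / notebooks_2014 * 100

print(f"growth factor: {growth_factor:.3f}x")
print(f"increase: {percent_increase:.1f}%")
```

A factor of 2.875 is a little short of a true tripling, which matches the "almost" in the claim.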

More specific stats on the percentage of those notebooks that use R may not yet be representative enough to reveal a trend, given that the project is still very young.

History

What I learned after reading the IPython introductory paper by Fernando Perez

IPython originated in 2001, when Perez created it as an enhanced interactive Python shell.

It was meant to fill the need for a better Python interpreter: something useful beyond simply running snippets of code line by line.

How IPython went above and beyond a regular REPL

  • Retain access to the environment's state
  • Include a control system
  • Perform OS-level actions
  • Perform introspection and help
  • Execute and debug apps within its interface

As an aside, it was so good at OS-level actions that some Windows users adopted it in place of the native command prompt.

When the enhanced IPython interpreter became a browser-based project doing the same stuff

In December of 2011, Perez, Brian Granger and Min Ragan-Kelley figured out a way to bring the interactive shell into the browser as a notebook made of "cells" where code is executed. There are also "cells" specifically for markdown text.
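
The cell idea can be sketched as a data structure. The example below assumes the version 4 notebook file layout, in which a notebook is a JSON document holding a list of cells; the original write-up does not describe the file format, and `make_cell` is a hypothetical helper, not part of any Jupyter library.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dictionary."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also record their outputs and an execution counter.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "cells": [
        make_cell("markdown", "# A title written in *markdown*"),
        make_cell("code", "x <- c(1, 2, 3)\nmean(x)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# A notebook file is this structure serialized as JSON.
serialized = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

A real notebook would also carry kernel information in its metadata, which is left empty here to keep the sketch short.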

The R kernel is a younger cousin of the Python kernel

The R kernel (IRkernel) is a much, much newer project, with its first release, 0.1, on Mar 5, 2015 (source: https://github.com/IRkernel/IRkernel/releases)

  • No magics (the ability to leverage other languages/process types in IPython) for the R kernel, and probably never: they'd have to resolve the conflict between R's modulus operator and the magic indicator (both %%)
  • However, R code can be run from within a Python-flavored notebook using the Python library rpy2, which is currently a common option for combining the IPython features with R in academic publications
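
As a rough illustration of the rpy2 route, the sketch below asks R to evaluate a modulus expression from Python. It assumes rpy2 and a working R installation are available and falls back to `None` otherwise; `r_modulo` is a hypothetical helper name, and the snippet is a minimal example rather than a recommended pipeline.

```python
def r_modulo(a, b):
    """Evaluate `a %% b` in R through rpy2; return None if rpy2 or R is unavailable."""
    try:
        import rpy2.robjects as ro  # third-party; needs a working R installation
    except Exception:
        return None
    # ro.r() evaluates a string of R code and returns an R vector.
    result = ro.r(f"{a} %% {b}")
    return result[0]


remainder = r_modulo(7, 3)
print("rpy2 not available" if remainder is None else f"7 %% 3 in R = {remainder}")
```

Note that `%%` here is R's modulus operator inside the R code string, the same token IPython uses to mark cell magics.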

The Jupyter notebook above can be found at the GitHub repo here. There is also a blog about the idea of pipelining R and Python in practice (caveat: don't try pipelining on big data; it is useful for prototyping or for MB- to GB-sized datasets).

Perspectives

On the Jupyter Project's hopeful future in data science

"Jupyter is an amazing project that feeds and rides the rising wave of data science."

  • quote from article by Alex Perrier here