Getting close to your data with Python and JavaScript


In [1]:
from IPython.display import display, Image, HTML
from talktools import website, nbviewer

Brian E. Granger (@ellisonbg)

Physics Professor, Cal Poly

Core developer, IPython Project

The IPython Project

IPython is an open source, interactive computing environment for Python and other languages.


In [2]:
website('http://ipython.org')


Out[2]:
  • Started in 2001 by Fernando Perez, who continues to lead the project from UC Berkeley
  • Open source, BSD license

Funding

  • Over the past 13 years, much of IPython has been "funded" by volunteer developer time.
  • Past funding: NASA, DOD, NIH, Enthought Corporation
  • Current funding:

Development team

  • IPython is developed by a talented team of $\approx15$ core developers and a larger community of $\approx100$ contributors.
  • Through the above funding sources, there are currently 6 full-time people working on IPython at UC Berkeley and Cal Poly.

In [3]:
import ipythonproject

In [4]:
ipythonproject.core_devs()


Fernando Perez

Brian Granger

Min Ragan-Kelley

Thomas Kluyver

Matthias Bussonnier

Jonathan Frederic

Paul Ivanov

Evan Patterson

Damian Avila

Brad Froehle

Zach Sailer

Robert Kern

Jorgen Stenarson

Jonathan March

Kyle Kelley

Notice that the output of the above Python code is an HTML table with embedded images. IPython generalizes the notion of output to include rich formats: HTML, PNG, JPEG, PDF, JavaScript, LaTeX, etc. This means that any Python object can declare rich representations that will be rendered and saved in the notebook.
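
As a minimal sketch of how an object can declare a rich representation: IPython's display machinery looks for specially named methods such as `_repr_html_` on the object. The `DevCard` class and its fields are hypothetical, made up for illustration; only the `_repr_html_` method name comes from IPython.

```python
from html import escape


class DevCard:
    """Hypothetical example class: one developer rendered as a small HTML table."""

    def __init__(self, name, affiliation):
        self.name = name
        self.affiliation = affiliation

    def __repr__(self):
        # Plain-text fallback used by terminals and other text-only frontends.
        return "DevCard({!r}, {!r})".format(self.name, self.affiliation)

    def _repr_html_(self):
        # IPython calls this method, when present, to obtain an HTML representation.
        return "<table><tr><td><b>{}</b></td><td>{}</td></tr></table>".format(
            escape(self.name), escape(self.affiliation)
        )


card = DevCard("Fernando Perez", "UC Berkeley")
html_text = card._repr_html_()
```

In a notebook, ending a cell with `card` or calling `display(card)` should render the table instead of the plain-text repr.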

The IPython Notebook

The IPython Notebook is a web-based interactive computing environment that spans the full range of data-related activities:

  1. Individual exploration, analysis and visualization
  2. Debugging, testing
  3. Production runs
  4. Parallel computing
  5. Collaboration
  6. Publication
  7. Presentation
  8. Teaching/Learning

How does IPython target these different activities?

Interactive exploration

The central focus of IPython is the writing and running of code. We try to make this as pleasant as possible:

  • Multiline editing
  • Tab completion
  • Integrated help
  • Syntax highlighting
  • System shell access

Let's download some stock data into a Pandas DataFrame and then visualize the time series using Vincent/Vega/d3.


In [5]:
import vincent
import pandas as pd
vincent.initialize_notebook()



In [6]:
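# pandas.io.data was later split out into the separate pandas-datareader package;
# on newer pandas, `import pandas_datareader.data as web` is the equivalent import.
import pandas.io.data as web

# Download daily prices for each ticker, then keep only the adjusted close.
all_data = {}
for ticker in ['AAPL', 'GOOG', 'IBM', 'YHOO', 'MSFT']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2013')
price = pd.DataFrame({tic: data['Adj Close'] for tic, data in all_data.items()})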

In the Notebook, DataFrame objects are represented as formatted HTML tables:


In [7]:
price[0:10]


Out[7]:
              AAPL    GOOG     IBM   MSFT   YHOO
Date
2010-01-04  205.70  626.75  122.62  27.88  17.10
2010-01-05  206.05  623.99  121.14  27.89  17.23
2010-01-06  202.77  608.26  120.35  27.72  17.17
2010-01-07  202.40  594.10  119.94  27.43  16.70
2010-01-08  203.75  602.02  121.14  27.62  16.70
2010-01-11  201.95  601.11  119.87  27.27  16.74
2010-01-12  199.65  590.48  120.83  27.09  16.68
2010-01-13  202.47  587.09  120.57  27.34  16.90
2010-01-14  201.29  589.85  122.49  27.89  17.12
2010-01-15  197.93  580.00  122.00  27.80  16.82

In [8]:
line = vincent.Line(price[['GOOG', 'AAPL', 'IBM', 'YHOO', 'MSFT']], width=600, height=300)
line.axis_titles(x='Date', y='Price')
line.legend(title='Ticker')
display(line)


Multiple backend languages

Data science is a multi-language activity: R, Python, Julia, Scala, and more. The IPython architecture is language agnostic.

Let's fit a linear model in R and visualize the results:


In [9]:
import numpy as np
X = np.array([0,1,2,3,4])
Y = np.array([3,5,4,6,7])
%load_ext rmagic

The %%R syntax tells IPython to run the rest of the cell as R code:


In [10]:
%%R -i X,Y -o XYcoef
XYlm = lm(Y~X)
XYcoef = coef(XYlm)
print(summary(XYlm))
par(mfrow=c(2,2))
plot(XYlm)


Call:
lm(formula = Y ~ X)

Residuals:
   1    2    3    4    5 
-0.2  0.9 -1.0  0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2000     0.6164   5.191   0.0139 *
X             0.9000     0.2517   3.576   0.0374 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7958 on 3 degrees of freedom
Multiple R-squared:   0.81,	Adjusted R-squared:  0.7467 
F-statistic: 12.79 on 1 and 3 DF,  p-value: 0.03739
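
The coefficients reported by R can be cross-checked with the closed-form least-squares formulas in plain Python. This is only a sanity check on the same five data points, assuming Python 3 division; the variable names are illustrative and it does not reproduce R's standard errors or p-values.

```python
# Same data as the X and Y arrays passed to R above.
X = [0, 1, 2, 3, 4]
Y = [3, 5, 4, 6, 7]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(Y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_mean) ** 2 for x in X)
s_yy = sum((y - y_mean) ** 2 for y in Y)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))

slope = s_xy / s_xx                    # coefficient on X
intercept = y_mean - slope * x_mean    # (Intercept)
r_squared = s_xy ** 2 / (s_xx * s_yy)  # Multiple R-squared

print(slope, intercept, r_squared)
```

Up to floating-point rounding, the three values agree with the 0.9, 3.2 and 0.81 in the R summary.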

This %%language syntax is an IPython-specific extension to the Python language. This "magic command syntax" allows Python code to call out to a wide range of other languages (Ruby, Bash, Julia, Fortran, Perl, Octave, Matlab, etc.).
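
To make the cell-magic idea concrete, here is a toy sketch of how a cell beginning with `%%name` can be split into a magic name, an argument string, and a body to hand to another language. The `split_cell_magic` helper is hypothetical and is not how IPython itself parses cells; it only illustrates the shape of the syntax.

```python
def split_cell_magic(cell):
    """Split a cell that starts with %%name into (name, argument string, body)."""
    first_line, _, body = cell.partition("\n")
    if not first_line.startswith("%%"):
        raise ValueError("not a cell magic: first line must start with %%")
    # Everything up to the first space is the magic name; the rest are its arguments.
    name, _, args = first_line[2:].partition(" ")
    return name, args.strip(), body


cell = "%%R -i X,Y -o XYcoef\nXYlm = lm(Y~X)\nXYcoef = coef(XYlm)"
name, args, body = split_cell_magic(cell)
```

Here `name` identifies the handler (R), `args` carries the input and output variable options, and `body` is the code the handler would pass to the other language.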

Native kernels

In the IPython architecture, the kernel is a separate process that runs the user's code and returns the output back to the frontend (Notebook, Terminal, etc.). Kernels talk to frontends using a well-documented message protocol (JSON over ZeroMQ and WebSockets). The default kernel that ships with IPython knows how to run Python code. However, there are now kernels in other languages:

Later this year, all users of the IPython Notebook will be able to choose which kernel to use for each notebook.
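
As a rough illustration of the JSON message protocol mentioned above, the sketch below builds a dictionary shaped like an `execute_request` message, the kind a frontend sends to ask a kernel to run code. The field layout follows the documented messaging specification as best I understand it and may differ between protocol versions; the `make_execute_request` helper and the `demo` username are assumptions for illustration. The sketch opens no sockets and omits the signing and framing used on the real wire.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo"):
    """Build a dict shaped like an execute_request message (illustrative only)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,      # unique id for this message
            "session": session_id,           # id shared by all messages of one session
            "username": username,
            "msg_type": "execute_request",
            "date": datetime.now(timezone.utc).isoformat(),
        },
        "parent_header": {},                 # empty: this message is not a reply
        "metadata": {},
        "content": {
            "code": code,                    # source text the kernel should run
            "silent": False,
        },
    }


msg = make_execute_request("print(1 + 1)", session_id=uuid.uuid4().hex)
wire_text = json.dumps(msg)  # JSON text a frontend could serialize and send
```

Because every part is plain JSON, a kernel written in any language can parse the request and reply in the same format, which is what makes the architecture language agnostic.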

Here is a notebook that runs code in the native Julia kernel:


In [11]:
website("http://nbviewer.ipython.org/url/jdj.mit.edu/~stevenj/IJulia%20Preview.ipynb")


Out[11]:

Notebook documents

Notebook documents are just JSON files stored on your filesystem. These files store everything related to a computation:

  • Code
  • Output (text, HTML, plots, images, JavaScript)
  • Narrative text (Markdown with embedded LaTeX math)
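
A minimal sketch of what such a JSON document can look like, built and round-tripped with the standard `json` module. The layout shown assumes version 4 of the notebook format, which may differ from the format version in use for this talk; the cell contents are made up for illustration.

```python
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            # Narrative text: Markdown with embedded LaTeX math.
            "cell_type": "markdown",
            "metadata": {},
            "source": "Fit a line: $y = a + b x$",
        },
        {
            # Code together with the output it produced when it was run.
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print(1 + 1)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "2\n"}
            ],
        },
    ],
}

text = json.dumps(notebook, indent=1)  # what would be written to an .ipynb file
restored = json.loads(text)            # reading it back gives the same structure
```

Since the file is ordinary JSON, any tool that can parse JSON can read the code, output, and narrative without running IPython.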

Notebook documents can be shared:

  • GitHub repos
  • Email
  • Dropbox
  • Internal shared file systems

Notebook documents can be viewed by anyone on the web through http://nbviewer.ipython.org
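
The nbviewer links used in this talk follow a simple pattern: the notebook's host and path are appended after `/url/`. A small sketch of that mapping follows; the `nbviewer_url` helper is hypothetical, and it only handles plain `http://` URLs because that is the only pattern shown here.

```python
from urllib.parse import urlparse


def nbviewer_url(notebook_url, viewer="http://nbviewer.ipython.org"):
    """Map an http:// notebook URL to the /url/ form used in this talk's links."""
    parts = urlparse(notebook_url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    # Drop the scheme and keep host + path after the /url/ prefix.
    return "{}/url/{}{}".format(viewer, parts.netloc, parts.path)


link = nbviewer_url("http://norvig.com/ipython/xkcd1313.ipynb")
```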


In [12]:
website("http://nbviewer.ipython.org")


Out[12]:

This allows people to compose and share reproducible stories that involve code and data.

Earlier this year, Randall Munroe (xkcd) published a comic about regular expression golf. Peter Norvig from Google wanted to explore some of the algorithms related to this comic and shared his explorations as a notebook on nbviewer:


In [13]:
website("http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb")


Out[13]:
