Interacting with data


In [1]:
from IPython.display import display, Image, HTML
from talktools import website, nbviewer

Overview

There are important questions we should be asking about the human experience of working with data.

How many steps are required to:

  • Visualize a dataset?
  • Explore a dataset?
  • Transform, analyze, summarize or model the dataset?
  • How difficult and time-consuming are the required steps?
  • What is the visual and cognitive load of the tools we are using?

The number of steps required for each of these activities is primarily determined by the software tools and computing environment:

  • Many difficult, time-consuming steps $\rightarrow$ user is far from the data
  • Few easy steps $\rightarrow$ user is close to the data

The modern web+cloud architecture presents a massive problem:

  • The user is sitting in front of a web-browser running on their laptop
  • The web browser has all of the visualization and interactive capabilities we want to leverage in working with the data
  • The data is not in the web browser
  • Any significant computation has to be performed where the data is (not in the browser)

We have a Grand Canyon between human users and data:


In [2]:
Image("images/grand-canyon.jpg", width=600)


Out[2]:

Data exploration

Data exploration is an iterative process that involves repeated passes at visualization, interaction and computation:


In [3]:
Image('images/VizInteractCompute.png')


Out[3]:

Right now this cycle is still really painful:

  • It takes too long to go through a single iteration
  • Even when we are successful, the overall process is not reproducible
  • It is difficult to repeat, generalize or share with others
  • There is a massive cognitive load that has nothing to do with extracting insight from the data

For IPython 2.0 we have built an architecture that allows Python and JavaScript to communicate seamlessly and in real time. The result is that users can get close to their data.

What does this look like?

Interact

IPython 2.0 offers an interact function and decorator for interactive exploration. This is the highest-level API in our interactive JavaScript architecture.

Image editing

In this example, we will perform some basic image processing using scikit-image.


In [4]:
from IPython.html.widgets import interact, fixed

In [5]:
import skimage
from skimage import data, filter, io

In [6]:
i = data.coffee()
io.Image(i)


Out[6]:

Here is a function that can apply a gaussian blur and adjust the RGB channels:


In [7]:
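def edit_image(image, sigma=0.1, r=1.0, g=1.0, b=1.0):
    # Gaussian blur; multichannel=True filters each color channel separately.
    new_image = filter.gaussian_filter(image, sigma=sigma, multichannel=True)
    # Scale the red, green and blue channels independently.
    new_image[:,:,0] = r*new_image[:,:,0]
    new_image[:,:,1] = g*new_image[:,:,1]
    new_image[:,:,2] = b*new_image[:,:,2]
    # Wrap the array in a displayable skimage Image, show it and return it.
    new_image = io.Image(new_image)
    display(new_image)
    return new_image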

Calling the function once displays and returns the edited image:


In [8]:
new_i = edit_image(i, 0.5, r=0.5);


We can use interact to explore the parameter space of the processed image:


In [9]:
lims = (0.0,1.0,0.01)
interact(edit_image, image=fixed(i), sigma=(0.0,10.0,0.1), r=lims, g=lims, b=lims);


We can quickly iterate through the visualize, interact, compute cycle.

Lorenz system

Let's explore the Lorenz system of differential equations:

$$ \begin{aligned} \dot{x} & = \sigma(y-x) \\ \dot{y} & = \rho x - y - xz \\ \dot{z} & = -\beta z + xy \end{aligned} $$

This is one of the classic systems in non-linear differential equations. It exhibits a range of different behaviors as the parameters ($\sigma$, $\beta$, $\rho$) are varied.
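Here is a quick worked example of how the parameters shape the dynamics. Setting all three derivatives to zero gives the fixed points of the system. From $\dot{x} = 0$ we get $y = x$. Substituting that into the other two equations gives:

$$ \begin{aligned} 0 & = \rho x - x - xz = x(\rho - 1 - z) \\ 0 & = -\beta z + x^2 \end{aligned} $$

The origin $x = y = z = 0$ is therefore a fixed point for every choice of parameters. When $\rho > 1$ there are two more:

$$ x = y = \pm\sqrt{\beta(\rho - 1)}, \qquad z = \rho - 1 $$

For the classic values $\sigma = 10$, $\beta = 8/3$, $\rho = 28$ used below, these are $x = y = \pm\sqrt{72} \approx \pm 8.49$ and $z = 27$. They are the two points that the trajectories of the butterfly-shaped attractor wind around.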


In [10]:
%matplotlib inline

In [11]:
from IPython.html.widgets import interact, fixed
from IPython.display import clear_output, display, HTML

Here is a Python function, from the lorenz module, that solves the Lorenz system using SciPy and plots the results using matplotlib:


In [12]:
from lorenz import solve_lorenz

In [13]:
t, x_t = solve_lorenz(N=10, angle=0.0, max_time=4.0, sigma=10.0, beta=8./3, rho=28.0)
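The `lorenz` module itself is not shown in this talk. As a rough idea of the computation behind `solve_lorenz`, here is a minimal, dependency-free sketch. It rests on several assumptions:

- It starts `N` trajectories from seeded random points in the cube $[-15, 15)^3$.
- It integrates each one with a hand-written fourth-order Runge-Kutta step, in place of the SciPy integrator the talk uses.
- It returns `(t, x_t)`, where `x_t[i][k]` is the state of trajectory `i` at time `t[k]`.

The names `sketch_solve_lorenz`, `rk4_step` and `steps_per_unit` are hypothetical, as are the sampling cube and seed. The `angle` argument is left out because it only affects the plot.

```python
import random


def lorenz_deriv(state, sigma=10.0, beta=8.0 / 3, rho=28.0):
    """Right-hand side of the Lorenz equations for one (x, y, z) state."""
    x, y, z = state
    return (sigma * (y - x), rho * x - y - x * z, -beta * z + x * y)


def rk4_step(state, dt, **params):
    """Advance one state by dt with the classic fourth-order Runge-Kutta method."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_deriv(state, **params)
    k2 = lorenz_deriv(shift(state, k1, dt / 2), **params)
    k3 = lorenz_deriv(shift(state, k2, dt / 2), **params)
    k4 = lorenz_deriv(shift(state, k3, dt), **params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def sketch_solve_lorenz(N=10, max_time=4.0, steps_per_unit=250, seed=1,
                        sigma=10.0, beta=8.0 / 3, rho=28.0):
    """Hypothetical stand-in for solve_lorenz: no plotting, no view angle.

    Returns (t, x_t): t is the time grid and x_t[i][k] is the (x, y, z)
    state of trajectory i at time t[k].
    """
    rng = random.Random(seed)  # fixed seed: every call starts from the same points
    n_steps = max(1, int(steps_per_unit * max_time))
    dt = max_time / n_steps
    t = [k * dt for k in range(n_steps + 1)]
    x_t = []
    for _ in range(N):
        # Random initial condition in the cube [-15, 15)^3 (an assumption).
        state = tuple(-15 + 30 * rng.random() for _ in range(3))
        traj = [state]
        for _ in range(n_steps):
            state = rk4_step(state, dt, sigma=sigma, beta=beta, rho=rho)
            traj.append(state)
        x_t.append(traj)
    return t, x_t
```

In this sketch the seed stays fixed, so moving a slider changes only the parameters and not the starting points. That keeps successive redraws directly comparable.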


Let's use interact to explore this function:


In [14]:
interact(solve_lorenz, angle=(0.,360.), N=(0,50), sigma=(0.0,50.0),
         rho=(0.0,50.0), beta=fixed(8./3));


How does this work?

  • The first argument to interact is a callable/function
  • The keyword arguments to interact are "widget abbreviations"
  • These "widget abbreviations" are converted to Widget instances
  • These Widget objects are Python objects that are automatically synchronized with JavaScript MVC objects running in the browser.
  • interact simply calls its callable each time any widget changes state
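The last bullet is essentially the observer pattern. The toy sketch below models only that part:

- Each hypothetical `ToyWidget` holds a value and notifies its registered callbacks whenever the value changes.
- `toy_interact` calls the function once up front and again after every change.

Neither name is IPython's API. The synchronization with the browser that real widgets perform is left out entirely.

```python
class ToyWidget(object):
    """Hypothetical widget: holds a value and notifies observers when it changes."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def observe(self, callback):
        self._observers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in self._observers:  # every change triggers every observer
            callback()


def toy_interact(func, **widgets):
    """Call func with the current widget values now and after every change."""
    def update():
        func(**{name: w.value for name, w in widgets.items()})

    for w in widgets.values():
        w.observe(update)
    update()  # initial call, so output appears as soon as the controls do
    return widgets


# Usage: record each call instead of printing it.
calls = []
controls = toy_interact(lambda x, y: calls.append(x + y),
                        x=ToyWidget(1), y=ToyWidget(2))
controls["x"].value = 10  # simulates the user dragging the x slider
```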

In [15]:
def f(x):
    print(x)

In [16]:
interact(f, x=True);


True

In [17]:
interact(f, x=(0,10,2));


4

In [18]:
interact(f, x='Hi Strata');


Hi Strata

In [19]:
interact(f, x=dict(this=list, that=tuple, other=str));


<type 'list'>
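The four calls above show which widget each abbreviation type produces:

- A bool gives a checkbox.
- A tuple of numbers gives a slider.
- A string gives a text box.
- A dict gives a dropdown that passes the selected value to the function.

The hypothetical `widget_kind` helper below restates those rules as code. It covers only the cases seen in this talk and returns descriptive labels rather than IPython class names. The split into integer and float sliders is an assumption of this sketch, and the helper is not how IPython itself is implemented.

```python
def widget_kind(abbrev):
    """Return a descriptive label for the widget an abbreviation maps to (toy rules)."""
    # bool must be tested before numbers, because bool is a subclass of int.
    if isinstance(abbrev, bool):
        return "checkbox"
    if isinstance(abbrev, str):
        return "text"
    if isinstance(abbrev, dict):
        # Keys become the dropdown labels; the selected value is passed to f.
        return "dropdown"
    if (isinstance(abbrev, tuple) and len(abbrev) in (2, 3)
            and all(isinstance(v, (int, float)) and not isinstance(v, bool)
                    for v in abbrev)):
        # (min, max) or (min, max, step): any float makes it a float slider.
        if any(isinstance(v, float) for v in abbrev):
            return "float slider"
        return "integer slider"
    raise ValueError("not an abbreviation these toy rules know: %r" % (abbrev,))


for a in [True, (0, 10, 2), (0.0, 1.0, 0.01), 'Hi Strata', dict(this=list)]:
    print(repr(a), "->", widget_kind(a))
```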

Architecture overview

The widget architecture in IPython 2.0 has a layered design, and each layer can be used independently of the layers above it.


In [20]:
Image('images/WidgetArch.png')


Out[20]:

  • Interact: High-level interface for quick data exploration of a single callable/function.
  • Widgets: Widgets synchronize reactive Python objects/models with JavaScript models (using Backbone.js) and manage the lifecycle and parent/child relationships of the JavaScript/HTML views. This layer is documented in IPEP 23.
  • Comm: The Comm layer allows real-time, asynchronous, bidirectional JSON messaging between Python objects in the kernel and JavaScript in the browser. Comm instances are very lightweight, and each Widget has its own Comm instance. This layer is documented in IPEP 21.
  • WebSockets/ZeroMQ: At the lowest level, all Comms share a pair of WebSocket/ZeroMQ connections to the notebook server and kernel. It is possible to run all of this through a modern WebSocket-aware proxy.
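To make the bottom two layers concrete, here is a toy, in-process sketch of the multiplexing idea: many lightweight comms, one shared connection, and JSON text on the wire.

- `Connection` and `KernelComm` are hypothetical names.
- Only the `comm_msg` message type and the `comm_id`/`data` fields follow the IPython messaging protocol.
- The envelope is simplified. The real protocol carries the message type in a header and opens each comm with a `comm_open` message first.
- For brevity, messages flow only from kernel to browser, although the real Comm layer is bidirectional.

```python
import json
import uuid


class Connection:
    """Stand-in for the single WebSocket/ZeroMQ link that every comm shares."""

    def __init__(self):
        self.handlers = {}  # comm_id -> callback on the receiving (browser) side
        self.wire = []      # raw JSON strings that crossed the link, in order

    def register(self, comm_id, handler):
        self.handlers[comm_id] = handler

    def deliver(self, raw):
        self.wire.append(raw)
        content = json.loads(raw)["content"]
        # Demultiplex: the comm_id alone decides which handler receives the data.
        self.handlers[content["comm_id"]](content["data"])


class KernelComm:
    """Kernel-side end of one comm; in this sketch each widget would own one."""

    def __init__(self, connection, frontend_handler):
        self.comm_id = uuid.uuid4().hex
        self.connection = connection
        connection.register(self.comm_id, frontend_handler)

    def send(self, data):
        msg = {"msg_type": "comm_msg",  # simplified: real messages keep this in a header
               "content": {"comm_id": self.comm_id, "data": data}}
        self.connection.deliver(json.dumps(msg))


# Two widgets, two comms, one shared connection.
link = Connection()
slider_view, checkbox_view = [], []
slider = KernelComm(link, slider_view.append)
checkbox = KernelComm(link, checkbox_view.append)
slider.send({"value": 4})
checkbox.send({"value": True})
```

Because routing depends only on `comm_id`, adding another widget costs one small object and one dictionary entry rather than a new connection. That is the sense in which Comm instances are lightweight.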

This entire architecture is language-agnostic. Other kernel languages (Julia, R, Scala, you name it) will be able to implement the kernel side of the architecture. This will allow all kernels to reuse the JavaScript/HTML/CSS side of the Widgets while providing their own language-specific interact/Widget APIs in the kernel.