Plotting and visualization


In [1]:
from IPython.display import display, Image, HTML
from talktools import website, nbviewer

One of the main use cases for this display architecture is plotting and visualization. In the last two years, there has been an explosion of plotting and visualization libraries in Python and other languages. This growth has largely been fueled by visualization moving to the web (for example, d3.js), both in IPython and in similar environments.

Giving a detailed and thorough overview of visualization in Python would require an entirely separate talk. The purpose here is to show a few of the visualization tools and their integration with the IPython Notebook.

matplotlib

The foundation for plotting and visualization in Python is matplotlib. While there are newer visualization libraries, many of them use matplotlib as a base layer.

IPython has a long history of tight integration with matplotlib. Inline plotting in the Notebook is enabled using %matplotlib inline:


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np

Here is a simple plot from the matplotlib gallery:


In [3]:
# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
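# the histogram of the data; density=True replaces normed=1, which newer
# matplotlib releases no longer accept
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='green', alpha=0.5)
# add a 'best fit' line: the normal PDF evaluated at the bin edges
# (computed with NumPy because mlab.normpdf is gone from newer matplotlib)
y = np.exp(-0.5 * ((bins - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))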
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
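
The histogram above is normalized so that the bar areas sum to 1, which is what lets it share an axis with a probability density curve. The following is a minimal sketch of that normalization using only NumPy, without any plotting. The seed and the variable names `counts`, `edges`, `widths`, `density`, and `pdf` are choices made for this illustration, not part of the original example.

```python
import numpy as np

# Assumption: a fixed seed so the sketch is reproducible.
rng = np.random.default_rng(0)
mu, sigma = 100, 15
x = mu + sigma * rng.standard_normal(10000)

# Raw counts per bin and the bin edges (51 edges for 50 bins).
counts, edges = np.histogram(x, bins=50)
widths = np.diff(edges)

# Density normalization: divide each count by (total count * bin width),
# so that sum(density * widths) == 1.
density = counts / (counts.sum() * widths)

# Normal probability density evaluated at the bin edges.
pdf = np.exp(-0.5 * ((edges - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print("total area under the bars:", (density * widths).sum())
```

The `density` array matches what `np.histogram(x, bins=50, density=True)` returns, so the bars and the `pdf` curve are on the same scale.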


mpld3

The d3.js JavaScript library offers a powerful approach to interactive visualization in modern web browsers. The mpld3 Python package adds d3-based rendering to matplotlib. This provides interactivity (pan, zoom, hover, etc.) while keeping the same matplotlib APIs. These interactive visualizations also display on http://nbviewer.ipython.org.


In [4]:
import mpld3
mpld3.enable_notebook()

Here is an example of a 2d scatter plot with tooltips for each data point:


In [5]:
# facecolor replaces the older axisbg keyword, which newer matplotlib releases no longer accept
fig, ax = plt.subplots(subplot_kw=dict(facecolor='#EEEEEE'))
N = 100

scatter = ax.scatter(np.random.normal(size=N),
                     np.random.normal(size=N),
                     c=np.random.random(size=N),
                     s=1000 * np.random.random(size=N),
                     alpha=0.3,
                     cmap=plt.cm.jet)
ax.grid(color='white', linestyle='solid')

ax.set_title("Scatter Plot (with tooltips!)", size=20)

labels = ['point {0}'.format(i + 1) for i in range(N)]
tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels=labels)
mpld3.plugins.connect(fig, tooltip)


Vincent

Vincent is a visualization library that uses the Vega visualization grammar to build d3.js-based visualizations in the Notebook and on http://nbviewer.ipython.org. Visualization objects in Vincent use IPython's display architecture with HTML and JavaScript representations.
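
To make the idea of an HTML representation concrete, here is a minimal sketch of an object that plugs into IPython's display architecture by defining a `_repr_html_` method. The `BarStrip` class is a hypothetical toy written for this illustration; it is not part of Vincent and does not reflect how Vincent is implemented.

```python
class BarStrip(object):
    """Hypothetical toy object that renders itself as inline SVG bars.

    Assumes the values are non-negative numbers.
    """

    def __init__(self, values, bar_width=20, height=100):
        self.values = list(values)
        self.bar_width = bar_width
        self.height = height

    def _repr_html_(self):
        # IPython calls this method to obtain an HTML representation
        # of the object when it is displayed in the Notebook.
        peak = max(self.values, default=0) or 1
        rects = []
        for i, value in enumerate(self.values):
            bar_height = self.height * value / peak
            rects.append(
                '<rect x="{x}" y="{y:.1f}" width="{w}" height="{h:.1f}" '
                'fill="steelblue"/>'.format(
                    x=i * (self.bar_width + 2),
                    y=self.height - bar_height,
                    w=self.bar_width,
                    h=bar_height))
        total_width = len(self.values) * (self.bar_width + 2)
        return '<svg width="{0}" height="{1}">{2}</svg>'.format(
            total_width, self.height, ''.join(rects))


strip = BarStrip([3, 7, 5])
html = strip._repr_html_()
```

In a Notebook, evaluating `strip` as the last expression of a cell, or calling `display(strip)`, would show the rendered bars instead of the default text repr.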


In [6]:
import vincent
import pandas as pd

In [7]:
# pandas.io.data was moved out of pandas into the separate pandas-datareader package
import pandas_datareader.data as web
import datetime
all_data = {}
date_start = datetime.datetime(2010, 1, 1)
date_end = datetime.datetime(2014, 1, 1)
for ticker in ['AAPL', 'IBM', 'YHOO', 'MSFT']:
    all_data[ticker] = web.DataReader(ticker, 'yahoo', date_start, date_end)
price = pd.DataFrame({tic: data['Adj Close']
                      for tic, data in all_data.items()})
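
The cell above depends on a remote data service being reachable. As a hedged offline alternative, the sketch below builds a DataFrame with the same shape (one column per ticker, indexed by date) from synthetic random-walk prices. The numbers are made up for illustration and are not real market data; the name `price_synthetic`, the seed, and the drift and volatility values are assumptions of this sketch.

```python
import datetime

import numpy as np
import pandas as pd

# Assumption: fixed seed and arbitrary drift/volatility, purely illustrative.
rng = np.random.default_rng(42)
tickers = ['AAPL', 'IBM', 'YHOO', 'MSFT']

# Business-day index over the same date range as the original cell.
dates = pd.date_range(datetime.datetime(2010, 1, 1),
                      datetime.datetime(2014, 1, 1), freq='B')

synthetic = {}
for ticker in tickers:
    # Random daily returns compounded from a starting price of 100.
    daily_returns = rng.normal(loc=0.0005, scale=0.02, size=len(dates))
    synthetic[ticker] = pd.Series(100 * np.cumprod(1 + daily_returns),
                                  index=dates)

# One column per ticker, mirroring the dict comprehension used for `price`.
price_synthetic = pd.DataFrame(synthetic)
print(price_synthetic.head())
```

A frame like `price_synthetic` can stand in for `price` when experimenting with the plotting cells without network access.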

In [8]:
vincent.initialize_notebook()



In [9]:
line = vincent.Line(price[['AAPL', 'IBM', 'YHOO', 'MSFT']], width=600, height=300)
line.axis_titles(x='Date', y='Price')
line.legend(title='Ticker')
display(line)


Plotly

Plotly is a web-based data analysis and plotting tool that has IPython integration and uses d3.js for its visualizations. It goes beyond plotting and enables the sharing of plots and analyses across a wide range of programming languages (Python, Matlab, R, Julia).


In [10]:
import plotly
py = plotly.plotly('IPython.Demo', '1fw3zw2o13')
nr = np.random

In [11]:
distributions = [nr.uniform, nr.normal, lambda size: nr.normal(0, 0.2, size=size),
                 lambda size: nr.beta(a=0.5, b=0.5, size=size),
                 lambda size: nr.beta(a=0.5, b=2, size=size)]

names = ['Uniform(0,1)', 'Normal(0,1)', 'Normal(0, 0.2)', 'beta(a=0.5, b=0.5)', 'beta(a=0.5, b=2)']

boxes = [{'y': dist(size=50), 'type': 'box', 'boxpoints': 'all', 'jitter': 0.5, 'pointpos': -1.8,
        'name': name} for dist, name in zip(distributions, names)]

layout = {'title': 'A few distributions',
          'showlegend': False,
          'xaxis': {'ticks': '', 'showgrid': False, 'showline': False},
          'yaxis': {'zeroline': False, 'ticks': '', 'showline': False},
          }

py.iplot(boxes, layout = layout, filename='Distributions', fileopt='overwrite')


---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-11-e4238c9963b1> in <module>()
---> 16 py.iplot(boxes, layout = layout, filename='Distributions', fileopt='overwrite')

...

ConnectionError: HTTPSConnectionPool(host='plot.ly', port=443): Max retries exceeded with url: /clientresp (Caused by <class 'socket.gaierror'>: [Errno 8] nodename nor servname provided, or not known)
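
The box plot above is described entirely by plain Python dictionaries, one per trace, so its structure can be inspected without contacting the Plotly service. The sketch below rebuilds an equivalent list of traces using only the standard library; the `samplers` and `traces` names and the use of the `random` module in place of NumPy are assumptions of this illustration.

```python
import json
import random

# Assumption: fixed seed so the sketch is reproducible.
random.seed(0)

# Each entry pairs a trace name with a zero-argument sampler.
samplers = [
    ('Uniform(0,1)', lambda: random.uniform(0, 1)),
    ('Normal(0,1)', lambda: random.gauss(0, 1)),
    ('Normal(0, 0.2)', lambda: random.gauss(0, 0.2)),
    ('beta(a=0.5, b=0.5)', lambda: random.betavariate(0.5, 0.5)),
    ('beta(a=0.5, b=2)', lambda: random.betavariate(0.5, 2)),
]

# One dict per box, using the same keys as the cell above.
traces = [{'y': [draw() for _ in range(50)],
           'type': 'box',
           'boxpoints': 'all',
           'jitter': 0.5,
           'pointpos': -1.8,
           'name': name} for name, draw in samplers]

# Plain lists and dicts serialize directly to JSON for inspection.
payload = json.dumps(traces)
print(len(traces), "traces,", len(payload), "characters of JSON")
```

Printing `traces[0]` shows exactly which keys and values describe the first box.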

Other visualization libraries

  • Bokeh: interactive visualization library for large datasets
  • ggplot: Python port of R's ggplot2
  • Seaborn: Statistical data visualization
  • prettyplotlib: Make matplotlib plots that look good