Seaborn

A library for "making attractive and informative statistical graphics in Python". It is built on top of matplotlib rather than being a replacement for it, so it is common to keep using many standard matplotlib commands alongside Seaborn.

This tutorial "steals" parts from the Seaborn documentation.


In [ ]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Let us first define a function plotting some sine waves so that we have something to plot.


In [ ]:
def sinplot(flip=1):
    x = np.linspace(0, 14, 100)
    for i in range(1, 7):
        plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)

With pure matplotlib the plot looks somewhat "blunt":


In [ ]:
sinplot()

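Seaborn makes it easy to get much nicer plots. In Seaborn versions before 0.8, importing the library was enough to change the look. Since 0.8 the import no longer touches the matplotlib defaults, so the Seaborn style has to be activated explicitly with sns.set():


In [ ]:
import seaborn as sns
sns.set()  # Activate the Seaborn default style (required since Seaborn 0.8)
sinplot()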

Controlling figure aesthetics (aka make your plots look awesome)

Seaborn provides a number of different aesthetics that can be activated with the set function. These are defined by three things:

  1. The context affects things like label and line scaling for different output media. The contexts are "notebook" (default), "paper", "talk", and "poster". This makes it easy to reuse a figure from a paper for a poster with decent font scaling if you properly set your figsize to what the final output size should be. (Figsize is given in inches and the journals usually tell you what size your figures are allowed to be.) If you save your figures as PDF and include them in a LaTeX document you won't need to adjust the scaling anymore if you set it correctly with figsize.
  2. The style affects how the coordinate axes look. Available styles are "darkgrid" (default), "whitegrid", "dark", "white", "ticks".
  3. The palette affects the colors of your plot. There are a number of ways to generate different palettes in seaborn and I suggest checking the documentation because the best palette depends on the type of plot and data you plot.
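
Each of the three ingredients can also be inspected on its own, which helps to see what a setting actually changes. The following is a small sketch: sns.plotting_context and sns.axes_style return dictionaries of matplotlib rc parameters, and sns.color_palette returns a list of RGB tuples. The request for six colors is an arbitrary choice for this example.

```python
import seaborn as sns

# Context: a dict of matplotlib rc parameters that scale fonts and lines.
paper = sns.plotting_context('paper')
poster = sns.plotting_context('poster')
print('font size (paper): ', paper['font.size'])
print('font size (poster):', poster['font.size'])

# Style: a dict of rc parameters describing the look of the axes.
darkgrid = sns.axes_style('darkgrid')
ticks = sns.axes_style('ticks')
print('grid (darkgrid):', darkgrid['axes.grid'])
print('grid (ticks):   ', ticks['axes.grid'])

# Palette: a list of RGB tuples (six colors requested here as an example).
colors = sns.color_palette('colorblind', 6)
print(len(colors), 'colors, first one:', colors[0])
```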

Let us try out some of the combinations:


In [ ]:
sns.set(context='paper')
sinplot()

In [ ]:
sns.set(style='whitegrid')
sinplot()

In [ ]:
sns.set(style='ticks')
sinplot()

In [ ]:
sns.set(palette='colorblind')
sinplot()
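
The remark about figsize in the list above can be sketched as follows. The figure size of 3.5 by 2.5 inches and the file name sine_paper.pdf are assumptions made for this example; use whatever your journal or poster template prescribes.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Assumed target: a single-column figure, 3.5 in wide and 2.5 in high.
FIGSIZE = (3.5, 2.5)

x = np.linspace(0, 14, 100)
with sns.plotting_context('paper'), sns.axes_style('ticks'):
    fig, ax = plt.subplots(figsize=FIGSIZE)
    ax.plot(x, np.sin(x))
    ax.set_xlabel('x')
    ax.set_ylabel('sin(x)')
    fig.tight_layout()
    # PDF is a vector format, so the figure keeps its size in inches.
    fig.savefig('sine_paper.pdf')
```

Swapping 'paper' for 'poster' and enlarging FIGSIZE to the poster dimensions should then give fonts and lines that are scaled for the larger medium.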

Styles etc. can also be applied temporarily within a with statement.


In [ ]:
with sns.axes_style('ticks'), sns.color_palette('colorblind'):
    sinplot()
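
To convince yourself that the change really is temporary, you can look at the matplotlib rc parameters before, inside, and after the with block. This is only a sketch; it uses the axes.grid parameter because the default "darkgrid" style draws a grid while "ticks" does not.

```python
import matplotlib as mpl
import seaborn as sns

sns.set()  # Seaborn defaults: the "darkgrid" style draws a grid
grid_before = mpl.rcParams['axes.grid']

with sns.axes_style('ticks'):
    # Inside the block the "ticks" style is active, which has no grid.
    grid_inside = mpl.rcParams['axes.grid']

# After the block the previous settings are back in place.
grid_after = mpl.rcParams['axes.grid']
print(grid_before, grid_inside, grid_after)
```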

Usually such plots look better without the upper and right box boundaries. These can easily be removed with sns.despine().


In [ ]:
sns.set(style='ticks')
sinplot()
sns.despine()

Some plots look even better when offsetting the axes.


In [ ]:
sns.set(style='ticks')
sinplot()
sns.despine(offset=10)
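
sns.despine() can also be restricted to a single axes and combined with trim=True, which shortens the remaining spines to the outermost ticks. A minimal sketch with two subplots (the figure size is an arbitrary choice):

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 14, 100)
with sns.axes_style('ticks'):
    fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
    for ax in (ax_left, ax_right):
        ax.plot(x, np.sin(x))

    # Remove the top and right spines of one specific axes only.
    sns.despine(ax=ax_left)
    # Offset the remaining spines and trim them to the outermost ticks.
    sns.despine(ax=ax_right, offset=10, trim=True)
```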

In [ ]:
sns.set()  # Restore Seaborn defaults

Statistical plotting functions

Seaborn not only improves the matplotlib styling, it also provides a number of (mainly statistical) plotting functions that make it easy to create certain kinds of plots that are usually much more complicated in matplotlib. The following is just a brief glimpse of what Seaborn offers. Refer to the documentation for the full glory.

The Seaborn plotting functions make heavy use of Pandas data frames and will use metadata from those frames to automatically label axes.
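
As a small illustration of the automatic labelling, the sketch below builds a data frame from made-up numbers (the column names height_cm and weight_kg are hypothetical) and checks which axis labels Seaborn picked.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up example data; the column names are hypothetical.
rng = np.random.RandomState(0)
height = rng.normal(170, 10, size=50)
df_example = pd.DataFrame({
    'height_cm': height,
    'weight_kg': 0.5 * height - 15 + rng.normal(0, 5, size=50),
})

# The column names end up as axis labels without any extra work.
ax = sns.regplot(x='height_cm', y='weight_kg', data=df_example)
print(ax.get_xlabel(), ax.get_ylabel())

# The return value is a regular matplotlib Axes and can be customized further.
ax.set_title('Labels taken from the data frame')
```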

Visualizing distributions


In [ ]:
x = np.random.normal(size=100)

In [ ]:
sns.histplot(x=x, kde=True, stat='density')  # distplot is deprecated since Seaborn 0.11

In [ ]:
sns.kdeplot(x=x)  # density estimate only, replaces distplot(hist=False)
sns.rugplot(x=x)  # one small tick per observation

In [ ]:
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=['x', 'y'])

In [ ]:
sns.jointplot(x='x', y='y', data=df)

In [ ]:
sns.jointplot(x='x', y='y', kind='hex', data=df)

In [ ]:
sns.jointplot(x='x', y='y', kind='kde', data=df)

In [ ]:
iris = sns.load_dataset('iris')
sns.pairplot(iris)

Visualizing linear relationships

These functions are mainly meant for exploratory analysis. To get actual quantitative measures of a fit, use statsmodels.


In [ ]:
tips = sns.load_dataset('tips')
sns.lmplot(x='total_bill', y='tip', data=tips)

In [ ]:
sns.lmplot(x='size', y='tip', data=tips)

Some options to make such a plot of discrete values nicer:


In [ ]:
sns.lmplot(x='size', y='tip', data=tips, x_jitter=.05)

In [ ]:
sns.lmplot(x='size', y='tip', data=tips, x_estimator=np.mean)

There are further options for fitting polynomials (order), robust regression against outliers (robust), logistic regression (logistic), and more.
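
As a sketch of one of these options, the following fits a second-order polynomial with order=2. The data are synthetic (a parabola plus noise) and only serve as an illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data following a parabola plus noise (for illustration only).
rng = np.random.RandomState(1)
x_curved = rng.uniform(-3, 3, size=80)
curved = pd.DataFrame({
    'x': x_curved,
    'y': x_curved ** 2 + rng.normal(0, 1, size=80),
})

# order=2 fits a second-order polynomial instead of a straight line.
g = sns.lmplot(x='x', y='y', data=curved, order=2)
```

Note that robust=True and logistic=True rely on statsmodels being installed, whereas the polynomial fit only needs numpy.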

It is also easy to extend this to more complex plots:


In [ ]:
sns.lmplot(x='total_bill', y='tip', hue='smoker', col='time', row='sex',
           data=tips, markers=['o', 'x'], palette='Set1')

Plotting categorical data


In [ ]:
sns.stripplot(x='day', y='total_bill', data=tips)

In [ ]:
sns.stripplot(x='day', y='total_bill', data=tips, jitter=True)

In [ ]:
sns.swarmplot(x='day', y='total_bill', data=tips)

In [ ]:
sns.swarmplot(x='day', y='total_bill', hue='time', data=tips)

In [ ]:
sns.boxplot(x='day', y='total_bill', hue='sex', data=tips)

In [ ]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips)

In [ ]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, split=True, inner='stick')

In [ ]:
sns.barplot(x='day', y='total_bill', hue='sex', data=tips)

In [ ]:
sns.pointplot(x='day', y='total_bill', hue='time', data=tips, dodge=True)

Data-aware grids

Data-aware grids are more advanced but can be pretty powerful. Here is just one example to show off the possibilities; refer to the Seaborn documentation for an elaborate introduction:


In [ ]:
g = sns.FacetGrid(tips, col='smoker', row='sex', margin_titles=True)
g.map(plt.scatter, 'total_bill', 'tip', marker='s')
for ax in g.axes.flat:
    ax.plot((0, 50), (0, .2 * 50), c='.2', ls='--')
g.set(xlim=(0, 60), ylim=(0, 14))
