.. _axis_grids:

.. currentmodule:: seaborn

Plotting on data-aware grids
============================

When exploring medium-dimensional data, a useful approach is to draw multiple instances of the same plot on different subsets of your dataset. This technique is sometimes called "lattice" or "trellis" plotting, and it is related to the idea of "small multiples". It allows a viewer to quickly extract a large amount of information about complex data. Matplotlib offers good support for making figures with multiple axes; seaborn builds on top of this to directly link the structure of the plot to the structure of your dataset.

To use these features, your data has to be in a Pandas DataFrame and it must take the form of what Hadley Wickham calls "tidy" data. In brief, that means your dataframe should be structured such that each column is a variable and each row is an observation.

For advanced use, you can use the objects discussed in this part of the tutorial directly, which will provide maximum flexibility. Some seaborn functions (such as :func:`lmplot`, :func:`factorplot`, :func:`pairplot`, and :func:`jointplot`) also use them behind the scenes. Unlike other seaborn functions that are "Axes-level" and draw onto specific (possibly already-existing) matplotlib ``Axes`` without otherwise manipulating the figure, these higher-level functions create a figure when called and are generally more strict about how it gets set up. In some cases, arguments either to those functions or to the constructor of the class they rely on will provide a different interface to attributes like the figure size, as in the case of :func:`lmplot`, where you can set the height and aspect ratio for each facet rather than the overall size of the figure. Any function that uses one of these objects will always return it after plotting, though, and most of these objects have convenience methods for changing how the plot is drawn, often in a more abstract and easy way.
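As a concrete illustration of the tidy format described above, the following sketch reshapes a small "wide" table into one where each column is a variable and each row is an observation. The table and its column names (``subject``, ``focused``, ``divided``, ``attention``, ``score``) are made up for this example and are not one of the seaborn datasets.

```python
import pandas as pd

# Hypothetical "wide" table: one row per subject, one column per condition.
wide = pd.DataFrame({
    "subject": [1, 2, 3],
    "focused": [7.0, 6.5, 8.0],
    "divided": [5.0, 4.5, 6.0],
})

# Reshape so that the condition becomes a variable of its own.
# Each row of ``tidy`` is now a single observation (subject, attention, score).
tidy = wide.melt(id_vars="subject", var_name="attention", value_name="score")
print(tidy)
```

In the tidy version, a column such as ``attention`` can be named directly as a faceting or ``hue`` variable, which is not possible when its levels are spread across separate columns.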

In [ ]:
%matplotlib inline

In [ ]:
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt

In [ ]:
sns.set(style="ticks")
np.random.seed(sum(map(ord, "axis_grids")))
.. _facet_grid:

Subsetting data with :class:`FacetGrid`
---------------------------------------

The :class:`FacetGrid` class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset. A :class:`FacetGrid` can be drawn with up to three dimensions: ``row``, ``col``, and ``hue``. The first two have obvious correspondence with the resulting array of axes; think of the hue variable as a third dimension along a depth axis, where different levels are plotted with different colors.

The class is used by initializing a :class:`FacetGrid` object with a dataframe and the names of the variables that will form the row, column, or hue dimensions of the grid. These variables should be categorical or discrete, and then the data at each level of the variable will be used for a facet along that axis.

Additionally, both :func:`lmplot` and :func:`factorplot` use this object internally, and they return the object when they are finished so that it can be used for further tweaking.

For example, say we wanted to examine differences between lunch and dinner in the ``tips`` dataset.

In [ ]:
tips = sns.load_dataset("tips")

In [ ]:
g = sns.FacetGrid(tips, col="time")
Initializing the grid like this sets up the matplotlib figure and axes, but doesn't draw anything on them. The main approach for visualizing data on this grid is with the :meth:`FacetGrid.map` method. Provide it with a plotting function and the name(s) of variable(s) in the dataframe to plot. Let's look at the distribution of tips in each of these subsets, using a histogram.

In [ ]:
g = sns.FacetGrid(tips, col="time")
g.map(plt.hist, "tip");
This function will draw the figure and annotate the axes, hopefully producing a finished plot in one step. To make a relational plot, just pass multiple variable names. You can also provide keyword arguments, which will be passed to the plotting function:

In [ ]:
g = sns.FacetGrid(tips, col="sex", hue="smoker")
g.map(plt.scatter, "total_bill", "tip", alpha=.7)
g.add_legend();
There are several options for controlling the look of the grid that can be passed to the class constructor.

In [ ]:
g = sns.FacetGrid(tips, row="smoker", col="time", margin_titles=True)
g.map(sns.regplot, "size", "total_bill", color=".3", fit_reg=False, x_jitter=.1);
Note that ``margin_titles`` isn't formally supported by the matplotlib API, and may not work well in all cases. In particular, it currently can't be used with a legend that lies outside of the plot.

The size of the figure is set by providing the height of the facets and the aspect ratio:

In [ ]:
g = sns.FacetGrid(tips, col="day", size=4, aspect=.5)
g.map(sns.barplot, "sex", "total_bill");
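The relationship between the facet size and the overall figure size can be sketched with a little arithmetic. The helper below is hypothetical (it is not part of seaborn) and assumes that each facet is ``size`` inches tall and ``size * aspect`` inches wide, so that the figure dimensions scale with the number of rows and columns. Details such as space reserved for an outside legend are ignored.

```python
def facet_figure_size(nrow, ncol, size, aspect):
    """Approximate (width, height) in inches for a grid of equal facets.

    Hypothetical helper for illustration only. It assumes each facet is
    ``size`` inches tall and ``size * aspect`` inches wide.
    """
    width = ncol * size * aspect
    height = nrow * size
    return width, height


# The "day" variable in the tips data has four levels, so the grid in the
# preceding cell has one row and four columns.
figsize = facet_figure_size(nrow=1, ncol=4, size=4, aspect=.5)
print(figsize)
```

Under these assumptions, halving ``aspect`` makes every facet narrower without changing its height, which is why tall, thin facets suit a row of bar plots.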
With versions of matplotlib > 1.4, you can pass parameters to be used in the ``gridspec`` module. These can be used to draw attention to a particular facet by increasing its size. This is particularly useful when visualizing distributions of datasets with unequal numbers of groups in each facet.

In [ ]:
titanic = sns.load_dataset("titanic")
titanic = titanic.sort_values("deck")
g = sns.FacetGrid(titanic, col="class", sharex=False,
                  gridspec_kws={"width_ratios": [5, 3, 3]})
g.map(sns.boxplot, "deck", "age")
By default, the facets are plotted in the sorted order of the unique values for each variable, but you can specify an order:

In [ ]:
days = ["Thur", "Fri", "Sat", "Sun"]
g = sns.FacetGrid(tips, row="day", hue="day", palette="Greens_d",
                  size=1.7, aspect=4, hue_order=days, row_order=days)
g.map(sns.distplot, "total_bill");
Any seaborn color palette (i.e., something that can be passed to :func:`color_palette`) can be provided. You can also use a dictionary that maps the names of values in the ``hue`` variable to valid matplotlib colors:

In [ ]:
pal = dict(Lunch="seagreen", Dinner="gray")
g = sns.FacetGrid(tips, hue="time", palette=pal, size=5)
g.map(plt.scatter, "total_bill", "tip", s=50, alpha=.7, linewidth=.5, edgecolor="white")
g.add_legend();
You can also let other aspects of the plot vary across levels of the hue variable, which can be helpful for making plots that will be more comprehensible when printed in black-and-white. To do this, pass a dictionary to ``hue_kws`` where keys are the names of plotting function keyword arguments and values are lists of keyword values, one for each level of the hue variable.

In [ ]:
g = sns.FacetGrid(tips, hue="sex", palette="Set1", size=5, hue_kws={"marker": ["^", "v"]})
g.map(plt.scatter, "total_bill", "tip", s=100, linewidth=.5, edgecolor="white")
g.add_legend();
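To make the shape of ``hue_kws`` concrete, the sketch below expands such a dictionary into one set of keyword arguments per hue level. It is plain Python, not seaborn code, and it assumes the lists are matched to the hue levels by position; the level names and their order are chosen here purely for illustration.

```python
# Hypothetical hue levels, in the order they would be plotted.
hue_levels = ["Male", "Female"]

# Same structure as the ``hue_kws`` argument: keyword name -> one value per level.
hue_kws = {"marker": ["^", "v"]}

# Build the extra keyword arguments that each level's plotting call would get.
per_level_kwargs = {
    level: {name: values[i] for name, values in hue_kws.items()}
    for i, level in enumerate(hue_levels)
}
print(per_level_kwargs)
```

Each list therefore needs as many entries as the hue variable has levels, and several keywords can vary at once by adding more keys to the dictionary.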
If you have many levels of one variable, you can plot it along the columns but "wrap" them so that they span multiple rows. When doing this, you cannot use a ``row`` variable.

In [ ]:
attend = sns.load_dataset("attention").query("subject <= 12")
g = sns.FacetGrid(attend, col="subject", col_wrap=4, size=2, ylim=(0, 10))
g.map(sns.pointplot, "solutions", "score", color=".3", ci=None);
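The layout that results from wrapping can be worked out by hand. The function below is a hypothetical helper, not a seaborn API, and it assumes the grid uses exactly ``col_wrap`` columns and as many rows as are needed to hold every level.

```python
import math


def wrapped_grid_shape(n_levels, col_wrap):
    """Return (nrow, ncol) for ``n_levels`` facets wrapped at ``col_wrap``.

    Hypothetical helper for illustration. It assumes the grid always has
    ``col_wrap`` columns and that the last row may be partly empty.
    """
    nrow = math.ceil(n_levels / col_wrap)
    return nrow, col_wrap


# Twelve subjects wrapped at four columns fill three complete rows.
shape = wrapped_grid_shape(n_levels=12, col_wrap=4)
print(shape)
```

With ten levels instead of twelve, the same call would still give three rows, and the last row would hold only two facets.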
Once you've drawn a plot using :meth:`FacetGrid.map` (which can be called multiple times), you may want to adjust some aspects of the plot. There are also a number of methods on the :class:`FacetGrid` object for manipulating the figure at a higher level of abstraction. The most general is :meth:`FacetGrid.set`, and there are other more specialized methods like :meth:`FacetGrid.set_axis_labels`. For example:

In [ ]:
with sns.axes_style("white"):
    g = sns.FacetGrid(tips, row="sex", col="smoker", margin_titles=True, size=2.5)
g.map(plt.scatter, "total_bill", "tip", color="#334488", edgecolor="white", lw=.5);
g.set_axis_labels("Total bill (US Dollars)", "Tip");
g.set(xticks=[10, 30, 50], yticks=[2, 6, 10]);
g.fig.subplots_adjust(wspace=.02, hspace=.02);
For even more customization, you can work directly with the underlying matplotlib ``Figure`` and ``Axes`` objects, which are stored as member attributes at ``fig`` and ``axes`` (a two-dimensional array), respectively.

In [ ]:
g = sns.FacetGrid(tips, col="smoker", margin_titles=True, size=4)
g.map(plt.scatter, "total_bill", "tip", color="#338844", edgecolor="white", s=50, lw=1)
for ax in g.axes.flat:
    ax.plot((0, 50), (0, .2 * 50), c=".2", ls="--")
g.set(xlim=(0, 60), ylim=(0, 14));
.. _custom_map_func:

Mapping custom functions onto the grid
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You're not limited to existing matplotlib and seaborn functions when using :class:`FacetGrid`. However, to work properly, any function you use must follow a few rules:

1. It must plot onto the "currently active" matplotlib ``Axes``. This will be true of functions in the ``matplotlib.pyplot`` namespace, and you can call ``plt.gca`` to get a reference to the current ``Axes`` if you want to work directly with its methods.
2. It must accept the data that it plots in positional arguments. Internally, :class:`FacetGrid` will pass a ``Series`` of data for each of the named positional arguments passed to :meth:`FacetGrid.map`.
3. It must be able to accept ``color`` and ``label`` keyword arguments, and, ideally, it will do something useful with them. In most cases, it's easiest to catch a generic dictionary of ``**kwargs`` and pass it along to the underlying plotting function.

Let's look at a minimal example of a function you can plot with. This function will just take a single vector of data for each facet:

In [ ]:
def quantile_plot(x, **kwargs):
    qntls, xr = stats.probplot(x, fit=False)
    plt.scatter(xr, qntls, **kwargs)
    
g = sns.FacetGrid(tips, col="sex", size=4)
g.map(quantile_plot, "total_bill");
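The three rules are easier to remember with a picture of what a mapping step has to do. The following is a deliberately simplified sketch written with plain matplotlib and pandas. It is not seaborn's implementation, and ``map_sketch``, ``record_and_scatter``, and the ``toy`` data are hypothetical names used only here. It handles a single column variable and a fixed color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def map_sketch(data, col, func, *var_names, **kwargs):
    """Call ``func`` once per level of ``col``, one subplot per level."""
    levels = sorted(data[col].unique())
    fig, axes = plt.subplots(1, len(levels), sharex=True, sharey=True,
                             squeeze=False)
    for ax, level in zip(axes.flat, levels):
        subset = data[data[col] == level]
        plt.sca(ax)  # rule 1: make this facet the "currently active" Axes
        series = [subset[name] for name in var_names]  # rule 2: positional Series
        func(*series, color="C0", label=str(level), **kwargs)  # rule 3
        ax.set_title("{} = {}".format(col, level))
    return fig, axes


calls = []


def record_and_scatter(x, y, **kwargs):
    # Record what was received, then forward everything to pyplot.
    calls.append((len(x), kwargs["color"], kwargs["label"]))
    plt.scatter(x, y, **kwargs)


toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 1, 3, 5],
})
fig, axes = map_sketch(toy, "group", record_and_scatter, "x", "y")
print(calls)
```

A function that ignored the active ``Axes``, expected a dataframe instead of positional vectors, or rejected ``color`` and ``label`` would fail somewhere inside this loop, which is why the rules are stated the way they are.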
If you want to make a bivariate plot, you should write the function so that it accepts the x-axis variable first and the y-axis variable second:

In [ ]:
def qqplot(x, y, **kwargs):
    _, xr = stats.probplot(x, fit=False)
    _, yr = stats.probplot(y, fit=False)
    plt.scatter(xr, yr, **kwargs)
    
g = sns.FacetGrid(tips, col="smoker", size=4)
g.map(qqplot, "total_bill", "tip");
Because ``plt.scatter`` accepts ``color`` and ``label`` keyword arguments and does the right thing with them, we can add a hue facet without any difficulty:

In [ ]:
g = sns.FacetGrid(tips, hue="time", col="sex", size=4)
g.map(qqplot, "total_bill", "tip")
g.add_legend();
This approach also lets us use additional aesthetics to distinguish the levels of the hue variable, along with keyword arguments that won't be dependent on the faceting variables:

In [ ]:
g = sns.FacetGrid(tips, hue="time", col="sex", size=4,
                  hue_kws={"marker": ["s", "D"]})
g.map(qqplot, "total_bill", "tip", s=40, edgecolor="w")
g.add_legend();
Sometimes, though, you'll want to map a function that doesn't work the way you expect with the ``color`` and ``label`` keyword arguments. In this case, you'll want to explicitly catch them and handle them in the logic of your custom function. For example, this approach will allow us to map ``plt.hexbin``, which otherwise does not play well with the :class:`FacetGrid` API:

In [ ]:
def hexbin(x, y, color, **kwargs):
    cmap = sns.light_palette(color, as_cmap=True)
    plt.hexbin(x, y, gridsize=15, cmap=cmap, **kwargs)

g = sns.FacetGrid(tips, hue="time", col="time", size=4)
g.map(hexbin, "total_bill", "tip", extent=[0, 50, 0, 10]);
.. _pair_grid:

Plotting pairwise relationships with :class:`PairGrid` and :func:`pairplot`
---------------------------------------------------------------------------

:class:`PairGrid` also allows you to quickly draw a grid of small subplots using the same plot type to visualize data in each. In a :class:`PairGrid`, each row and column is assigned to a different variable, so the resulting plot shows each pairwise relationship in the dataset. This style of plot is sometimes called a "scatterplot matrix", as this is the most common way to show each relationship, but :class:`PairGrid` is not limited to scatterplots.

It's important to understand the differences between a :class:`FacetGrid` and a :class:`PairGrid`. In the former, each facet shows the same relationship conditioned on different levels of other variables. In the latter, each plot shows a different relationship (although the upper and lower triangles will have mirrored plots). Using :class:`PairGrid` can give you a very quick, very high-level summary of interesting relationships in your dataset.

The basic usage of the class is very similar to :class:`FacetGrid`. First you initialize the grid, then you pass a plotting function to a ``map`` method and it will be called on each subplot. There is also a companion function, :func:`pairplot` (see :ref:`below <pairplot>`), that trades off some flexibility for faster plotting.

In [ ]:
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map(plt.scatter)
It's possible to plot a different function on the diagonal to show the univariate distribution of the variable in each column. Note that the axis ticks won't correspond to the count or density axis of this plot, though.

In [ ]:
g = sns.PairGrid(iris)
g.map_diag(plt.hist)
g.map_offdiag(plt.scatter)
A very common way to use this plot is to color the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of three different species of iris flowers, so you can see how they differ.

In [ ]:
g = sns.PairGrid(iris, hue="species")
g.map_diag(plt.hist)
g.map_offdiag(plt.scatter)
g.add_legend()
By default every numeric column in the dataset is used, but you can focus on particular relationships if you want.

In [ ]:
g = sns.PairGrid(iris, vars=["sepal_length", "sepal_width"], hue="species")
g.map(plt.scatter)
It's also possible to use a different function in the upper and lower triangles to emphasize different aspects of the relationship.

In [ ]:
g = sns.PairGrid(iris)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_diag(sns.kdeplot, lw=3, legend=False)
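To see which subplots each of these methods addresses, it can help to enumerate the cells of a square grid by hand. The sketch below is plain Python rather than seaborn code. ``classify_cells`` is a hypothetical helper, and it assumes that "upper" means the column index is greater than the row index and "lower" means the reverse.

```python
def classify_cells(variables):
    """Group the (row variable, column variable) pairs of a square grid."""
    cells = {"diag": [], "upper": [], "lower": []}
    for i, y_var in enumerate(variables):      # rows share a y variable
        for j, x_var in enumerate(variables):  # columns share an x variable
            if i == j:
                key = "diag"
            elif j > i:
                key = "upper"
            else:
                key = "lower"
            cells[key].append((y_var, x_var))
    return cells


iris_vars = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
cells = classify_cells(iris_vars)
print({name: len(pairs) for name, pairs in cells.items()})
```

Every pair in the upper triangle reappears in the lower triangle with its two variables swapped, which is why the triangles mirror each other when the same function is drawn in both.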
The square grid with identity relationships on the diagonal is actually just a special case, and you can plot with different variables in the rows and columns.

In [ ]:
g = sns.PairGrid(tips, y_vars=["tip"], x_vars=["total_bill", "size"], size=4)
g.map(sns.regplot, color=".3")
g.set(ylim=(-1, 11), yticks=[0, 5, 10]);
Of course, the aesthetic attributes are configurable. For instance, you can use a different palette (say, to show an ordering of the ``hue`` variable) and pass keyword arguments into the plotting functions.

In [ ]:
g = sns.PairGrid(tips, hue="size", palette="GnBu_d")
g.map(plt.scatter, s=50, edgecolor="white")
g.add_legend()
.. _pairplot:

:class:`PairGrid` is flexible, but to take a quick look at a dataset, it can be easier to use :func:`pairplot`. This function uses scatterplots and histograms by default, although a few other kinds will be added (currently, you can also plot regression plots on the off-diagonals and KDEs on the diagonal).

In [ ]:
sns.pairplot(iris, hue="species", size=2.5);
You can also control the aesthetics of the plot with keyword arguments, and it returns the :class:`PairGrid` instance for further tweaking.

In [ ]:
g = sns.pairplot(iris, hue="species", palette="Set2", diag_kind="kde", size=2.5)
.. _joint_grid:

Plotting bivariate data with :class:`JointGrid` and :func:`jointplot`
---------------------------------------------------------------------

The :class:`JointGrid` can be used when you want to plot the relationship between or joint distribution of two variables along with the marginal distribution of each variable. :class:`JointGrid` is supplemented by the :func:`jointplot` function, which will likely suffice for many exploratory cases (see :ref:`below <jointplot>`). It can be helpful to know how the :class:`JointGrid` class works, though, to have the best understanding of how to use these two tools.

Like :class:`FacetGrid`, initializing the object sets up the axes but does not plot anything:

In [ ]:
g = sns.JointGrid("total_bill", "tip", tips)
The easiest way to use :class:`JointGrid` is to call :meth:`JointGrid.plot` with three arguments: a function to draw a bivariate plot, a function to draw a univariate plot, and a function to calculate a statistic that summarizes the relationship.

In [ ]:
g = sns.JointGrid("total_bill", "tip", tips)
g.plot(sns.regplot, sns.distplot, stats.pearsonr);
For more flexibility, you can use the separate methods :meth:`JointGrid.plot_joint`, :meth:`JointGrid.plot_marginals`, and :meth:`JointGrid.annotate`:

In [ ]:
g = sns.JointGrid("total_bill", "tip", tips)
g.plot_marginals(sns.distplot, kde=False, color=".5")
g.plot_joint(plt.scatter, color=".5", edgecolor="white")
g.annotate(stats.spearmanr, template="{stat} = {val:.3f} (p = {p:.3g})");
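The ``template`` string in the cell above uses named placeholders. The sketch below shows how such a string expands, assuming it is filled in with Python's ordinary ``str.format`` and that ``stat``, ``val``, and ``p`` stand for the statistic's name, its value, and its p-value. The numbers are invented for illustration and are not results from the ``tips`` data.

```python
template = "{stat} = {val:.3f} (p = {p:.3g})"

# Made-up values standing in for the output of a statistic function.
annotation = template.format(stat="spearmanr", val=0.6789, p=0.00012)
print(annotation)
```

The format specifications after each colon control rounding, so ``val`` is shown with three decimal places and ``p`` with up to three significant digits.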
To control the presentation of the grid, use the ``size`` and ``ratio`` arguments. These control the size of the full figure (which is always square) and the ratio of the joint axes height to the marginal axes height:

In [ ]:
g = sns.JointGrid("total_bill", "tip", tips, size=4, ratio=30)
color = "#228833"
g.plot_marginals(sns.rugplot, color=color, alpha=.7, lw=1)
g.plot_joint(plt.scatter, color=color, alpha=.7, marker=".");
The ``space`` keyword argument controls the amount of padding between the axes with the joint plot and the two marginal axes:

In [ ]:
g = sns.JointGrid("total_bill", "tip", tips, space=0)
g.plot_marginals(sns.kdeplot, shade=True)
g.plot_joint(sns.kdeplot, shade=True, cmap="PuBu", n_levels=40);
.. _jointplot:

The :func:`jointplot` function can draw a nice-looking plot with a single line of code:

In [ ]:
sns.jointplot("total_bill", "tip", tips);
It can draw several different kinds of plots, with good defaults chosen for each:

In [ ]:
sns.jointplot("total_bill", "tip", tips, kind="hex", color="#8855AA");
In many cases, :func:`jointplot` should be sufficient for exploratory graphics, but it may be easier to use :class:`JointGrid` directly when you need more flexibility than is offered by the canned styles of :func:`jointplot`.