.. _linear_categorical:
.. currentmodule:: seaborn
Linear models with categorical data
===================================
.. _factorplot:
Plotting categorical data with :func:`factorplot`
-------------------------------------------------
As with the quantitative functions :func:`lmplot` and :func:`regplot`, you can draw categorical plots with functions that operate at two different levels. In most cases, you'll want to use the :func:`factorplot` function. Like :func:`lmplot`, it plots onto a :class:`FacetGrid` and can visualize a lot of data quickly. However, in some cases you may want a bit more control over the figure you're making, in which case you can use the lower-level functions :func:`pointplot` and :func:`barplot`. The :func:`factorplot` function uses these behind the scenes, and you can control which one gets used with the ``kind`` parameter.
The API for the :func:`factorplot` function will be familiar by now. It draws data from a tidy DataFrame, and the positional arguments specify the names of the variables that will be placed on the x and y axes of the plot. :func:`factorplot` also takes a third positional argument, ``hue``, so named because it plays a role similar to the ``hue`` variable in :func:`lmplot`: it plots subsets of the data for easy direct comparison. In some cases the ``hue`` variable will also affect the location on the x axis where data is plotted. Because these functions are intended for use with *categorical* data, the x axis is not represented quantitatively, although there will be cases where its categories have a natural ordering.
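To make the tidy layout concrete, here is a minimal sketch in plain Python of the kind of per-cell summary a ``point`` or ``bar`` plot displays: one value for each combination of the ``x`` and ``hue`` variables. The records, the column names (``dose``, ``group``, ``score``) and the ``cell_means`` helper are all made up for illustration; this is not how :func:`factorplot` is implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tidy records: one observation per row, one variable per key.
records = [
    {"dose": "low", "group": "control", "score": 2.0},
    {"dose": "low", "group": "control", "score": 4.0},
    {"dose": "low", "group": "treated", "score": 5.0},
    {"dose": "high", "group": "control", "score": 3.0},
    {"dose": "high", "group": "treated", "score": 8.0},
    {"dose": "high", "group": "treated", "score": 10.0},
]


def cell_means(rows, x, y, hue):
    """Average ``y`` within every (x, hue) combination found in ``rows``."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[x], row[hue])].append(row[y])
    return {key: mean(values) for key, values in cells.items()}


# The three names play the roles of the x, y and hue arguments.
means = cell_means(records, "dose", "score", "group")
for (x_level, hue_level), value in sorted(means.items()):
    print(x_level, hue_level, value)
```

Each key of ``means`` corresponds to one point or bar: the first element picks the position on the x axis and the second picks the color.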
The two main kinds of categorical plots show the same data, but with a different emphasis. ``point`` plots are better for comparing between conditions:
``bar`` plots, by contrast, are better for understanding the overall magnitude and how far it is from 0:
You can also draw a :func:`factorplot` with a boxplot representation (using ``kind="box"``). While ``point`` and ``bar`` plots focus on the central tendency of the data (with a measure of the error associated with that value), the boxplot should be used when you care about the *distribution* of the data in different categories.
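The difference between central tendency and distribution is easy to see with numbers. In this small sketch, two made-up samples share the same mean but have very different spreads, which is exactly what a boxplot would reveal and a point or bar would hide. The ``box_summary`` helper is hypothetical, and the quartile convention used here (``method="inclusive"``) is an assumption; plotting libraries may compute quartiles and whiskers differently.

```python
from statistics import mean, quantiles

# Two hypothetical categories with identical means but different spreads.
narrow = [4, 5, 5, 5, 6]
wide = [1, 2, 5, 8, 9]


def box_summary(values):
    """Five-number summary: the quantities a boxplot is built from."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


print(mean(narrow), box_summary(narrow))
print(mean(wide), box_summary(wide))
```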
When ``kind`` is not specified, :func:`factorplot` uses a few heuristic rules to choose the appropriate kind of plot to draw. These rules are pretty rough and may change over time, so it's better to specify ``kind`` explicitly.
Naturally, you can specify the palette to render the ``hue`` variable in. Any seaborn palette definition will work, and you can also pass a dictionary mapping values of the ``hue`` variable to colors.
Options for grouping the categories
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It's not necessary to use a ``hue`` variable:
And you can then use the palette to map the ``x`` variable:
In fact, you don't need to provide a ``y`` variable either. When ``y`` is missing, the height of the plot shows the *count* of observations in each category:
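The count being plotted is nothing more than the number of rows that fall in each category. A minimal sketch with made-up labels (the ``days`` list is purely illustrative):

```python
from collections import Counter

# Hypothetical values of the x variable, one entry per observation.
days = ["Thu", "Fri", "Sat", "Sat", "Sun", "Sat", "Fri"]

# With no y variable, the height for each category is its number of rows.
counts = Counter(days)
for day in sorted(counts):
    print(day, counts[day])
```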
Estimators of central tendency and their error
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default the height of the bars/points shows the mean and 95% confidence interval, but both can be changed.
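One common way to attach an interval to an arbitrary estimator is the percentile bootstrap. The following is a rough sketch of that idea under stated assumptions, not a description of seaborn's internals: the ``bootstrap_ci`` helper, its parameter names, and the crude nearest-rank percentile lookup are all choices made for this illustration.

```python
import random
from statistics import mean


def bootstrap_ci(values, estimator=mean, ci=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for ``estimator`` (illustrative only)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and re-apply the estimator each time.
    stats = sorted(
        estimator([rng.choice(values) for _ in range(n)])
        for _ in range(n_boot)
    )
    # Cut off an equal share of the sorted statistics in each tail.
    tail = (100 - ci) / 2 / 100
    low = stats[int(tail * (n_boot - 1))]
    high = stats[int((1 - tail) * (n_boot - 1))]
    return low, high


values = [12.0, 15.5, 9.0, 14.0, 11.5, 13.0, 10.5, 16.0]
low95, high95 = bootstrap_ci(values)
low68, high68 = bootstrap_ci(values, ci=68)
print("mean:", mean(values))
print("95% interval:", low95, high95)
print("68% interval:", low68, high68)
```

Swapping ``estimator`` for another function, such as ``statistics.median``, changes both the height and the interval, which is the sense in which both can be changed.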
Remember, a 68% confidence interval corresponds approximately to the standard error of the estimator:
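The link between 68% and the standard error comes from the normal distribution, where about 68% of the probability lies within one standard deviation of the center. A small sketch of that arithmetic, assuming a normal approximation for the mean (the sample values and the ``normal_interval`` helper are made up for illustration):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

values = [12.0, 15.5, 9.0, 14.0, 11.5, 13.0, 10.5, 16.0]

center = mean(values)
sem = stdev(values) / sqrt(len(values))  # standard error of the mean


def normal_interval(level):
    """Central interval at ``level`` (e.g. 0.68) under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * sem, center + z * sem


# Multipliers on the standard error for the two interval widths.
z68 = NormalDist().inv_cdf(0.5 + 0.68 / 2)  # just under 1
z95 = NormalDist().inv_cdf(0.5 + 0.95 / 2)  # close to 1.96

print("68% interval:", normal_interval(0.68))
print("mean +/- 1 SE:", (center - sem, center + sem))
```

The two printed intervals are nearly identical, which is why a 68% interval is often read as "plus or minus one standard error".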
Plotting on different facets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Remember, :func:`factorplot` uses a :class:`FacetGrid`, so all of the options for structuring the plot into different subsets are available:
Choices in visual presentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are a few other choices for how the plot gets drawn when you use a ``point`` plot. Sometimes, the error bars for different hue categories will overlap:
When this happens, you may want to "dodge" the different hue categories a bit so the extent of the overlap is clearer:
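Dodging amounts to shifting each hue level sideways by a small, fixed offset around the category position. The sketch below shows one way to compute such positions; it assumes categories sit at the integer positions 0, 1, 2, ... and that offsets are spread evenly over a total width ``dodge``. The ``dodge_positions`` helper is hypothetical and the exact spacing seaborn uses may differ.

```python
def dodge_positions(n_categories, n_hues, dodge):
    """Return one list of x positions per hue level, centered on each category."""
    if n_hues == 1:
        offsets = [0.0]
    else:
        # Spread the hue levels evenly over a total width of ``dodge``.
        step = dodge / (n_hues - 1)
        offsets = [i * step - dodge / 2 for i in range(n_hues)]
    return [[x + offset for x in range(n_categories)] for offset in offsets]


# Three categories, two hue levels, total spread of 0.2 axis units.
positions = dodge_positions(3, 2, 0.2)
for hue_index, xs in enumerate(positions):
    print(hue_index, xs)
```

With two hue levels, one set of points lands slightly to the left of each category and the other slightly to the right, so their error bars no longer sit on top of each other.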
It's also not strictly necessary to join the points for each of the different ``hue`` levels:
The ``x`` and ``hue`` values are plotted in sorted order by default, but sometimes it makes more sense to provide a specific order:
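A quick sketch shows why sorted order is not always what you want. The level names here are made up; the point is only that alphabetical sorting ignores the natural ordering of the categories, so an explicit list is needed to recover it.

```python
# Hypothetical levels of the x variable, as they might appear in the data.
levels = {"medium", "low", "high"}

# Sorting strings is alphabetical, which scrambles the natural order.
default_order = sorted(levels)

# An explicit order restores it and fixes each level's axis position.
explicit_order = ["low", "medium", "high"]
positions = {level: index for index, level in enumerate(explicit_order)}

print(default_order)
print(positions)
```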
Although by default the ``hue`` variable is only mapped to different colors (by the ``palette`` argument), you can also use different markers and linestyles for each level of the ``hue`` variable:
.. _barplot:
.. _pointplot:
Plotting with :func:`pointplot` and :func:`barplot`
---------------------------------------------------
As noted above, :func:`factorplot` is a combination of a :class:`FacetGrid` and a lower-level plotting function. If you want to build up a more complicated figure with different kinds of presentation in different subplots, you can use the :func:`pointplot` and :func:`barplot` functions directly. They take all of the same arguments as :func:`factorplot`, aside from those that control the faceting.
Like :func:`regplot`, the lower-level categorical functions also accept their data directly in the form of a Series or array.