.. _basic_tutorial:

.. currentmodule:: seaborn

Basic plots to show numeric relationships
=========================================



In [ ]:

    
%matplotlib inline



In [ ]:

    
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt



In [ ]:

    
import seaborn as sns
sns.set(style="darkgrid", color_codes=True)
np.random.seed(sum(map(ord, "basic")))

Emphasizing continuity with line plots
--------------------------------------

With some datasets, you may want to understand changes in one variable as a function of another variable that has some sense of continuity. This is most common when one of the variables represents time, either abstractly or with a datetime object. In this situation, a good choice is to draw a line plot. In seaborn, this can be accomplished with the :func:`lineplot` function.

The simplest case is when you have a vector of timepoints and a vector of values. To draw the function relating these variables, pass each to the ``x`` and ``y`` parameters of the :func:`lineplot` function, respectively:



In [ ]:

    
time = pd.date_range("2017-01-01", periods=24 * 31, freq="h")
value = np.random.randn(len(time)).cumsum()
sns.lineplot(x=time, y=value);

Because :func:`lineplot` assumes that you are most often trying to draw ``y`` as a function of ``x``, the default behavior is to sort the data by the ``x`` values before plotting. However, this can be disabled:



In [ ]:

    
x, y = np.random.randn(2, 1000).cumsum(axis=1)
sns.lineplot(x=x, y=y, sort=False);
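
The sorting step can be pictured with plain NumPy. The sketch below only illustrates the idea and is not seaborn's internal code; the variable names and the random data are made up for the example. It orders the observations by ``x`` and carries ``y`` along, which is what makes a line read from left to right:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)   # unordered x positions (made-up data)
y = rng.normal(size=8)   # matching y values

# Indices that would put x in ascending order.
order = np.argsort(x, kind="stable")

# Reorder both vectors together so each (x, y) pair stays intact.
x_sorted, y_sorted = x[order], y[order]
```

With ``sort=False`` the points are instead connected in the order they appear in the input, which suits trajectories such as the two-dimensional random walk above.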

Aggregation and representing uncertainty
----------------------------------------

More complex datasets will have multiple measurements for the same value of the ``x`` variable. The default behavior in seaborn is to aggregate the multiple measurements at each ``x`` value by plotting the mean and the 95% confidence interval around the mean:



In [ ]:

    
fmri = sns.load_dataset("fmri")
sns.lineplot(x="timepoint", y="signal", data=fmri);

The confidence intervals are computed using bootstrapping, which can be time-intensive for larger datasets. It's therefore possible to disable them:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", data=fmri, ci=None);
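
To make the cost of bootstrapping concrete, here is a minimal sketch of the aggregation done by hand with pandas and NumPy. It assumes a simple percentile bootstrap of the mean; seaborn's own resampling may differ in detail. The data frame, the ``bootstrap_ci`` helper, and the column names are hypothetical and exist only for this example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical long-form data: 20 repeated measurements at each of 5 x values.
timepoint = np.repeat(np.arange(5), 20)
df = pd.DataFrame({
    "timepoint": timepoint,
    "signal": rng.normal(loc=timepoint * 0.1, scale=1.0),
})


def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap interval for the mean of ``values``."""
    values = np.asarray(values)
    # Each row of ``idx`` is one resample (with replacement) of the observations.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])


rows = []
for t, group in df.groupby("timepoint")["signal"]:
    low, high = bootstrap_ci(group)
    rows.append({"timepoint": t, "mean": group.mean(),
                 "ci_low": low, "ci_high": high})
summary = pd.DataFrame(rows)
```

Every ``x`` value needs ``n_boot`` resamples of its observations, which is why the intervals become slow to compute as the dataset grows.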

Another good option, especially with larger data, is to represent the spread of the distribution at each timepoint by plotting the standard deviation instead of a confidence interval:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", data=fmri, ci="sd");
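
A standard deviation band needs no resampling, so it is much cheaper to compute. As a rough illustration (not seaborn's internal code), the band can be read as the mean plus and minus one standard deviation at each ``x`` value. The data below are made up for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical long-form data: 20 repeated measurements at each of 5 x values.
df = pd.DataFrame({
    "timepoint": np.repeat(np.arange(5), 20),
    "signal": rng.normal(size=100),
})

# One pass over the groups gives both the line and the band.
stats = df.groupby("timepoint")["signal"].agg(["mean", "std"])
stats["low"] = stats["mean"] - stats["std"]
stats["high"] = stats["mean"] + stats["std"]
```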

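To turn off aggregation altogether, set the ``estimator`` parameter to ``None``. Every observation is then drawn, which can look odd when the data have several observations at each ``x`` value: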



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", data=fmri, estimator=None);

The ``estimator`` parameter also controls how the data are aggregated: pass a different function (or the name of a pandas method) to replace the default mean.

Plotting subsets of data with semantic mappings
-----------------------------------------------

Often there will be multiple measurements at each value of ``x`` because we want to know how the relationship between ``x`` and ``y`` changes as a function of other variables. The :func:`lineplot` function allows you to define up to three additional variables that will be used to subset the data. These variables are then semantically mapped by the color (``hue``), width (``size``) and dashes/markers (``style``) used to draw the lines.

For example, we can draw two line plots with different colors simply by defining a variable to be used for ``hue`` subsets:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", hue="event", data=fmri);

By adding a separate ``style`` variable, we can explore more complex relationships:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", hue="region", style="event", data=fmri);

Be cautious about making plots with multiple subset variables. While sometimes informative, they can also be very difficult to parse and interpret. However, even when you are only examining changes across one subset variable, it can be useful to alter both the color and style of the lines, which can make the plot more accessible when printed in black and white or viewed by someone with colorblindness:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", hue="event", style="event", data=fmri);

Some effort has been put into choosing good defaults so that you do not need to spend time specifying plot attributes for quick exploration, but the way the ``hue`` and ``style`` variables are mapped can be controlled through various parameters to the :func:`lineplot` function:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", hue="region", style="event",
             palette="Set2", hue_order=["frontal", "parietal"],
             dashes=["", (1, 1)],
             data=fmri);

By default, the ``style`` variable is represented by drawing lines with different dash patterns, but you can also draw markers with different shapes at the exact position of each observation:



In [ ]:

    
sns.lineplot(x="timepoint", y="signal", hue="region", style="event",
             palette="Set2", hue_order=["frontal", "parietal"],
             dashes=False, markers=True,
             data=fmri);

In the above examples, the ``hue`` variable takes different categorical values, and the colors of the lines are chosen with an appropriate qualitative colormap. When the ``hue`` variable is instead numeric (specifically, if it can be cast to float), the default behavior is to use a sequential colormap and to make a legend with "ticks" instead of an entry for each line (allowing it to scale to showing many lines):



In [ ]:

    
dots = sns.load_dataset("dots").query("align == 'dots'")
sns.lineplot(x="time", y="firing_rate",
             hue="coherence", style="choice",
             data=dots);

A non-default colormap can be selected in this case by passing a colormap name or object to ``palette``, and you can request a legend entry for every level with ``legend="full"``:



In [ ]:

    
cmap = sns.cubehelix_palette(light=.7, as_cmap=True)
sns.lineplot(x="time", y="firing_rate",
             hue="coherence", style="choice",
             palette=cmap,
             legend="full", data=dots);

It may happen that, even though the ``hue`` variable is numeric, it is poorly represented by a linear color scale. That's the case here, where the levels of the ``hue`` variable are logarithmically scaled. You can provide specific color values for each line by passing a list or dictionary:



In [ ]:

    
palette = sns.cubehelix_palette(light=.7, n_colors=6)
sns.lineplot(x="time", y="firing_rate",
             hue="coherence", style="choice",
             palette=palette,
             data=dots);
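
To see why a linear color scale struggles with logarithmically spaced levels, compare where each level falls on a 0 to 1 color axis under the two scalings. The levels below are hypothetical stand-ins chosen to double at each step, not values read from the dataset:

```python
import numpy as np

# Hypothetical hue levels that double at each step.
levels = np.array([3.2, 6.4, 12.8, 25.6, 51.2])

# Position of each level along a linear color scale.
linear_pos = (levels - levels.min()) / (levels.max() - levels.min())

# Position of each level after taking logs first.
logs = np.log(levels)
log_pos = (logs - logs.min()) / (logs.max() - logs.min())
```

On the linear scale the three smallest levels land in the bottom fifth of the range, so their colors are nearly indistinguishable, while on the log scale the levels are evenly spaced. Passing an explicit list of colors, one per level, gives the same even spacing.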

Another option for semantically mapping a subset is to change the width of its lines, which is accomplished by defining the ``size`` parameter:



In [ ]:

    
sns.lineplot(x="time", y="firing_rate",
             size="coherence", style="choice",
             data=dots);

It's possible to control the range of line widths that are spanned by the data using the ``sizes`` parameter. Here we pass a ``(min, max)`` tuple, but it's also possible to pass a list or dictionary to precisely specify the width of each line:



In [ ]:

    
sns.lineplot(x="time", y="firing_rate",
             size="coherence", style="choice",
             sizes=(1, 2),
             data=dots);
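
One plausible way to picture the ``(min, max)`` tuple is as a linear rescaling of the ``size`` values onto the requested width range. The sketch below illustrates that idea under this assumption; seaborn's own normalization may differ in detail, and the levels and widths are made up for the example:

```python
import numpy as np

levels = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical values of the size variable
min_width, max_width = 1.0, 2.0           # the (min, max) tuple

# Rescale the levels to [0, 1], then stretch them over the width range.
normed = (levels - levels.min()) / (levels.max() - levels.min())
widths = min_width + normed * (max_width - min_width)

# An explicit level-to-width mapping, in the shape a dictionary would take.
explicit = dict(zip(levels.tolist(), [0.5, 1.0, 1.5, 3.0]))
```

A dictionary is the most explicit form because each level is tied to its width by name rather than by position.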

While the ``size`` variable will typically be numeric, it's also possible to map a categorical variable to the width of the lines. Be cautious when doing so, because it will be very difficult to distinguish much more than "thick" vs "thin" lines. However, dashes can be hard to perceive when lines have considerable high-frequency variability, so using different widths may be helpful:



In [ ]:

    
palette = sns.cubehelix_palette(light=.7, n_colors=6)
sns.lineplot(x="time", y="firing_rate",
             hue="coherence", size="choice",
             palette=palette,
             data=dots);

Plotting with "long" versus "wide" data
---------------------------------------

Like many other functions, :func:`lineplot` is most flexible when it is provided with "long-form" (or "tidy") data, typically in the form of a DataFrame where each column represents a variable and each row represents an observation.

However, data are not always naturally generated or stored in long-form format, and it can be helpful to be able to take a quick look without reformatting. To support this, :func:`lineplot` can visualize a number of different "wide-form" representations if they are passed to ``data``. For instance, you can pass a wide DataFrame, which will draw a line for each column using the index for the ``x`` values:



In [ ]:

    
date = pd.date_range("2017-01-01", periods=365)
vals = np.random.randn(365, 4).cumsum(axis=0)
wide_df = pd.DataFrame(vals, date, list("ABCD"))
sns.lineplot(data=wide_df);
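
For comparison, a wide DataFrame can be reshaped into long form with pandas. This is a minimal sketch using a small made-up frame; the column names ``date``, ``series``, and ``value`` are choices for the example, not anything seaborn requires:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A small wide-form frame: one column per series, dates in the index.
date = pd.date_range("2017-01-01", periods=5)
wide_df = pd.DataFrame(rng.normal(size=(5, 4)).cumsum(axis=0),
                       index=date, columns=list("ABCD"))

# Move the index into a column, then stack the four series into rows.
long_df = (
    wide_df
    .rename_axis("date")
    .reset_index()
    .melt(id_vars="date", var_name="series", value_name="value")
)
```

The long-form frame has one row per observation, so it could be plotted with a call such as ``sns.lineplot(x="date", y="value", hue="series", data=long_df)``, and further columns could be added for other semantic mappings.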

NumPy arrays are handled similarly, except you lose the label information that you get from pandas:



In [ ]:

    
wide_array = np.asarray(wide_df)
sns.lineplot(data=wide_array);

You can even pass in a list of objects with heterogeneous indices:



In [ ]:

    
wide_list = [wide_df.loc[:"2017-09-01", "A"], wide_df.loc["2017-05-01":, "B"]]
sns.lineplot(data=wide_list);


