Plot a univariate distribution along the x axis:
In [ ]:
import seaborn as sns; sns.set(style="white")
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm")
Flip the plot by assigning the data variable to the y axis:
In [ ]:
sns.histplot(data=penguins, y="flipper_length_mm")
Check how well the histogram represents the data by specifying a different bin width:
In [ ]:
sns.histplot(data=penguins, x="flipper_length_mm", binwidth=3)
You can also define the total number of bins to use:
In [ ]:
sns.histplot(data=penguins, x="flipper_length_mm", bins=30)
Add a kernel density estimate to smooth the histogram, providing complementary information about the shape of the distribution:
In [ ]:
sns.histplot(data=penguins, x="flipper_length_mm", kde=True)
If neither x
nor y
is assigned, the dataset is treated as wide-form, and a histogram is drawn for each numeric column:
In [ ]:
iris = sns.load_dataset("iris")
sns.histplot(data=iris)
You can also draw multiple histograms from a long-form dataset with hue mapping:
In [ ]:
sns.histplot(data=penguins, x="flipper_length_mm", hue="species")
The default approach to plotting multiple distributions is to "layer" them, but you can also "stack" them:
In [ ]:
sns.histplot(data=penguins, x="flipper_length_mm", hue="species", multiple="stack")
Overlapping bars can be hard to visually resolve. A different approach would be to draw a step function:
In [ ]:
sns.histplot(penguins, x="flipper_length_mm", hue="species", element="step")
You can move even farther away from bars by drawing a polygon with vertices in the center of each bin. This may make it easier to see the shape of the distribution, but use with caution: it will be less obvious to your audience that they are looking at a histogram:
In [ ]:
sns.histplot(penguins, x="flipper_length_mm", hue="species", element="poly")
To compare the distribution of subsets that differ substantially in size, use indepdendent density normalization:
In [ ]:
sns.histplot(
penguins, x="culmen_length_mm", hue="island", element="step",
stat="density", common_norm=False,
)
It's also possible to normalize so that each bar's height shows a probability, which make more sense for discrete variables:
In [ ]:
tips = sns.load_dataset("tips")
sns.histplot(data=tips, x="size", stat="probability", discrete=True)
You can even draw a histogram over categorical variables (although this is an experimental feature):
In [ ]:
sns.histplot(data=tips, x="day", shrink=.8)
When using a hue
semantic with discrete data, it can make sense to "dodge" the levels:
In [ ]:
sns.histplot(data=tips, x="day", hue="sex", multiple="dodge", shrink=.8)
For heavily skewed distributions, it's better to define the bins in log space. Compare:
In [ ]:
planets = sns.load_dataset("planets")
sns.histplot(data=planets, x="distance")
To the log-scale version:
In [ ]:
sns.histplot(data=planets, x="distance", log_scale=True)
There are also a number of options for how the histogram appears. You can show unfilled bars:
In [ ]:
sns.histplot(data=planets, x="distance", log_scale=True, fill=False)
Or an unfilled step function:
In [ ]:
sns.histplot(data=planets, x="distance", log_scale=True, element="step", fill=False)
Step functions, esepcially when unfilled, make it easy to compare cumulative histograms:
In [ ]:
sns.histplot(
data=planets, x="distance", hue="method",
hue_order=["Radial Velocity", "Transit"],
log_scale=True, element="step", fill=False,
cumulative=True, stat="density", common_norm=False,
)
When both x
and y
are assigned, a bivariate histogram is computed and shown as a heatmap:
In [ ]:
sns.histplot(penguins, x="culmen_depth_mm", y="body_mass_g")
It's possible to assign a hue
variable too, although this will not work well if data from the different levels have substantial overlap:
In [ ]:
sns.histplot(penguins, x="culmen_depth_mm", y="body_mass_g", hue="species")
Multiple color maps can make sense when one of the variables is discrete:
In [ ]:
sns.histplot(
penguins, x="culmen_depth_mm", y="species", hue="species", legend=False
)
The bivariate histogram accepts all of the same options for computation as its univariate counterpart, using tuples to parametrize x
and y
independently:
In [ ]:
sns.histplot(
planets, x="year", y="distance",
bins=30, discrete=(True, False), log_scale=(False, True),
)
The default behavior makes cells with no observations transparent, although this can be disabled:
In [ ]:
sns.histplot(
planets, x="year", y="distance",
bins=30, discrete=(True, False), log_scale=(False, True),
thresh=None,
)
It's also possible to set the threshold and colormap saturation point in terms of the proportion of cumulative counts:
In [ ]:
sns.histplot(
planets, x="year", y="distance",
bins=30, discrete=(True, False), log_scale=(False, True),
pthresh=.05, pmax=.9,
)
To annotate the colormap, add a colorbar:
In [ ]:
sns.histplot(
planets, x="year", y="distance",
bins=30, discrete=(True, False), log_scale=(False, True),
cbar=True, cbar_kws=dict(shrink=.75),
)