In [1]:
%matplotlib inline
from ggplot import *

Distributions

ggplot provides two main ways to visualize distributions: histograms and density plots. Both are easy to create, but drawing them on the same plot is not recommended. A histogram's y-axis shows counts while a density plot's y-axis shows an estimated probability density, so the two scales are very different and the combination tends to confuse rather than help. So before you ask: no, there is no easy (or at least sanctioned) way to overlay a density plot on a histogram.
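
To make the scale mismatch concrete, here is a small sketch in plain Python (no ggplot involved). The `prices` values and the bin width of 1000 are made up for illustration; the point is only that counts and densities live on very different y-axes.

```python
# Hypothetical sample of prices (illustrative values, not taken from diamonds).
prices = [300, 450, 700, 900, 1200, 1500, 2500, 3200, 4100, 4800]
bin_width = 1000  # assumed bin width

# Histogram scale: number of observations falling in each bin.
n_bins = max(prices) // bin_width + 1
counts = [0] * n_bins
for p in prices:
    counts[p // bin_width] += 1

# Density scale: divide by (n * bin_width) so the bars' total area is 1.
density = [c / (len(prices) * bin_width) for c in counts]
area = sum(d * bin_width for d in density)

print(counts)
print(density)
print(area)
```

The tallest bar has a count of 4, while the corresponding density is on the order of 0.0001, so a density curve drawn against a count axis would sit almost flat along zero.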

Density Plots

The stat_density and geom_density layers can be added to a base ggplot object to create density plots. They do exactly the same thing; whether you use the stat or the geom version is a matter of preference. Both use a Gaussian kernel density estimator to estimate the probability density function that is plotted.


In [2]:
ggplot(diamonds, aes(x='price')) + geom_density()


Out[2]:
<ggplot: (284428701)>

In [3]:
ggplot(diamonds, aes(x='price')) + stat_density()


Out[3]:
<ggplot: (285372009)>
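
For readers curious what a Gaussian kernel density estimator computes, the following is a minimal sketch in plain Python. It is not ggplot's implementation: the helper name `gaussian_kde`, the sample `prices`, and the hand-picked bandwidth of 500 are all assumptions made for illustration, and how ggplot chooses its bandwidth is not covered here.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the density of `data` at a point x.

    Hypothetical helper for illustration; not part of ggplot.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

# Hypothetical prices and a hand-picked bandwidth (both are assumptions).
prices = [300, 450, 700, 900, 1200, 1500, 2500, 3200, 4100, 4800]
pdf = gaussian_kde(prices, bandwidth=500.0)

# Evaluate on a grid, as a plotting layer would before drawing a line.
step = 50
grid = list(range(-3000, 8001, step))
curve = [pdf(x) for x in grid]

# Riemann-sum check: the area under the curve should be close to 1.
area = sum(y * step for y in curve)
print(round(area, 3))
```

A larger bandwidth smooths the curve and a smaller one makes it bumpier, which is why the same data can produce noticeably different density plots.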

Just as you can with other geoms, you can add different aesthetics to your plot in order to visualize multi-dimensional data.


In [4]:
ggplot(diamonds, aes(x='price', color='clarity')) + stat_density()


Out[4]:
<ggplot: (285311509)>

Be careful, though: it's easy to get carried away.


In [5]:
ggplot(diamonds, aes(x='price', color='clarity', linetype='cut')) + stat_density()


Out[5]:
<ggplot: (285662985)>
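
A quick way to see why this gets out of hand is to count the groups. The sketch below lists the levels of clarity and cut in the diamonds dataset and pairs them up, assuming each (color, linetype) combination present in the data becomes its own curve.

```python
from itertools import product

# Levels of the two categorical columns in the diamonds dataset.
clarity_levels = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
cut_levels = ["Fair", "Good", "Very Good", "Premium", "Ideal"]

# Each (clarity, cut) pairing is a separate group, hence a separate curve.
groups = list(product(clarity_levels, cut_levels))
print(len(groups))  # 40
```

With up to 40 overlapping curves on one set of axes, individual groups become very hard to tell apart.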

Histograms

Histograms work in much the same way in ggplot. Just define a base ggplot that has an x aesthetic, add geom_histogram, and you're good to go.


In [6]:
ggplot(diamonds, aes(x='price')) + geom_histogram()


Out[6]:
<ggplot: (290561845)>

Again, just as you can with other geoms, you can add different aesthetics to your plot in order to visualize multi-dimensional data.


In [7]:
ggplot(diamonds, aes(x='price', fill='clarity')) + geom_histogram()


Out[7]:
<ggplot: (291318073)>
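
Conceptually, a fill aesthetic splits each bin's count by category. The sketch below shows that bookkeeping in plain Python; the rows and the bin width are hypothetical, and it makes no claim about how ggplot positions the resulting bars (stacked or overlaid).

```python
from collections import Counter, defaultdict

# Hypothetical (price, clarity) rows; values are illustrative only.
rows = [
    (300, "SI1"), (450, "VS1"), (700, "SI1"), (900, "IF"),
    (1200, "VS1"), (1500, "SI1"), (2500, "IF"), (3200, "SI1"),
]
bin_width = 1000  # assumed bin width

# Count observations per bin, split by the fill category.
by_bin = defaultdict(Counter)
for price, clarity in rows:
    by_bin[price // bin_width][clarity] += 1

# Print each bin's lower edge, its per-category counts, and its total.
for b in sorted(by_bin):
    print(b * bin_width, dict(by_bin[b]), sum(by_bin[b].values()))
```

Whatever the layout, the per-category counts within a bin add up to the count that an ungrouped histogram would show for that bin.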
