In [1]:
%matplotlib inline
from ggplot import *
In [2]:
ggplot(diamonds, aes(x='carat', y='price')) + geom_point() + ggtitle("Carat vs. Price")
Out[2]:
The plot above is a scatterplot comparing a diamond's carat with its price. The plot is composed of 3 layers:
ggplot(diamonds, aes(x='carat', y='price')) -- This defines the dataset that's going to be plotted and the aesthetics (or instructions) used to define the x and y axes.
geom_point() -- This layer tells ggplot to render a scatter plot using the aesthetics and data defined in the base layer.
ggtitle("Carat vs. Price") -- This layer applies a title to the plot. There are lots of other labels and customizations you can apply to your plots (xlab, ylab, etc.).
You can keep adding layers to your plot as there are more things you'd like to see. For instance, if I wanted to customize the x and y axis labels, I could do so by adding 2 additional layers using xlab and ylab.
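To make the layering idea concrete, here is a toy model of how a + operator can stack layers onto a base plot. It is only a sketch: the Plot and Layer classes are hypothetical names invented for illustration, and this is not how ggplot itself is implemented.

```python
# Toy model of layer composition. Plot and Layer are hypothetical
# classes for illustration only, NOT ggplot's real implementation.
class Layer:
    """A named plot component, such as a geom or a label."""
    def __init__(self, name):
        self.name = name


class Plot:
    """Holds a dataset name, aesthetics, and an ordered list of layers."""
    def __init__(self, data, aes):
        self.data = data
        self.aes = aes
        self.layers = []

    def __add__(self, layer):
        # Return a new Plot so the left-hand operand is left unchanged.
        new = Plot(self.data, self.aes)
        new.layers = self.layers + [layer]
        return new

    def describe(self):
        # List the layer names in the order they were added.
        return [layer.name for layer in self.layers]


base = Plot("diamonds", {"x": "carat", "y": "price"})
p = base + Layer("geom_point") + Layer("ggtitle")
p = p + Layer("xlab") + Layer("ylab")
print(p.describe())
print(base.describe())
```

Each + returns a new object with one more layer, so the base plot can be reused as the starting point for several variations.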
In [3]:
ggplot(diamonds, aes(x='carat', y='price')) + \
geom_point() + \
ggtitle("Carat vs. Price") + \
xlab(" Carat\n(1 carat = 200 mg)") + \
ylab(" Price\n(2008 USD)")
Out[3]:
Beyond labels, you can also add more "geoms" (plot types) and statistical layers. For instance, let's add a linear trend line to our plot using stat_smooth.
In [4]:
ggplot(diamonds, aes(x='carat', y='price')) + \
geom_point() + \
stat_smooth(method='lm') + \
ggtitle("Carat vs. Price") + \
xlab(" Carat\n(1 carat = 200 mg)") + \
ylab(" Price\n(2008 USD)")
Out[4]:
It looks like there are some outlying points in our plot. Let's cut them out of the view by using xlim and ylim. These layers cap the x and y axes at whatever values we pass in.
In [5]:
ggplot(diamonds, aes(x='carat', y='price')) + \
geom_point() + \
stat_smooth(method='lm') + \
ggtitle("Carat vs. Price") + \
xlab(" Carat\n(1 carat = 200 mg)") + \
ylab(" Price\n(2008 USD)") + \
xlim(0, 3) + \
ylim(0, 20000)
Out[5]:
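xlim and ylim are described above as capping the axes. If you would rather drop the out-of-range rows from the data itself before plotting, one option is to filter the DataFrame with pandas. The sketch below assumes pandas is available and uses a small made-up DataFrame as a stand-in for diamonds; the values are illustrative, not real rows from the dataset.

```python
import pandas as pd

# Made-up stand-in for the diamonds dataset (illustrative values only).
df = pd.DataFrame({
    "carat": [0.3, 1.0, 2.5, 4.5],
    "price": [500, 5000, 18000, 17000],
})

# Keep rows inside the same ranges passed to xlim(0, 3) and ylim(0, 20000).
# Series.between is inclusive on both ends by default.
in_range = df["carat"].between(0, 3) & df["price"].between(0, 20000)
trimmed = df[in_range]

print(len(df), len(trimmed))
```

The trimmed DataFrame can then be passed to ggplot in place of the original, so that any fitted layer only sees the rows you kept.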
Instead of building your ggplots in one long expression, you can break them up into separate statements. To do this, use the + or += operators to gradually tack layers onto your plot.
In [6]:
p = ggplot(aes(x='mpg'), data=mtcars)
p += geom_histogram()
p += xlab("Miles per Gallon")
p += ylab("# of Cars")
p
Out[6]: