In [1]:
%matplotlib inline
from ggplot import *

Basics

To make a scatter plot in ggplot, the first thing you'll need is a dataset. ggplot works best with pandas DataFrames, so load your data into pandas before you start using ggplot.

We'll use ggplot's built-in mtcars dataset for this set of examples.


In [2]:
mtcars.head()


Out[2]:
                name   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0          Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1      Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2         Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3     Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4  Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2

Simple X/Y Plot

The simplest plot you can make is an x/y plot. To do this in ggplot, first create a base layer using the ggplot function. Pass in your data and aesthetics that map columns in your DataFrame to x and y. In this case, we're going to look at the relationship between car weight (wt) and miles per gallon (mpg), so we'll set the x value of our aesthetics to wt and the y value to mpg.

Once the aesthetics are defined, we just need to add a geom_point layer to our plot.


In [3]:
p = ggplot(mtcars, aes(x='wt', y='mpg'))
p + geom_point()


Out[3]:
<ggplot: (285176705)>

Controlling other aesthetics

In addition to the x and y variables, you can also control the shape, size, color, alpha (see-throughness), and other "aesthetics" of your scatterplot. To do this, just include a definition for each aesthetic you'd like to control. For instance, let's set the color of each point in our graph to the quarter-mile time (qsec) of each car, which serves as a proxy for acceleration.


In [4]:
p = ggplot(mtcars, aes(x='wt', y='mpg', color='qsec')) + geom_point()
print(p)


<ggplot: (285320953)>
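To build intuition for what a continuous color scale does, here is a minimal plain-Python sketch. It is not ggplot's implementation: it simply rescales the qsec values from the five rows shown earlier to the range 0 to 1 and blends between two RGB endpoints. The helper names (normalize, blend) and the endpoint colors are assumptions made up for this illustration.

```python
# qsec values copied from the five mtcars rows shown earlier.
qsec = [16.46, 17.02, 18.61, 19.44, 17.02]


def normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so every point sits at the low end.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def blend(start, end, t):
    """Linearly interpolate between two RGB tuples; t is expected in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


# Hypothetical gradient endpoints (dark blue to light blue), chosen for illustration only.
LOW_RGB = (19, 43, 67)
HIGH_RGB = (86, 177, 247)

positions = normalize(qsec)
colors = [blend(LOW_RGB, HIGH_RGB, t) for t in positions]

for value, color in zip(qsec, colors):
    print(value, color)
```

Cars with equal qsec values land on the same color, and the legend on a continuous plot is just this gradient drawn with value labels alongside it.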

Discrete Color

Because qsec is a continuous variable, the last plot includes a graduated legend on its right side showing the value each color represents. Watch what happens if you use a discrete variable such as name for the color instead.

Each name is assigned its own color and displayed in the legend. This can get unruly (especially if you're plotting a discrete variable with hundreds of possible values), so be careful.


In [5]:
p = ggplot(mtcars, aes(x='wt', y='mpg', color='name')) + geom_point()
p


Out[5]:
<ggplot: (285460973)>
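One way to be careful is to count the distinct values in a column before mapping it to color. The sketch below uses plain Python lists copied from the five rows shown earlier; the helper names and the threshold of 4 are assumptions for illustration, not part of ggplot. On a full DataFrame, pandas' `mtcars['name'].nunique()` returns the corresponding count.

```python
# Values copied from the five mtcars rows shown earlier.
names = ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
         "Hornet 4 Drive", "Hornet Sportabout"]
cyl = [6, 6, 4, 6, 8]

# Hypothetical limit on how many legend entries stay readable.
MAX_LEGEND_ENTRIES = 4


def distinct_count(values):
    """Number of distinct values, i.e. how many legend entries a discrete scale needs."""
    return len(set(values))


def fits_legend(values, limit=MAX_LEGEND_ENTRIES):
    """True when the column has few enough distinct values for a readable legend."""
    return distinct_count(values) <= limit


print("name:", distinct_count(names), "distinct values, fits:", fits_legend(names))
print("cyl:", distinct_count(cyl), "distinct values, fits:", fits_legend(cyl))
```

Even in this five-row sample, name already needs one legend entry per row, while cyl needs only three.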

You can also use ggplot's factor shorthand to treat a numeric variable as categorical. For example, let's change cyl from a numerical to a categorical variable.


In [6]:
ggplot(mtcars, aes(x='wt', y='mpg', color='factor(cyl)')) + geom_point()


Out[6]:
<ggplot: (285133253)>
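Conceptually, treating a numeric column as a factor means replacing its numbers with a small set of labels (levels) and giving each level its own color. The following plain-Python sketch illustrates that idea with the cyl values from the five rows shown earlier. It is an assumption-laden illustration, not how ggplot implements factor; the function names and the palette are hypothetical.

```python
# cyl values copied from the five mtcars rows shown earlier.
cyl = [6, 6, 4, 6, 8]

# Hypothetical palette; a real discrete scale would choose its own colors.
PALETTE = ["red", "green", "blue"]


def as_levels(values):
    """Return the sorted distinct values as string labels (the factor levels)."""
    return [str(v) for v in sorted(set(values))]


def color_by_level(values, palette):
    """Assign each value the color of its level, one palette entry per level."""
    levels = as_levels(values)
    if len(levels) > len(palette):
        raise ValueError("not enough colors for the number of levels")
    lookup = dict(zip(levels, palette))
    return [lookup[str(v)] for v in values]


print(as_levels(cyl))
print(color_by_level(cyl, PALETTE))
```

Because the levels are labels rather than numbers, the legend lists one swatch per level instead of a gradient.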

Pro Tip

Keep in mind that some aesthetics only work for continuous data, some only work for discrete data, and some work for both.

For more on which aesthetics are best in certain situations, read the aesthetics API documentation.

