In [2]:
%qtconsole

In [3]:
%matplotlib inline

About

This notebook presents plotting examples on the well-known iris data set, using the Grammar of Graphics as implemented in the ggplot package. The package is a good fit for users coming from R because of its stated design goal:

The goal is to have no difference other than those necessary due to the differences between R and Python.

via github.com/yhat/ggplot/

Include


In [4]:
from ggplot import *

import pandas as pd
from sklearn import datasets

Data


In [5]:
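# Load the iris data set bundled with scikit-learn
iris = datasets.load_iris()

# Measurements (features) and species names (looked up from the integer targets)
df1 = pd.DataFrame(iris.data, columns = iris.feature_names)
df2 = pd.DataFrame(iris.target_names[iris.target])

# Combine them column-wise; the species column is named 0 until it is renamed
df = pd.concat([df1, df2], axis = 1)
df.head()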


Out[5]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)       0
0                5.1               3.5                1.4               0.2  setosa
1                4.9               3.0                1.4               0.2  setosa
2                4.7               3.2                1.3               0.2  setosa
3                4.6               3.1                1.5               0.2  setosa
4                5.0               3.6                1.4               0.2  setosa

In [6]:
df.columns = ['sl', 'sw', 'pl', 'pw', 'species']
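
The construction and renaming steps above can also be done in one pass. The following is a minimal sketch rather than the notebook's own method: the helper name `load_iris_frame` is hypothetical, and it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn import datasets


def load_iris_frame():
    """Return the iris data as a DataFrame with short column names.

    Hypothetical helper, not part of the original notebook.
    """
    iris = datasets.load_iris()
    # Four measurement columns, abbreviated as in the notebook:
    # sl/sw = sepal length/width, pl/pw = petal length/width (all in cm).
    frame = pd.DataFrame(iris.data, columns=['sl', 'sw', 'pl', 'pw'])
    # iris.target holds integer class codes; indexing target_names with
    # them yields one species label per row.
    frame['species'] = iris.target_names[iris.target]
    return frame


iris_df = load_iris_frame()
print(iris_df.shape)
print(iris_df.columns.tolist())
```

Assigning the label column directly avoids the intermediate column named 0 that `pd.concat` produces.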

geom_point (scatter plot)

Plotting the two continuous variables sl (sepal length) and sw (sepal width), colored by the class labels in the species variable, shows whether the classes differ in this 2D space (via a scatter plot).


In [10]:
p1 = ggplot(aes(x = 'sl', y = 'sw', color = 'species'), data = df) + geom_point()
p1


Out[10]:
<ggplot: (-895179300)>

The plot shows that the setosa class can be linearly separated from the other two classes.
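
The separability claim can be checked numerically. The sketch below tests one candidate line, sl - sw = 2.3, which was picked by eye from the scatter plot and is an assumption rather than something the notebook derives. It works directly on the NumPy arrays from scikit-learn, so it does not depend on ggplot.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
sl = iris.data[:, 0]  # sepal length (cm)
sw = iris.data[:, 1]  # sepal width (cm)
is_setosa = iris.target_names[iris.target] == 'setosa'

# Candidate separating line, chosen by inspection (an assumption,
# not a fitted model): sl - sw = 2.3
score = sl - sw
threshold = 2.3

# If every setosa point scores below the threshold and every other
# point scores above it, the line separates the two groups.
setosa_max = score[is_setosa].max()
others_min = score[~is_setosa].min()
separable = bool(setosa_max < threshold < others_min)

print('largest setosa score:', setosa_max)
print('smallest non-setosa score:', others_min)
print('separated by sl - sw = 2.3:', separable)
```

A single separating line is enough to demonstrate linear separability; a fitted linear classifier would be the more general way to look for one.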

geom_smooth


In [11]:
p2 = ggplot(aes(x = 'sl', y = 'sw', group = 'species', color = 'species'), data = df) + \
    geom_point() + geom_smooth(alpha = 0.5) + theme_bw()
p2


Out[11]:
<ggplot: (-894634072)>

geom_point with subsetting


In [36]:
p3 = ggplot(aes(x = 'sl', y = 'sw', color = 'species'), data = df[df.species != 'setosa']) + \
        geom_point() + theme_538()
p3


Out[36]:
<ggplot: (-895682774)>
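
The subsetting in the cell above is ordinary pandas boolean indexing, applied before the data reaches ggplot. The following sketch rebuilds the data frame independently (an assumption made so that the block is self-contained) and shows what the filter keeps.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=['sl', 'sw', 'pl', 'pw'])
df['species'] = iris.target_names[iris.target]

# A boolean Series: True for every row whose species is not setosa.
mask = df['species'] != 'setosa'

# Indexing with the mask keeps only the True rows; the original
# row labels are preserved rather than renumbered.
subset = df[mask]

print(len(df), len(subset))
print(subset['species'].value_counts())
print(subset.index[0])
```

Because the row labels are preserved, call `reset_index(drop=True)` on the subset if a fresh 0-based index is needed.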

geom_histogram with faceting


In [12]:
p4 = ggplot(aes(x = 'sl'), data = df) + geom_histogram() + facet_wrap('species', ncol = 1)
p4


Out[12]:
<ggplot: (-894684384)>
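
Faceting splits the data by a categorical variable and draws one panel per group. The sketch below mirrors that idea with pandas and NumPy by computing one set of histogram counts per species. It is not ggplot's implementation, and the choice of 10 shared bins is an arbitrary assumption.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=['sl', 'sw', 'pl', 'pw'])
df['species'] = iris.target_names[iris.target]

# Shared bin edges so the per-species counts are directly comparable
# (11 edges give 10 equal-width bins; the bin count is an assumption).
bin_edges = np.linspace(df['sl'].min(), df['sl'].max(), 11)

# One histogram per species, analogous to one facet panel per species.
counts = {
    species: np.histogram(group['sl'], bins=bin_edges)[0]
    for species, group in df.groupby('species')
}

for species, species_counts in counts.items():
    print(species, species_counts)
```

Using the same bin edges for every group plays the role of a shared x-axis across the panels.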
