Multi-Dimension Visualization


In [39]:
%matplotlib inline

import os 

import pandas as pd
import seaborn as sns 
import matplotlib as mpl
import matplotlib.pyplot as plt

# Setup context and style 
sns.set_context('talk')
sns.set_style('whitegrid')

In [32]:
IRIS = os.path.join("..", "data", "iris.csv")
data = pd.read_csv(IRIS)

Scatter Matrix

The scatter matrix plots every pair of dimensions against each other in a grid, which makes correlations between pairs of dimensions easy to spot.


In [33]:
# `size` was renamed to `height` in seaborn 0.9
sns.pairplot(data, hue='class', diag_kind="kde", height=3)


Out[33]:
<seaborn.axisgrid.PairGrid at 0x127947650>
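
A numeric companion to the scatter matrix is a table of pairwise Pearson correlations. The sketch below uses a small hand-written frame so that it runs without the CSV file. The column names are borrowed from the notebook, but the values are made up for illustration and are not rows from the iris data.

```python
import pandas as pd

# Made-up measurements for illustration only; these are NOT rows from iris.csv.
sample = pd.DataFrame({
    "petal length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "petal width":  [0.5, 1.0, 1.5, 2.0, 2.5],  # exactly half of petal length
    "sepal width":  [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as petal length rises
})

# Pairwise Pearson correlation between every pair of numeric columns.
corr = sample.corr()
print(corr.round(2))
```

On the notebook's own frame, the non-numeric class column would need to be dropped first, for example with data.drop(columns='class').corr().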

Histograms

Once you have identified one or more dimensions that you would like to inspect, you can use histograms and kernel density estimates to get a sense of the variance in that field.


In [34]:
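# distplot is deprecated in recent seaborn; histplot + rugplot draw the same view
sns.histplot(data['sepal width'], kde=True)
sns.rugplot(data['sepal width'])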


Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x1294b84d0>
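
To make the kernel density estimate less of a black box, here is a minimal sketch of a Gaussian KDE written with NumPy. The function name gaussian_kde_1d, the bandwidth of 0.2, and the sample values are all assumptions chosen for illustration; seaborn selects its own bandwidth and this is not its implementation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth` centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = standardized distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Made-up values for illustration; not taken from iris.csv.
values = np.array([2.8, 3.0, 3.0, 3.1, 3.4, 3.5])
grid = np.linspace(0.0, 6.5, 651)
density = gaussian_kde_1d(values, grid, bandwidth=0.2)

print("sample variance:", values.var(ddof=1))
print("area under the estimate:", density.sum() * (grid[1] - grid[0]))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual samples more closely.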

Joint Plots

The more fields in the scatter plot, the more difficult it is to identify what is going on. You can use joint plots to inspect the relationship and correlation between two fields.


In [35]:
# Recent seaborn requires x/y as keywords, and `size` was renamed to `height`
sns.jointplot(x="petal length", y="petal width", data=data, kind='reg', height=12)


Out[35]:
<seaborn.axisgrid.JointGrid at 0x1272ba690>
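
The regression joint plot draws a fitted line through the scatter. The same two quantities, a least-squares line and the Pearson correlation, can be computed directly with NumPy, as in this sketch. The paired values are made up for illustration and are not taken from the iris data.

```python
import numpy as np

# Made-up paired measurements for illustration; not taken from iris.csv.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.4, 0.9, 1.6, 2.1, 2.5])

# Least-squares straight line: y ~= slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient between the two fields.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```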

RadViz

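Once you move into attempting to visualize more than three dimensions, things get a bit tricky. The radviz plot places each dimension as an anchor on an outer ring and pulls every point towards the anchors of the dimensions in which it has high values, so similar instances tend to form clusters.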


In [40]:
# pandas.tools.plotting was removed; the plotting helpers now live in pandas.plotting
from pandas.plotting import radviz

plt.figure(figsize=(14,14))
mpl.rcParams.update({'font.size': 22})

radviz(data, 'class', color=sns.color_palette())


Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x124e21650>
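
The pull towards the ring can be written down in a few lines. This is a sketch of the usual RadViz construction, not a copy of the pandas code: each dimension is min-max scaled, given an anchor on the unit circle, and every row is placed at the weighted average of the anchors. The function name radviz_points and the input values are assumptions made for illustration.

```python
import numpy as np

def radviz_points(values):
    """Project rows of `values` (n_samples x n_dims) into the unit circle."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(axis=0), values.max(axis=0)
    # Min-max scale each dimension to [0, 1]; assumes no constant column.
    weights = (values - lo) / (hi - lo)
    m = values.shape[1]
    angles = 2.0 * np.pi * np.arange(m) / m
    # One anchor per dimension, evenly spaced on the unit circle.
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Each row lands at the weighted average of the anchors;
    # assumes no row has all-zero weights.
    return weights @ anchors / weights.sum(axis=1, keepdims=True)

# Made-up rows for illustration; not taken from iris.csv.
rows = [
    [10.0, 1.0, 1.0, 1.0],    # high only in dimension 0
    [1.0, 10.0, 1.0, 1.0],    # high only in dimension 1
    [1.0, 1.0, 10.0, 10.0],   # high in dimensions 2 and 3
    [5.5, 5.5, 5.5, 5.5],     # equally strong in every dimension
]
points = radviz_points(rows)
print(points.round(3))
```

A row that is high in a single dimension sits on that dimension's anchor, while a row that is equally strong everywhere is pulled evenly and ends up in the center.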

Parallel Coordinates

Parallel coordinates aim to do the same thing as radviz, but instead of arranging the dimensions around a circle, they lay the dimensions out along the horizontal axis.


In [42]:
# pandas.tools.plotting was removed; the plotting helpers now live in pandas.plotting
from pandas.plotting import parallel_coordinates

plt.figure(figsize=(14,14))
mpl.rcParams.update({'font.size': 22})

parallel_coordinates(data, 'class', color=sns.color_palette())


Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x12bc67450>
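
Because every dimension in this plot is drawn against one shared vertical scale, a field with a large range can flatten the others. One common remedy, shown here as a sketch and not as part of the original notebook, is to min-max scale each field before plotting. The frame below uses made-up values and hypothetical class labels.

```python
import pandas as pd

# Made-up values for illustration; column names mirror the notebook's data.
sample = pd.DataFrame({
    "sepal width": [2.0, 3.0, 4.0],
    "petal length": [10.0, 40.0, 70.0],
    "class": ["a", "b", "c"],  # hypothetical labels
})

features = sample.drop(columns="class")
# Min-max scale each feature to [0, 1] so no single field dominates the axis.
scaled = (features - features.min()) / (features.max() - features.min())
scaled["class"] = sample["class"]
print(scaled)
```

The scaled frame can then be passed to parallel_coordinates(scaled, 'class') in place of data.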