Seaborn Release 0.9 Highlights

Full post on Practical Business Python


In [1]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

With the imports done, enable inline display, set the seaborn style, and read in the file.


In [2]:
%matplotlib inline

In [3]:
sns.set()

In [4]:
# File is in the data directory of the repo
df = pd.read_csv("https://raw.githubusercontent.com/chris1610/pbpython/master/data/MN_Traffic_Fatalities.csv")

In [5]:
df


Out[5]:
       County  Twin_Cities  Pres_Election  Public_Transport(%)  Travel_Time  Population  2012  2013  2014  2015  2016
0    Hennepin          Yes        Clinton                  7.2         23.2     1237604    33    42    34    33    45
1      Dakota          Yes        Clinton                  3.3         24.0      418432    19    19    10    11    28
2       Anoka          Yes          Trump                  3.4         28.2      348652    25    12    16    11    20
3   St. Louis           No        Clinton                  2.4         19.5      199744    11    19     8    16    19
4      Ramsey          Yes        Clinton                  6.4         23.6      540653    19    12    12    18    15
5  Washington          Yes        Clinton                  2.3         25.8      253128     8    10     8    12    13
6     Olmsted           No        Clinton                  5.2         17.5      153039     2    12     8    14    12
7        Cass           No          Trump                  0.9         23.3       28895     6     5     6     4    10
8        Pine           No          Trump                  0.8         30.3       28879    14     7     4     9    10
9      Becker           No          Trump                  0.5         22.7       33766     4     3     3     1     9

Create a basic scatter plot


In [6]:
sns.scatterplot(x='2016', y='Travel_Time', data=df)


Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f4542cb0828>

Vary the marker style based on the Pres_Election column


In [7]:
sns.scatterplot(x='2016', y='Travel_Time', style='Pres_Election', data=df)


Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f4543531128>

The size parameter scales each marker by the values in a column, here Population


In [8]:
sns.scatterplot(x='2016', y='Travel_Time', size='Population', data=df)


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a3b7048>

Combine size and hue in the same plot


In [9]:
sns.scatterplot(x='2016', y='Travel_Time', size='Population', hue='Twin_Cities', data=df)


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a22a6d8>

The subsequent plots work best with tidy (long-form) data, so melt the five year columns into a Year column and a Fatalities column


In [10]:
df_melted = pd.melt(df, id_vars=['County', 'Twin_Cities', 'Pres_Election', 
                               'Public_Transport(%)', 'Travel_Time', 'Population'], 
                  value_vars=['2016', '2015', '2014', '2013', '2012'], 
                  value_name='Fatalities',
                  var_name='Year'  # a scalar; recent pandas versions reject a list here
                 )
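
To make the reshaping concrete, here is a minimal sketch of the same wide-to-long step on a two-county, two-year subset of the table above. The frame name `wide` and its reduced column set are assumptions for illustration only and are not part of the original notebook.

```python
import pandas as pd

# Two counties and two year columns taken from the table above (illustrative subset)
wide = pd.DataFrame({
    "County": ["Hennepin", "Dakota"],
    "Twin_Cities": ["Yes", "Yes"],
    "2015": [33, 11],
    "2016": [45, 28],
})

# Each year column is unpivoted into (Year, Fatalities) rows, one per county
tidy = pd.melt(wide, id_vars=["County", "Twin_Cities"],
               value_vars=["2016", "2015"],
               var_name="Year", value_name="Fatalities")

print(tidy)
```

The identifier columns are repeated for every year, which is what lets seaborn map Year to an axis and Fatalities to a value.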

Here's what the data looks like for Hennepin County


In [11]:
df_melted[df_melted.County == "Hennepin"]


Out[11]:
      County  Twin_Cities  Pres_Election  Public_Transport(%)  Travel_Time  Population  Year  Fatalities
0   Hennepin          Yes        Clinton                  7.2         23.2     1237604  2016          45
10  Hennepin          Yes        Clinton                  7.2         23.2     1237604  2015          33
20  Hennepin          Yes        Clinton                  7.2         23.2     1237604  2014          34
30  Hennepin          Yes        Clinton                  7.2         23.2     1237604  2013          42
40  Hennepin          Yes        Clinton                  7.2         23.2     1237604  2012          33

Now that we have tidy data, let's draw some line plots. Each year has several counties, so lineplot draws the mean with a shaded confidence band


In [12]:
sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities')


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a241cf8>
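
When several rows share an x value, lineplot plots their mean and, by default, a bootstrapped 95% confidence band around it. The sketch below shows roughly what that band represents for one point on the line. The helper `bootstrap_ci` is hypothetical and uses a simplified percentile bootstrap; it is not seaborn's own code, and its numbers will not match the plot exactly.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap interval for the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample
    samples = rng.choice(values, size=(n_boot, values.size), replace=True)
    boot_means = samples.mean(axis=1)
    half = (100 - ci) / 2
    low, high = np.percentile(boot_means, [half, 100 - half])
    return low, high

# 2016 fatalities for the five Twin Cities counties in the table above
twin_cities_2016 = [45, 28, 20, 15, 13]
mean = np.mean(twin_cities_2016)
low, high = bootstrap_ci(twin_cities_2016)
print(f"mean={mean:.1f}, 95% interval=({low:.1f}, {high:.1f})")
```

With only five counties per group the interval is wide, which is one reason to consider turning the band off for a dataset this small.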

Disable the confidence interval


In [13]:
# ci=None is the documented way to turn off the band (seaborn 0.12+ uses errorbar=None instead)
sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities', ci=None)


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a14e780>

relplot draws scatter or line plots on a FacetGrid, which adds faceting by row and column on top of what scatterplot and lineplot offer


In [14]:
sns.relplot(x='2016', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), style='Pres_Election', data=df, legend='brief')


Out[14]:
<seaborn.axisgrid.FacetGrid at 0x7f453a110eb8>
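
Because relplot returns a FacetGrid rather than a matplotlib Axes, labels and limits are adjusted through the grid. The following is a minimal sketch on a four-county subset of the table above; the frame name `df_small`, the label text, and the axis limits are assumptions chosen for illustration.

```python
import pandas as pd
import seaborn as sns

# Small illustrative subset of the table above
df_small = pd.DataFrame({
    "County": ["Hennepin", "Dakota", "Cass", "Pine"],
    "Twin_Cities": ["Yes", "Yes", "No", "No"],
    "Travel_Time": [23.2, 24.0, 23.3, 30.3],
    "2016": [45, 28, 10, 10],
})

g = sns.relplot(x="2016", y="Travel_Time", hue="Twin_Cities", data=df_small)

# Axis labels are set on the grid, which applies them to the facets
g.set_axis_labels("Fatalities in 2016", "Travel time")

# The underlying matplotlib Axes are still reachable for finer control
for ax in g.axes.flat:
    ax.set_xlim(0, 50)
```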

In [15]:
# Here's how to review 2016 only data
df_melted.query("Year == '2016'")


Out[15]:
       County  Twin_Cities  Pres_Election  Public_Transport(%)  Travel_Time  Population  Year  Fatalities
0    Hennepin          Yes        Clinton                  7.2         23.2     1237604  2016          45
1      Dakota          Yes        Clinton                  3.3         24.0      418432  2016          28
2       Anoka          Yes          Trump                  3.4         28.2      348652  2016          20
3   St. Louis           No        Clinton                  2.4         19.5      199744  2016          19
4      Ramsey          Yes        Clinton                  6.4         23.6      540653  2016          15
5  Washington          Yes        Clinton                  2.3         25.8      253128  2016          13
6     Olmsted           No        Clinton                  5.2         17.5      153039  2016          12
7        Cass           No          Trump                  0.9         23.3       28895  2016          10
8        Pine           No          Trump                  0.8         30.3       28879  2016          10
9      Becker           No          Trump                  0.5         22.7       33766  2016           9
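
The quotes around 2016 in the query matter: melt stores the former column labels as strings, so Year is compared as text. Here is a small sketch of the equivalent filters, plus an optional conversion to integers. The frame `tidy` is a hand-built illustrative subset, not the notebook's `df_melted`.

```python
import pandas as pd

tidy = pd.DataFrame({
    "County": ["Hennepin", "Hennepin", "Dakota", "Dakota"],
    "Year": ["2016", "2015", "2016", "2015"],   # strings, as produced by melt
    "Fatalities": [45, 33, 28, 11],
})

# String comparison: the value must be quoted inside the query expression
by_query = tidy.query("Year == '2016'")

# Equivalent boolean mask
by_mask = tidy[tidy["Year"] == "2016"]

# Optional: convert to integers if a numeric axis or numeric filter is preferred
tidy_numeric = tidy.assign(Year=tidy["Year"].astype(int))
recent = tidy_numeric.query("Year >= 2016")
```

Keeping Year as a string makes seaborn treat it as a category; converting it to an integer makes it a numeric axis instead.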

In [16]:
sns.relplot(x='Fatalities', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), data=df_melted.query("Year == '2016'"))


Out[16]:
<seaborn.axisgrid.FacetGrid at 0x7f4539fe17b8>

The col keyword splits the plot into side-by-side panels, one per value of the chosen column (here Pres_Election)


In [17]:
sns.relplot(x='Fatalities', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), col='Pres_Election', data=df_melted.query("Year == '2016'"))


Out[17]:
<seaborn.axisgrid.FacetGrid at 0x7f4539fc1588>

We can use kind='line' to create line plots on the FacetGrid


In [18]:
sns.relplot(x='Year', y='Fatalities', data=df_melted, kind='line', hue='Twin_Cities', col='Pres_Election')


Out[18]:
<seaborn.axisgrid.FacetGrid at 0x7f4539f85208>

This example facets by both rows and columns


In [19]:
sns.relplot(x='Year', y='Fatalities', data=df_melted, kind='line', size='Population',
            row='Twin_Cities', col='Pres_Election')


Out[19]:
<seaborn.axisgrid.FacetGrid at 0x7f4539dba6d8>

factorplot is deprecated in favor of catplot. Calling it still works but emits a warning


In [20]:
sns.factorplot(x='Fatalities', y='County', data=df_melted)


/home/chris/miniconda3/envs/pbp3/lib/python3.6/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)
Out[20]:
<seaborn.axisgrid.FacetGrid at 0x7f4539d0c6d8>

catplot defaults to kind='strip', so pass kind='point' to replicate the old factorplot output


In [21]:
sns.catplot(x='Fatalities', y='County', data=df_melted, kind='point')


Out[21]:
<seaborn.axisgrid.FacetGrid at 0x7f45399c8c50>
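
Each catplot kind corresponds to an axes-level function, so the same point plot can also be drawn directly onto a matplotlib Axes with pointplot. The sketch below contrasts the two return types on a hand-built subset; the frame name `tidy` is an assumption for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

tidy = pd.DataFrame({
    "County": ["Hennepin", "Hennepin", "Dakota", "Dakota"],
    "Year": ["2016", "2015", "2016", "2015"],
    "Fatalities": [45, 33, 28, 11],
})

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.catplot(x="Fatalities", y="County", data=tidy, kind="point")

# Axes-level counterpart of kind="point": draws onto an existing matplotlib Axes
fig, ax = plt.subplots()
ax = sns.pointplot(x="Fatalities", y="County", data=tidy, ax=ax)
```

The figure-level form is convenient when faceting with col or row; the axes-level form fits better when the plot has to sit inside a figure laid out by hand.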

Try a boxplot


In [22]:
sns.catplot(x='Year', y='Fatalities', kind='box', data=df_melted)


Out[22]:
<seaborn.axisgrid.FacetGrid at 0x7f453993b2b0>

A default (strip) catplot faceted into two columns by Twin_Cities


In [23]:
sns.catplot(x='Year', y='Fatalities', data=df_melted, col='Twin_Cities')


Out[23]:
<seaborn.axisgrid.FacetGrid at 0x7f4539934668>

Change colors with hue


In [24]:
sns.catplot(y='County', x='Fatalities', data=df_melted, col='Twin_Cities', hue='Year')


Out[24]:
<seaborn.axisgrid.FacetGrid at 0x7f45384ff780>

Use catplot with the boxen plot, the new name for what was previously lvplot


In [25]:
sns.catplot(x='Year', y='Fatalities', data=df_melted, col='Twin_Cities', kind='boxen')


Out[25]:
<seaborn.axisgrid.FacetGrid at 0x7f453849f438>

In [26]:
# Here's a little easter egg
sns.dogplot()