Full post on Practical Business Python
In [1]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
With the imports done, enable inline display, set the seaborn style, and read in the file.
In [2]:
%matplotlib inline
In [3]:
sns.set()
In [4]:
# File is in the data directory of the repo
df = pd.read_csv("https://raw.githubusercontent.com/chris1610/pbpython/master/data/MN_Traffic_Fatalities.csv")
In [5]:
df
Out[5]:
Create a basic scatter plot
In [6]:
sns.scatterplot(x='2016', y='Travel_Time', data=df)
Out[6]:
Try a different style based on the Pres_Election column
In [7]:
sns.scatterplot(x='2016', y='Travel_Time', style='Pres_Election', data=df)
Out[7]:
The size of the markers can be controlled via the size parameter
In [8]:
sns.scatterplot(x='2016', y='Travel_Time', size='Population', data=df)
Out[8]:
Show size and hue
In [9]:
sns.scatterplot(x='2016', y='Travel_Time', size='Population', hue='Twin_Cities', data=df)
Out[9]:
The subsequent plots work best with tidy (long-form) data, so melt the per-year columns into a single Year column and a Fatalities column
In [10]:
df_melted = pd.melt(df, id_vars=['County', 'Twin_Cities', 'Pres_Election',
                                 'Public_Transport(%)', 'Travel_Time', 'Population'],
                    value_vars=['2016', '2015', '2014', '2013', '2012'],
                    value_name='Fatalities',
                    var_name='Year'  # var_name takes a scalar for single-level columns
                    )
Here's what the data looks like for Hennepin County
In [11]:
df_melted[df_melted.County == "Hennepin"]
Out[11]:
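To see what the melt step does without downloading the file, here is a minimal sketch on a made-up two-county frame. The county names and numbers are hypothetical; only the column layout mirrors the real data.

```python
import pandas as pd

# Hypothetical wide-form data: one row per county, one column per year
wide = pd.DataFrame({
    "County": ["A", "B"],
    "Population": [1000, 2000],
    "2015": [3, 5],
    "2016": [4, 6],
})

# id_vars are repeated on every output row; each value_vars column
# becomes a block of rows labelled by its former column name
long = pd.melt(wide, id_vars=["County", "Population"],
               value_vars=["2016", "2015"],
               var_name="Year", value_name="Fatalities")
print(long)
```

Each county now appears once per year, and the Year values are the former column labels, so they are strings rather than integers.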
Now that we have tidy data, let's plot some line plots
In [12]:
sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities')
Out[12]:
Disable the confidence interval band (seaborn 0.12 and later deprecate ci in favor of errorbar=None)
In [13]:
sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities', ci=False)
Out[13]:
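Because several counties share each Year within a Twin_Cities group, lineplot aggregates them (using the mean by default) and draws the confidence interval around that estimate. A rough sketch of the values the line itself traces, using a made-up frame with the same column names:

```python
import pandas as pd

# Hypothetical tidy data: two counties per group and year
toy = pd.DataFrame({
    "Twin_Cities": ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "Year": ["2015", "2015", "2015", "2015", "2016", "2016", "2016", "2016"],
    "Fatalities": [10, 20, 1, 3, 12, 18, 2, 6],
})

# Mean fatalities per group and year: one line per row, one point per column
line_values = (toy.groupby(["Twin_Cities", "Year"])["Fatalities"]
                  .mean()
                  .unstack("Year"))
print(line_values)
```

Turning the interval off changes only the shaded band; the line still follows these per-group means.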
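relplot draws scatter plots (the default) or line plots on a FacetGrid. Because it is a figure-level function, the same call can also split the data into rows and columns of subplots, which the axes-level scatterplot and lineplot functions cannot do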
In [14]:
sns.relplot(x='2016', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), style='Pres_Election', data=df, legend='brief')
Out[14]:
In [15]:
# Here's how to review 2016 only data
df_melted.query("Year == '2016'")
Out[15]:
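The quotes around 2016 in the query matter: after melting, Year holds the old column labels, which are strings. A small sketch on a made-up frame (the county names and counts are hypothetical) comparing the query form with an equivalent boolean mask and with a numeric Year column:

```python
import pandas as pd

# Hypothetical melted data; Year is stored as strings
long = pd.DataFrame({
    "County": ["A", "B", "A", "B"],
    "Year": ["2016", "2016", "2015", "2015"],
    "Fatalities": [4, 6, 3, 5],
})

# String comparison inside query needs the inner quotes
only_2016 = long.query("Year == '2016'")

# Equivalent boolean-mask form
same_rows = long[long["Year"] == "2016"]

# If Year is converted to integers, the query compares against a number instead
numeric = long.assign(Year=long["Year"].astype(int)).query("Year == 2016")
print(only_2016)
```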
In [16]:
sns.relplot(x='Fatalities', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), data=df_melted.query("Year == '2016'"))
Out[16]:
We can split the data into two columns with the col keyword
In [17]:
sns.relplot(x='Fatalities', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), col='Pres_Election', data=df_melted.query("Year == '2016'"))
Out[17]:
We can use kind='line' to create line plots on the FacetGrid
In [18]:
sns.relplot(x='Year', y='Fatalities', data=df_melted, kind='line', hue='Twin_Cities', col='Pres_Election')
Out[18]:
This example has rows and columns
In [19]:
sns.relplot(x='Year', y='Fatalities', data=df_melted, kind='line', size='Population',
            row='Twin_Cities', col='Pres_Election')
Out[19]:
factorplot was deprecated in seaborn 0.9 and replaced by catplot; the call below warns on 0.9 and is not available in recent releases
In [20]:
sns.factorplot(x='Fatalities', y='County', data=df_melted)
Out[20]:
Here's how to use the category plot to replicate the factorplot
In [21]:
sns.catplot(x='Fatalities', y='County', data=df_melted, kind='point')
Out[21]:
Try a boxplot
In [22]:
sns.catplot(x='Year', y='Fatalities', kind='box', data=df_melted)
Out[22]:
A default catplot with two columns
In [23]:
sns.catplot(x='Year', y='Fatalities', data=df_melted, col='Twin_Cities')
Out[23]:
Change colors with hue
In [24]:
sns.catplot(y='County', x='Fatalities', data=df_melted, col='Twin_Cities', hue='Year')
Out[24]:
Use a catplot with the newly named boxen plot (formerly lvplot, the letter-value plot)
In [25]:
sns.catplot(x='Year', y='Fatalities', data=df_melted, col='Twin_Cities', kind='boxen')
Out[25]:
In [26]:
# Here's a little easter egg
sns.dogplot()