The usual imports and settings
In [1]:
import seaborn as sns
import pandas as pd
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
sns.set_context("notebook", font_scale=1.25)
sns.set_style("ticks", {"font.family": "Liberation Sans"})
If you are a matplotlib pro, you probably know how to do everything that seaborn does. But because matplotlib is so detailed and low level, with a myriad of options, it is always difficult to appreciate how powerful it really is.
seaborn is a Python package for high-level statistical plotting. Before coming across seaborn, I always went back to ggplot2 for my extensive plotting needs, but no more R or ggplot2, as I have found my safe haven.
How am I planning to cover seaborn?
Again, this section follows the API reference. I've listed the plots I am going to cover today.
sns.pairplot(df)
Some parameters:
kind : {'scatter', 'reg'}
diag_kind : {'hist', 'kde'}
hue : Select column to apply color by
In [2]:
iris = pd.read_csv("iris.csv", index_col=0)
iris.head()
Out[2]:
In [3]:
g = sns.pairplot(iris, hue="species", diag_kind="kde")
sns.jointplot(data=df, x="", y="")
Some parameters:
kind : {'scatter', 'reg', 'resid', 'kde', 'hex'}, optional
In [4]:
g = sns.jointplot(data=iris, x="sepal_length", y="sepal_width")
In [5]:
from scipy.stats import spearmanr
g = sns.jointplot(data=iris, x="sepal_length", y="sepal_width",
                  kind="kde", stat_func=spearmanr)
This is a grid plot that fits regressions across subsets of a dataset
sns.lmplot(data=df, x="", y="")
Some parameters:
col - Split the plot into separate columns, one per value of this column
row - Split the plot into separate rows, one per value of this column
fit_reg - Default True. If you don't want a regression line, pass False here
In [6]:
sepal_lmplot = sns.lmplot(data=iris, fit_reg=False, x="sepal_length", y="sepal_width",
                          col="species", scatter_kws={'s': 25}, size=3.05).set_xticklabels(rotation=90)
In [7]:
petal_lmplot = sns.lmplot(data=iris, fit_reg=False, x="petal_length", y="petal_width",
                          col="species", scatter_kws={'s': 25}, size=3.05).set_xticklabels(rotation=90)
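`col` splits the data into one panel per category, and `fit_reg` toggles the regression line. The sketch below keeps the regression on, using made-up data (the `toy` frame is hypothetical). To my knowledge, seaborn 0.9 renamed the `size` argument used above to `height`, so the sketch assumes a recent seaborn:

```python
import pandas as pd
import seaborn as sns

# Made-up measurements standing in for iris.csv (illustrative values only)
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.4, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width":  [3.5, 3.0, 3.9, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# One panel per species, each with its own scatter and regression fit;
# height= is the newer name for the size= argument
g = sns.lmplot(data=toy, x="sepal_length", y="sepal_width",
               col="species", fit_reg=True, height=3.05,
               scatter_kws={"s": 25})

# FacetGrid.axes is a 2D array: (number of rows, number of columns)
grid_shape = g.axes.shape
```

Three species and no `row` argument should give a single row of three panels.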
sns.factorplot(data=df, x="", y="")
Really powerful and versatile.
Some parameters:
kind : {point, bar, count, box, violin, strip}
In [8]:
planets = pd.read_csv("planets.csv", index_col=0)
planets.head()
Out[8]:
In [9]:
planets_factorplot = sns.factorplot(kind="strip", jitter=True,
                                    data=planets[planets.method=="Radial Velocity"].sort_values(by="year"),
                                    col="method", x="year", y="orbital_period", color=sns.xkcd_rgb["warm blue"],
                                    size=5, aspect=1.8)
_ = planets_factorplot.set_xticklabels(rotation=90).set(ylim=0, yscale="log")
In [10]:
planets_factorplot = sns.factorplot(kind="box",
                                    data=planets[planets.method=="Radial Velocity"].sort_values(by="year"),
                                    col="method", x="year", y="orbital_period", color=sns.xkcd_rgb["green apple"],
                                    size=5, aspect=1.8)
_ = planets_factorplot.set_xticklabels(rotation=90).set(ylim=0, yscale="log")
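`factorplot` was renamed `catplot` in seaborn 0.9 (and `size` became `height`), so the cells above may need adjusting on newer installs. Here is a sketch of the strip plot with the newer names, on a few made-up rows standing in for `planets.csv` (the `toy_planets` frame and its values are illustrative only):

```python
import pandas as pd
import seaborn as sns

# Made-up rows standing in for planets.csv (illustrative values only)
toy_planets = pd.DataFrame({
    "method": ["Radial Velocity"] * 6 + ["Transit"] * 2,
    "year": [2008, 2006, 2008, 2010, 2006, 2010, 2009, 2011],
    "orbital_period": [269.3, 874.8, 763.0, 326.0, 516.2, 185.8, 2.7, 3.9],
})

# Same filtering and ordering as in the cells above
radial = toy_planets[toy_planets.method == "Radial Velocity"].sort_values(by="year")

# catplot is the newer name for factorplot; height replaces size
g = sns.catplot(kind="strip", jitter=True, data=radial,
                x="year", y="orbital_period",
                color=sns.xkcd_rgb["warm blue"], height=5, aspect=1.8)

# Log-scale the y axis; ylim=0 is left out because a log axis cannot start at zero
g.set(yscale="log")

# Rotate the x tick labels on every panel
for ax in g.axes.flat:
    ax.tick_params(axis="x", labelrotation=90)

n_points = len(radial)
y_scale = g.axes.flat[0].get_yscale()
```

Switching `kind="strip"` to `kind="box"` (and dropping `jitter`) should give the box plot variant.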
Heatmap with clustering. Damn easy and beautiful
Some parameters:
z_score : int or None, optional. 0 to calculate z-scores across rows, 1 across columns. No z-scores by default.
{row,col}_cluster : bool, optional. Whether to cluster the rows and columns. Enabled by default.
{row,col}_colors : list-like, optional. Adds a color strip along the rows or columns, useful for checking whether samples cluster by condition.
In [11]:
tpm_filtered = pd.read_table("Heatmap.tsv", sep="\t", index_col=0)
In [12]:
# Classic green and red colors for heatmap
cmap = sns.diverging_palette(133, 10, n=13, center="dark", as_cmap=True)
# Providing column colors. Awesome xkcd colors integration in seaborn
col_colors = [sns.xkcd_rgb["amber"]]*4 + [sns.xkcd_rgb["windows blue"]]*4
# Clustermap function
clustermap = sns.clustermap(tpm_filtered.sample(30), cmap=cmap, z_score=0,
                            figsize=(6,9), col_colors=col_colors)
# Set the y axis tick label rotation to 0; by default it is 90
ax = clustermap.ax_heatmap
labels = ax.get_yticklabels()
_ = ax.set_yticklabels(labels, rotation=0)
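With `z_score=0`, `clustermap` standardizes each row before clustering, so rows are compared by their pattern across samples rather than by absolute magnitude. Below is a minimal sketch of that row-wise standardization in plain pandas, on a made-up matrix (`toy_tpm` and its values are hypothetical). My understanding is that seaborn uses the sample standard deviation, which is also the pandas default:

```python
import pandas as pd

# Made-up expression matrix: genes in rows, samples in columns (illustrative only)
toy_tpm = pd.DataFrame(
    {
        "ctrl_1": [10.0, 200.0, 5.0],
        "ctrl_2": [12.0, 180.0, 7.0],
        "treat_1": [30.0, 90.0, 6.0],
        "treat_2": [28.0, 110.0, 4.0],
    },
    index=["gene_a", "gene_b", "gene_c"],
)

# Row-wise z-score: subtract each row's mean, divide by each row's std
row_mean = toy_tpm.mean(axis=1)
row_std = toy_tpm.std(axis=1)  # pandas default is the sample std (ddof=1)
z = toy_tpm.sub(row_mean, axis=0).div(row_std, axis=0)

# After standardizing, every row should have mean 0 and std 1
max_abs_mean = z.mean(axis=1).abs().max()
max_std_error = (z.std(axis=1) - 1).abs().max()
```

After this step a highly expressed gene and a weakly expressed gene with the same up/down pattern look alike, which is usually what you want the clustering to pick up.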