seaborn.lmplot


Seaborn's lmplot is a 2D scatterplot with an optional overlaid regression line. This is useful for comparing numeric variables. Logistic regression for binary classification is also supported with lmplot.


In [96]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
plt.rcParams['figure.figsize'] = (20.0, 10.0)
plt.rcParams['font.family'] = "serif"
np.random.seed(sum(map(ord,'lmplot')))

In [97]:
# Generate some random data for 2 imaginary classes of points
n = 256
sigma = 15
x = range(n)
y = range(n) + sigma*np.random.randn(n)
category1 = np.round(np.random.rand(n))
category2 = np.round(np.random.rand(n))
df = pd.DataFrame({'x':x,
                   'y':y,
                   'category1':category1,
                   'category2':category2})
df.loc[df.category1==1, 'y'] *= 2

Basic plot


In [98]:
sns.lmplot(data=df,
           x='x',
           y='y')


Out[98]:
<seaborn.axisgrid.FacetGrid at 0x7f42d674b4a8>

Color by species


In [99]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='category1')


Out[99]:
<seaborn.axisgrid.FacetGrid at 0x7f42d66d12e8>

Facet the categorical variables using col and/or row


In [100]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='category1',
           col='category1')


Out[100]:
<seaborn.axisgrid.FacetGrid at 0x7f42d6254160>

In [101]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='category1',
           row='category2')


Out[101]:
<seaborn.axisgrid.FacetGrid at 0x7f42d61457f0>

Facet against two variables simultaneously


In [ ]:

Make a new variable to uniquely color the four different combinations


In [102]:
df['combined_category'] = df.category1.map(str) +  df.category2.map(str)
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='combined_category',
           row='category1',
           col='category2')


Out[102]:
<seaborn.axisgrid.FacetGrid at 0x7f42d6050ef0>

Manually specify a maximum number of columns and let Seaborn automatically wrap with col_wrap.


In [103]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='combined_category',
           col_wrap=3,
           col='combined_category')


Out[103]:
<seaborn.axisgrid.FacetGrid at 0x7f42d6094ef0>

Adjust height of facets with size


In [104]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='combined_category',
           col_wrap=3,
           col='combined_category',
           size = 3)


Out[104]:
<seaborn.axisgrid.FacetGrid at 0x7f42d5ec5828>

Adjust aspect ratio


In [105]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='combined_category',
           col_wrap=3,
           col='combined_category',
           size = 3,
           aspect=2)


Out[105]:
<seaborn.axisgrid.FacetGrid at 0x7f42d5bcf5c0>

Reuse x/y axis labels with sharex and sharey


In [106]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='combined_category',
           row='category1',
           col='category2',
           sharex=True,
           sharey=True)


Out[106]:
<seaborn.axisgrid.FacetGrid at 0x7f42d5a4cac8>

Adjust the markers. A full list of options can be found here.


In [107]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='category1',
           markers=['s','X'])


Out[107]:
<seaborn.axisgrid.FacetGrid at 0x7f42d5af7550>
Turn legend on/off with `legend`

In [108]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='category1',
           markers=['s','X'],
           legend=False)


Out[108]:
<seaborn.axisgrid.FacetGrid at 0x7f42d72879b0>

Pull legend inside plot with legend_out=False


In [109]:
sns.lmplot(data=df,
           x='x',
           y='y',
           hue='category1',
           markers=['s','X'],
           legend_out=False)


Out[109]:
<seaborn.axisgrid.FacetGrid at 0x7f42d72e9a90>

In [ ]:

If there are multiple instances of each variable along x, you can provide a reduction function to `x_estimator` to visualize a summary statistic such as the mean.

In [110]:
# Generate some repeated values of x with different y
df2 = df
for i in range(3):
    copydata = df
    copydata.y += 100*np.random.rand(df.shape[0])
    df2 = pd.concat((df2, copydata),axis=0)
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2)


Out[110]:
<seaborn.axisgrid.FacetGrid at 0x7f42d7353780>

Provide a summary function to x_estimator


In [111]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           x_estimator = np.mean,
           aspect=2)


Out[111]:
<seaborn.axisgrid.FacetGrid at 0x7f42d7325b38>

Reduce the size of the confidence intervals around the summarized values wiht x_ci, which is given as a percentage 0-100.


In [112]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           x_estimator = np.mean,
           aspect=2,
           x_ci=50)


Out[112]:
<seaborn.axisgrid.FacetGrid at 0x7f42d56891d0>

Bin the data along x. The regression line is still fit to the full data


In [113]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           x_bins=20)


Out[113]:
<seaborn.axisgrid.FacetGrid at 0x7f42d55e0dd8>

Disable plotting of scatterpoints with scatter=False


In [114]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           scatter=False)


Out[114]:
<seaborn.axisgrid.FacetGrid at 0x7f42d7396470>

Disable plotting of regression line with fit_reg=False


In [115]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           fit_reg=False)


Out[115]:
<seaborn.axisgrid.FacetGrid at 0x7f42d51d4198>

Adjust the size of the confidence interval drawn around the regression line similar to x_ci. Here I'll disable it by setting to None


In [116]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           ci=None)


Out[116]:
<seaborn.axisgrid.FacetGrid at 0x7f42d5139d68>

Estimate a higher order polynomial, I just chose a value of 5 to demonstrate, but you should be careful choosing this parameter to avoid overfitting.


In [117]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           order=5)


Out[117]:
<seaborn.axisgrid.FacetGrid at 0x7f42d50a5208>

In [118]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           lowess=True)


Out[118]:
<seaborn.axisgrid.FacetGrid at 0x7f42d50d2d68>

Trim the regression line to match the bounds of the data with truncate


In [119]:
sns.lmplot(data=df2,
           x='x',
           y='y',
           aspect=2,
           truncate=True)


Out[119]:
<seaborn.axisgrid.FacetGrid at 0x7f42d503ba90>

In [ ]:

Perform logistic regression with logistic = True. This fits a line to the log-odds of a binary classification. I'll create a fake predictor to illustrate.


In [120]:
df2['feature1'] = 0.75*np.random.rand(df2.shape[0])
df2.loc[df2.category1 == 1, 'feature1'] = 0.25 + 0.75*np.random.rand(df2.loc[df2.category1 == 1].shape[0])
sns.lmplot(data=df2,
           x='feature1',
           y='category1',
           aspect=2,
           logistic=True)


Out[120]:
<seaborn.axisgrid.FacetGrid at 0x7f42d501d9e8>

Jitter can be added to make clusters of points easier to see with x_jitter and y_jitter. For this logistic regression all of the y points are either exactly 1 or 0, but the y_jitter adjusts the position where they are placed for visualization purposes.


In [121]:
sns.lmplot(data=df2,
           x='feature1',
           y='category1',
           aspect=2,
           logistic=True,
           y_jitter=.1)


Out[121]:
<seaborn.axisgrid.FacetGrid at 0x7f42d4ff8dd8>

Finalize


In [264]:
sns.set(rc={"font.style":"normal",
            "axes.facecolor":(0.9, 0.9, 0.9),
            "figure.facecolor":'white',
            "grid.color":'black',
            "grid.linestyle":':',
            #"text.color":"black",
            #"xtick.color":"black",
            #"ytick.color":"black",
            #"axes.labelcolor":"black",
            "axes.grid":True,
            'axes.labelsize':30,
            'figure.figsize':(20.0, 10.0),
            'xtick.labelsize':25,
            'ytick.labelsize':20})
df.sort_values('combined_category',inplace=True)
p = sns.lmplot(data=df,
           x='x',
           y='y',
           hue='combined_category',
           col='category2',
           size=10,
           sharey=True,
           legend_out=False,
           truncate=True,
           markers=['^','p','+','d'],
           palette=['#4daf4a','#1f78b4','#e41a1c','#7570b3'],
               hue_order = ['1.00.0', '0.00.0', '1.01.0','0.01.0'],
           scatter_kws={"s":200,'alpha':1},
           line_kws={"lw":4,
                     'ls':'--'})
leg = p.axes[0, 0].get_legend()
leg.set_title(None)
labs = leg.texts
labs[0].set_text("Type 0")
labs[1].set_text("Type 1")
labs[2].set_text("Type 2")
labs[3].set_text("Type 3")
for l in labs + [p.axes[0,0].xaxis.label, p.axes[0,0].yaxis.label, p.axes[0,1].xaxis.label, p.axes[0,1].yaxis.label]:
    l.set_fontsize(36)
p.axes[0, 0].set_title('')
p.axes[0, 1].set_title('')
plt.text(0,650, "Scatter Plot", fontsize = 95, color='black', fontstyle='italic')
p.axes[0,0].set_xticks(np.arange(0, 250, 100))
p.axes[0,0].set_yticks(np.arange(0, 700, 200))
p.axes[0,1].set_xticks(np.arange(0, 250, 100))
p.axes[0,1].set_yticks(np.arange(0, 700, 200))
#handles, labels = p.axes[0,0].get_legend_handles_labels()
#handles = [handles[2], handles[0], handles[1], handles[3]]
#labels = [labels[2], labels[0], labels[1], labels[3]]

#p.axes[0,0].legend(handles,labels,loc=2)


Out[264]:
[<matplotlib.axis.YTick at 0x7f42ceee9860>,
 <matplotlib.axis.YTick at 0x7f42ceee9940>,
 <matplotlib.axis.YTick at 0x7f42ceedc8d0>,
 <matplotlib.axis.YTick at 0x7f42cef2f128>]

In [260]:
p.get_figure().savefig('../../figures/lmplot.png')


Out[260]:
category1 category2 x y combined_category
255 0.0 0.0 255 375.339991 0.00.0
125 0.0 0.0 125 359.213052 0.00.0
77 0.0 0.0 77 226.531875 0.00.0
40 0.0 0.0 40 174.923714 0.00.0
39 0.0 0.0 39 287.347184 0.00.0