seaborn.stripplot

A strip plot is a scatter plot in which one of the variables is categorical. Strip plots can be combined with other plots to provide additional information. For example, overlaying a strip plot on a boxplot makes the result more similar to a violin plot, because more of the underlying distribution becomes visible. Seaborn's swarmplot is very similar, except that it adjusts the point positions so that datapoints do not overlap.
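To make the difference between the two functions concrete, here is a minimal sketch that draws the same synthetic data with both. The toy frame, its column names (`group`, `value`) and the random seed are made up for illustration and are not part of the NBA dataset used below.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical toy data: two categories with 40 rounded values each,
# so that many points share the same y value and overlap in a strip plot.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "group": np.repeat(["a", "b"], 40),
    "value": rng.normal(size=80).round(1),
})

fig, (ax_strip, ax_swarm) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

# Strip plot without jitter: points with equal values sit on top of each other.
sns.stripplot(data=toy, x="group", y="value", jitter=False, ax=ax_strip)
ax_strip.set_title("stripplot (jitter=False)")

# Swarm plot: points are shifted sideways so that they do not overlap.
sns.swarmplot(data=toy, x="group", y="value", ax=ax_swarm)
ax_swarm.set_title("swarmplot")
```

With only a few dozen points per category the swarm plot is easy to read; for thousands of shots per player, as in the data below, a jittered strip plot is usually the more practical choice.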

dataset: Kaggle: NBA shot logs



In [3]:

    
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
plt.rcParams['figure.figsize'] = (20.0, 10.0)
plt.rcParams['font.family'] = "serif"

This is a cool dataset that contains information about shot attempts made by professional basketball players.



In [4]:

    
df = pd.read_csv('../stripplot/shot_logs.csv',usecols=['player_name','SHOT_DIST','PTS_TYPE','SHOT_RESULT'])
players_to_use = ['kyrie irving', 'lebron james', 'stephen curry', 'jj redick']
df = df.loc[df.player_name.isin(players_to_use)]
df.head()









    Out[4]:

           SHOT_DIST  PTS_TYPE  SHOT_RESULT    player_name
    14054        8.0         2       missed  stephen curry
    14055       25.9         3       missed  stephen curry
    14056       23.8         3         made  stephen curry
    14057       27.5         3         made  stephen curry
    14058       29.3         3       missed  stephen curry

Basic plot



In [5]:

    
p = sns.stripplot(data=df, x='player_name', y='SHOT_DIST')

Change the color to represent whether the shot was made or missed



In [6]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT')

Change the order in which the names are displayed



In [7]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use))

jitter adds a small random displacement to each point along the categorical axis (horizontal here), which is useful when there are large clusters of overlapping datapoints



In [8]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25)
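As a rough illustration of what the jitter value controls, the sketch below reproduces the idea with NumPy alone. It assumes the offsets are drawn uniformly from the interval [-jitter, +jitter] around the category position; that is an assumption about how the displacement is generated, not something taken from this notebook, and the synthetic distances and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shot distances for one player, who sits at x = 0 on the categorical axis.
shot_dist = rng.normal(loc=15, scale=8, size=200)
x_center = np.zeros_like(shot_dist)

# Assumed jitter model: a uniform random horizontal offset in [-jitter, +jitter].
jitter = 0.25
x_jittered = x_center + rng.uniform(-jitter, jitter, size=shot_dist.size)

# Only the horizontal position changes; the plotted distances stay the same.
print(x_jittered.min(), x_jittered.max())
```

Because only the categorical coordinate is perturbed, jitter spreads out the cluster without distorting the values on the numeric axis.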

We see that the default behavior is to draw the different hues on top of each other. This can be avoided by setting dodge=True (dodge was formerly called split), which separates the hue levels along the categorical axis.



In [9]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=True)

Swapping the x and y inputs and setting orient to 'h' produces a horizontal plot



In [10]:

    
p = sns.stripplot(data=df,
                  y='player_name',
                  x='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=False,
                  orient='h')

For coloring, you can either pass a single color to the color parameter...



In [11]:

    
p = sns.stripplot(data=df,
                  y='player_name',
                  x='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=True,
                  orient='h',
                  color=(.25,.5,.75))

...or you can use one of the many variations of the palette parameter



In [12]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=True,
                  palette=sns.husl_palette(2, l=0.5, s=.95))
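The palette argument accepts several forms. The sketch below shows three of them on a made-up four-row frame that reuses the column names from above: a palette built by name with `sns.color_palette`, a plain list of colors, and a dict that maps each hue level to a color. The toy values and variable names are illustrative only.

```python
import pandas as pd
import seaborn as sns

# Hypothetical four-row frame with the same column names as the shot log.
toy = pd.DataFrame({
    "player_name": ["a", "a", "b", "b"],
    "SHOT_DIST": [3.0, 22.5, 8.0, 25.0],
    "SHOT_RESULT": ["made", "missed", "made", "missed"],
})

# Three ways to specify two colors for the two hue levels.
by_name = sns.color_palette("Set2", 2)                  # named palette, 2 colors
by_list = ["#91bfdb", "#fc8d59"]                        # explicit list of colors
by_dict = {"made": "#91bfdb", "missed": "#fc8d59"}      # hue level -> color

# The dict form makes the made/missed color assignment explicit.
ax = sns.stripplot(data=toy,
                   x="player_name",
                   y="SHOT_DIST",
                   hue="SHOT_RESULT",
                   palette=by_dict)
```

The dict form is handy when the colors should stay tied to specific hue levels regardless of the order in which they appear in the data.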

Adjust the marker size



In [13]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=True,
                  palette=sns.husl_palette(2, l=0.5, s=.95),
                  size=8)

Adjust the linewidth of the edges of the circles



In [14]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=True,
                  palette=sns.husl_palette(2, l=0.5, s=.95),
                  size=8,
                  linewidth=3)

Change the color of these lines with edgecolor



In [15]:

    
p = sns.stripplot(data=df,
                  x='player_name',
                  y='SHOT_DIST',
                  hue='SHOT_RESULT',
                  order=sorted(players_to_use),
                  jitter=0.25,
                  dodge=True,
                  palette=sns.husl_palette(2, l=0.5, s=.95),
                  size=8,
                  linewidth=3,
                  edgecolor='blue')

Strip plots look good when overlaid on top of another categorical plot, like a boxplot



In [16]:

    
params = dict(data=df,
              x='player_name',
              y='SHOT_DIST',
              hue='SHOT_RESULT',
              #jitter=0.25,
              order=sorted(players_to_use),
              dodge=True)
p = sns.stripplot(size=8,
                  jitter=0.35,
                  palette=['#91bfdb','#fc8d59'],
                  edgecolor='black',
                  linewidth=1,
                  **params)
p_box = sns.boxplot(palette=['#BBBBBB','#DDDDDD'],linewidth=6,**params)

Finalize



In [17]:

    
plt.rcParams['font.size'] = 30
params = dict(data=df,
              x='player_name',
              y='SHOT_DIST',
              hue='SHOT_RESULT',
              #jitter=0.25,
              order=sorted(players_to_use),
              dodge=True)
p = sns.stripplot(size=8,
                  jitter=0.35,
                  palette=['#91bfdb','#fc8d59'],
                  edgecolor='black',
                  linewidth=1,
                  **params)
p_box = sns.boxplot(palette=['#BBBBBB','#DDDDDD'],linewidth=6,**params)
handles,labels = p.get_legend_handles_labels()
#for h in handles:
#    h.set_height(3)
#handles[2].set_linewidth(33)

plt.legend(handles[2:],
           labels[2:],
           bbox_to_anchor = (.3,.95),
           fontsize = 40,
           markerscale = 5,
           frameon=False,
           labelspacing=0.2)
plt.text(1.85,35, "Strip Plot", fontsize = 95, color='Black', fontstyle='italic')
plt.xlabel('')
plt.ylabel('Shot Distance (ft)')
plt.gca().set_xlim(-0.5,3.5)
xlabs = p.get_xticklabels()
xlabs[0].set_text('JJ Redick')
for l in xlabs[1:]:
    l.set_text(" ".join(i.capitalize() for i in l.get_text().split() ))
p.set_xticklabels(xlabs)












In [18]:

    
p.get_figure().savefig('../../figures/stripplot.png')

A fair bit of information is conveyed by a plot like this. JJ Redick is a shooting guard, and most of his shots are taken from a significant distance, whereas LeBron James unsurprisingly has many more attempts at close range. The median distance of LeBron's made shots is noticeably lower than that of his misses, which is likely a result of him scoring many points on high-percentage close shots and layups. There are a few outlying shots from very long distances, essentially all misses, which were most likely taken right before a buzzer.
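The median comparison read off the boxes can also be checked numerically. The sketch below groups by player and shot result and takes the median distance. To stay self-contained it runs on a tiny made-up frame with the same column names; the values are not from the shot log, and the helper name `median_shot_distance` is hypothetical.

```python
import pandas as pd

def median_shot_distance(shots):
    """Median SHOT_DIST per player (rows) and shot result (columns)."""
    return (shots
            .groupby(["player_name", "SHOT_RESULT"])["SHOT_DIST"]
            .median()
            .unstack("SHOT_RESULT"))

# Hypothetical toy values, chosen only to show the shape of the result.
toy = pd.DataFrame({
    "player_name": ["player a"] * 4 + ["player b"] * 4,
    "SHOT_RESULT": ["made", "made", "missed", "missed"] * 2,
    "SHOT_DIST": [2.0, 4.0, 20.0, 24.0, 22.0, 26.0, 23.0, 25.0],
})

medians = median_shot_distance(toy)
print(medians)
```

Calling `median_shot_distance(df)` on the filtered frame from the notebook would give one row per player, with the made and missed medians side by side.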


