Errorbarjitter

Introduction

The purpose of this script is to provide modular code for generating errorbarjitter plots using Python and pandas. The framework for this code was written in collaboration with Jason Wittenbach (@jwittenbach) of Janelia Research Campus. The idea is based on the MATLAB errorbarjitter code written by David Stern [1], also of Janelia Research Campus. This version uses free and open-source software to produce the same plot.

As in the original errorbarjitter, this function plots the mean ± SD of one or more samples alongside the raw data. The raw data are "jittered" and an alpha value adds transparency to aid in separation of individual data points. This form of data presentation invites active analysis of the raw data by the reader.
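The jittering described above can be sketched as follows: each group sits at an integer x position, and every raw point is shifted by a small uniform random offset so that overlapping values separate. This is only a minimal sketch. The helper name `jitter_x` and the seeded generator are assumptions made for illustration; the ±0.1 half-width mirrors the offset used in the function defined later in this notebook.

```python
import numpy as np

def jitter_x(group_index, n_points, half_width=0.1, rng=None):
    """Return n_points x positions scattered uniformly around group_index.

    jitter_x is a hypothetical helper used only for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    return group_index + rng.uniform(-half_width, half_width, size=n_points)

# Ten jittered x positions for the group drawn at x = 2
x = jitter_x(2, 10, rng=np.random.default_rng(0))
```

Plotting these x positions against the raw values with a partially transparent marker (for example alpha=0.5) gives the jittered cloud of points.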

The following arguments can be passed to errorbarjitter:

df -- pandas DataFrame to be analyzed
groupByCol -- name of the column that contains the groups you would like to compare
statsCol -- name of the column holding the metric for which the mean and SD are calculated
fig -- optional; can set fig = plt.figure() if you want to add additional information to the plot (default None)
xlab -- string used as the x-axis label (default 'group')
ylab -- string used as the y-axis label (default 'units')
rotate -- number of degrees by which to rotate the tick labels on the x-axis (default 0)
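The arguments above assume long-format data: one row per observation, with one column naming the group and another holding the measured value. The sketch below builds such a frame and computes the per-group summary that the plot displays. The column names `group` and `value` and the numbers are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Per-group sample SD and mean, i.e. the summary drawn next to the raw points
summary = df.groupby("group")["value"].agg(["std", "mean"])
```

With a frame shaped like this, the call would be errorbarjitter(df, 'group', 'value').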



In [1]:

    
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline



In [2]:

    
sns.set_style('darkgrid')
sns.set_context('talk')

errorbarjitter function



In [3]:

    
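def errorbarjitter(df, groupByCol, statsCol, fig=None, xlab='group', ylab='units', rotate=0):
    # Per-group sample SD and mean; groupby returns the groups in sorted order.
    stats = df.groupby(groupByCol)[statsCol].agg(['std', 'mean'])
    # Take the group labels from the stats index so that tick labels,
    # raw points and summary markers always refer to the same group.
    groups = stats.index
    means, devs = stats['mean'], stats['std']

    # Draw on the caller's figure if one is given; otherwise create a single
    # wide figure (the original created a second, empty figure here).
    if fig is None:
        fig = plt.figure(figsize=(15, 7))
    else:
        plt.figure(fig.number)

    delta = 0.22  # horizontal offset of the mean ± SD marker from the raw points
    for i, (m, s) in enumerate(zip(means, devs)):
        pts = np.array(df[df[groupByCol] == groups[i]][statsCol])
        # Jitter the x positions uniformly within ±0.1 of the group index.
        x = i + 0.2*np.random.rand(len(pts)) - 0.1
        plt.scatter(x, pts, c='k', alpha=0.5)
        plt.scatter(i + delta, m, edgecolor='k', facecolor='none', linewidth=3, s=25)
        plt.plot([i + delta, i + delta], [m - s, m + s], '-', c=[0, 0, 0], lw=2.0)
    plt.xticks(range(len(groups)), groups, rotation=rotate)
    plt.xlabel(xlab)
    plt.ylabel(ylab)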

Example 1:

Here is a simple use case for the errorbarjitter function.

Consider a simple experiment in which 5 runners run a course 10 times. During each trial their elapsed time is recorded. The distribution of each runner's times, together with the mean ± SD, can be visualized using an errorbarjitter plot.
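If ex-data.csv is not at hand, a stand-in data set with the same layout can be generated. The sketch below is an assumption-laden substitute, not the original data: the typical times, the spread of 2 s, and the seed are invented for illustration, and only the column names `runner` and `time` come from the cells in this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible

# Hypothetical typical times (s) for runners 1-5; not taken from ex-data.csv
typical_times = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25}

rows = [
    {"runner": runner, "time": rng.normal(loc=center, scale=2.0)}
    for runner, center in typical_times.items()
    for _ in range(10)  # 10 trials per runner
]
synthetic = pd.DataFrame(rows)
```

The resulting frame can be passed to errorbarjitter in place of exdata.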



In [4]:

    
path = "ex-data.csv";
exdata = pd.read_csv(path)
exdata.head()



In [5]:

    
grouped = exdata.groupby('runner')
# String names select pandas' own aggregations: sample SD (ddof=1) and mean
stats = grouped.aggregate({'time': ['std', 'mean']})
stats
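When reading the SD column it helps to know which definition is in play. The 'std' aggregation in pandas computes the sample standard deviation (ddof=1), whereas calling np.std directly on an array defaults to the population standard deviation (ddof=0). The short sketch below contrasts the two on hypothetical values.

```python
import numpy as np
import pandas as pd

times = pd.Series([4.0, 5.0, 6.0, 7.0])  # hypothetical values

sample_sd = times.std()                          # pandas default: ddof=1
population_sd = np.std(times.to_numpy())         # NumPy default: ddof=0
numpy_sample_sd = np.std(times.to_numpy(), ddof=1)  # matches the pandas value
```

For small groups the difference is noticeable, so the error bars should be described as sample SD when the pandas aggregation is used.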



In [6]:

    
errorbarjitter(exdata, 'runner', 'time', xlab = 'runner', ylab = 'time (s)')
plt.title('5 runners average course time')

    Out[6]:

Text(0.5, 1.0, '5 runners average course time')

<Figure size 1080x504 with 0 Axes>

Iris dataset example

In this second example I use the iris dataset, collected by Anderson [2].

It is simple to use errorbarjitter to plot each of the metrics by species, as shown below.



In [7]:

    
path = "iris.csv";
data = pd.read_csv(path)
data.head()









    Out[7]:

   Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           1           5.1          3.5           1.4          0.2  setosa
1           2           4.9          3.0           1.4          0.2  setosa
2           3           4.7          3.2           1.3          0.2  setosa
3           4           4.6          3.1           1.5          0.2  setosa
4           5           5.0          3.6           1.4          0.2  setosa
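The `Unnamed: 0` column in the output above is what pandas produces when the first header field of a CSV is empty, which typically happens when row numbers were written without a column name. A minimal sketch of avoiding it with the `index_col` parameter of `pd.read_csv` follows. The inline CSV text is an assumption about how iris.csv is laid out, reconstructed from the first rows shown above.

```python
import io
import pandas as pd

# Assumed file layout: the first header field is empty (row numbers)
csv_text = """,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3.0,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
"""

# Default read: the unnamed first column becomes 'Unnamed: 0'
default = pd.read_csv(io.StringIO(csv_text))

# Using the first column as the index keeps only the measurement columns
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Either form works with errorbarjitter, since only the 'Species' column and one metric column are used.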



In [8]:

    
fig = plt.figure()
errorbarjitter(data, 'Species', 'Sepal.Length', xlab = 'species', ylab = 'sepal length')
plt.title('Sepal length by species')

    Out[8]:

Text(0.5, 1.0, 'Sepal length by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [9]:

    
fig = plt.figure()
errorbarjitter(data, 'Species', 'Sepal.Width', xlab = 'species', ylab = 'sepal width')
plt.title('Sepal width by species')

    Out[9]:

Text(0.5, 1.0, 'Sepal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [10]:

    
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Length', xlab = 'species', ylab = 'petal length')
plt.title('Petal length by species')

    Out[10]:

Text(0.5, 1.0, 'Petal length by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [11]:

    
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Width', xlab = 'species', ylab = 'petal width')
plt.title('Petal width by species')

    Out[11]:

Text(0.5, 1.0, 'Petal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>

References:

[1] errorbarjitter. David Stern. http://www.mathworks.com/matlabcentral/fileexchange/33658-errorbarjitter

[2] Anderson, E. (1935) The irises of the Gaspe Peninsula. Bulletin of the American Iris Society, 59, 2–5.

[3] Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179–188.

[4] Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Appendix:

Theme options with sns.set_style:

'darkgrid' 'whitegrid' 'dark' 'white' 'ticks'
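One way to see how the five themes differ without redrawing a plot is to inspect the parameters each one sets. The sketch below uses `sns.axes_style`, which returns the style parameters for a named theme without applying them globally. Selecting the "Agg" backend is an assumption made so that the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# Record, for each theme, whether it switches the axes grid on
grid_on = {name: sns.axes_style(name)["axes.grid"] for name in styles}
```

`sns.axes_style` can also be used as a context manager (with sns.axes_style('white'): ...) to apply a theme to a single plot instead of changing it for the whole notebook with sns.set_style.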



In [12]:

    
sns.set_style('darkgrid')
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Width', xlab = 'species', ylab = 'petal width')
plt.title('Petal width by species')

    Out[12]:

Text(0.5, 1.0, 'Petal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [13]:

    
sns.set_style('whitegrid')
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Width', xlab = 'species', ylab = 'petal width')
plt.title('Petal width by species')

    Out[13]:

Text(0.5, 1.0, 'Petal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [14]:

    
sns.set_style('dark')
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Width', xlab = 'species', ylab = 'petal width')
plt.title('Petal width by species')

    Out[14]:

Text(0.5, 1.0, 'Petal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [15]:

    
sns.set_style('white')
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Width', xlab = 'species', ylab = 'petal width')
plt.title('Petal width by species')

    Out[15]:

Text(0.5, 1.0, 'Petal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



In [16]:

    
sns.set_style('ticks')
fig = plt.figure()
errorbarjitter(data, 'Species', 'Petal.Width', xlab = 'species', ylab = 'petal width')
plt.title('Petal width by species')

    Out[16]:

Text(0.5, 1.0, 'Petal width by species')

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>

Learning Index



In [17]:

    
path = "learning-data.csv";
learning = pd.read_csv(path)

sns.set_style('darkgrid')

fig = plt.figure()
errorbarjitter(learning, 'animal', 'pi', xlab = 'animal', ylab = 'performance index')
plt.title('learning assay')
plt.ylim(-1,1)

    Out[17]:

(-1, 1)

<Figure size 432x288 with 0 Axes>

<Figure size 1080x504 with 0 Axes>



    Out[5] (per-runner standard deviation and mean of time, from In [5]):

            time
             std  mean
runner
1       1.475730   5.2
2       1.663330  10.1
3       3.027650  15.5
4       2.616189  21.2
5       3.373096  25.6
