Let's take a closer look again on our PC versus Mac menu bar example. Here are the measurements for the first experiments (between-subject, randomized) (we first import some stuff to make plots look nicer etc. press shift enter in the next cell to execute it).


In [3]:
%pylab inline
import matplotlib.pyplot as plt
#use a nicer plotting style
plt.style.use(u'fivethirtyeight')
print(plt.style.available)

#change figure size
pylab.rcParams['figure.figsize'] = (10, 6)


Populating the interactive namespace from numpy and matplotlib
[u'seaborn-darkgrid', u'seaborn-notebook', u'classic', u'seaborn-ticks', u'grayscale', u'bmh', u'seaborn-talk', u'dark_background', u'ggplot', u'fivethirtyeight', u'seaborn-colorblind', u'seaborn-deep', u'seaborn-whitegrid', u'seaborn-bright', u'seaborn-poster', u'seaborn-muted', u'seaborn-paper', u'seaborn-white', u'seaborn-pastel', u'seaborn-dark', u'seaborn-dark-palette']

Now Lets try to calculate the mean ... you can just use mean()


In [5]:
windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

print std(windows)
print std(mac)


63.511317889
51.8139701239

hmmm ... there seems to be a difference, but it's not so big. Let's use point plots to check the data.


In [6]:
plot(windows,"*")
plot(mac,"o")


Out[6]:
[<matplotlib.lines.Line2D at 0x108d7aed0>]

Let's use boxplots to explore the windows and mac data we recorded so far. use the command boxplot to plot them. You can also combine the 2 datasets to place them in one plot using data = [windows,mac]


In [8]:
data = [windows,mac]
boxplot(data)
xticks([1,2],['windows','mac'])
#save the plot to a file
savefig("boxplot.pdf")


hmm ... doesn't look siginificant. yet, just to make sure, let's apply a t-test. Remember t-tests are for comparing only 2 means with eachother NOT more (also assumptions are that samples are independent, normally distributed and the variance is the same!)


In [9]:
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel
import scipy.stats as stats
#onesided t-test
ttest_ind(mac,windows)


Out[9]:
Ttest_indResult(statistic=-0.33810258358163742, pvalue=0.74680239857459685)

In [10]:
#two sided t-test
ttest_rel(mac,windows)


Out[10]:
Ttest_relResult(statistic=-0.71305042851453826, pvalue=0.52727545422603439)

Ok... doesn't look significant. So we should record more data ;)


In [11]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]

Ok ... let's caluclate the means and plot the data.


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:

now perform the t-test (two sided is best). What do you think?


In [ ]:


In [ ]:

what to do if we have more than 2 samples? (assumming we introduce a 3rd experimental setup where the menu bar is at the bottom of the screen) we need to use ANOVA (again assuming between-subject design, normal distributions etc.) Use the function stats.f_oneway


In [12]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]
more_bottom = [485,436, 512, 564, 560, 587, 391, 488, 555, 446]

In [13]:
stats.f_oneway(more_win, more_mac, more_bottom)


Out[13]:
F_onewayResult(statistic=3.6909472287945335, pvalue=0.038278814395010151)

In [15]:
boxplot([more_win, more_mac, more_bottom])
xticks([1,2,3],['windows','mac', 'bottom'])


Out[15]:
([<matplotlib.axis.XTick at 0x10a716990>,
  <matplotlib.axis.XTick at 0x10bfe2ed0>,
  <matplotlib.axis.XTick at 0x10a7c2710>],
 <a list of 3 Text xticklabel objects>)

usually we don't define all data from the command prompt, but we read in files from disk.


In [16]:
import pandas as pd
menu_data=pd.read_csv("./data/menu_all.csv")

In [17]:
menu_data.describe()


Out[17]:
mac windows bottom
count 10.000000 10.000000 10.000000
mean 508.000000 584.000000 502.400000
std 85.332031 73.644793 64.668728
min 380.000000 480.000000 391.000000
25% 461.500000 510.500000 455.750000
50% 490.000000 610.000000 500.000000
75% 579.250000 631.000000 558.750000
max 647.000000 694.000000 587.000000

In [18]:
menu_data.boxplot()


/usr/local/lib/python2.7/site-packages/ipykernel/__main__.py:1: FutureWarning: 
The default value for 'return_type' will change to 'axes' in a future release.
 To use the future behavior now, set return_type='axes'.
 To keep the previous behavior and silence this warning, set return_type='dict'.
  if __name__ == '__main__':
Out[18]:
{'boxes': [<matplotlib.lines.Line2D at 0x10d07a210>,
  <matplotlib.lines.Line2D at 0x10d093a90>,
  <matplotlib.lines.Line2D at 0x10d0b96d0>],
 'caps': [<matplotlib.lines.Line2D at 0x10d087150>,
  <matplotlib.lines.Line2D at 0x10d087790>,
  <matplotlib.lines.Line2D at 0x10d0a1d50>,
  <matplotlib.lines.Line2D at 0x10d0ab3d0>,
  <matplotlib.lines.Line2D at 0x10d0c3990>,
  <matplotlib.lines.Line2D at 0x10d0c3fd0>],
 'fliers': [<matplotlib.lines.Line2D at 0x10d093450>,
  <matplotlib.lines.Line2D at 0x10d0b9090>,
  <matplotlib.lines.Line2D at 0x10d0d1c90>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0x10d087dd0>,
  <matplotlib.lines.Line2D at 0x10d0aba10>,
  <matplotlib.lines.Line2D at 0x10d0d1650>],
 'whiskers': [<matplotlib.lines.Line2D at 0x10d07a410>,
  <matplotlib.lines.Line2D at 0x10d07aad0>,
  <matplotlib.lines.Line2D at 0x10d0a10d0>,
  <matplotlib.lines.Line2D at 0x10d0a1710>,
  <matplotlib.lines.Line2D at 0x10d0b9cd0>,
  <matplotlib.lines.Line2D at 0x10d0c3350>]}

In [ ]: