Let's take a closer look at our PC versus Mac menu bar example. Here are the measurements from the first experiment (between-subject design, randomized). First we import a few things to make the plots look nicer; press Shift+Enter in the next cell to execute it.


In [60]:
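%pylab inline
# %pylab loads numpy and matplotlib into the notebook namespace (mean, plot, boxplot, ...)
import matplotlib.pyplot as plt
# use a nicer plotting style
plt.style.use('fivethirtyeight')
print(plt.style.available)

# change the default figure size
pylab.rcParams['figure.figsize'] = (10, 6)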


Populating the interactive namespace from numpy and matplotlib
[u'seaborn-darkgrid', u'seaborn-notebook', u'classic', u'seaborn-ticks', u'grayscale', u'bmh', u'seaborn-talk', u'dark_background', u'ggplot', u'fivethirtyeight', u'seaborn-colorblind', u'seaborn-deep', u'seaborn-whitegrid', u'seaborn-bright', u'seaborn-poster', u'seaborn-muted', u'seaborn-paper', u'seaborn-white', u'seaborn-pastel', u'seaborn-dark', u'seaborn-dark-palette']

Now let's calculate the means. You can simply use mean():


In [41]:
windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

# mean() is numpy's mean, made available by %pylab
print(mean(windows))
print(mean(mac))


589.75
573.75
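Whether a 16-point difference between the means is "big" depends on how much the individual measurements vary. A minimal sketch in plain NumPy (not relying on the %pylab namespace) that puts the difference next to the sample standard deviations:

```python
import numpy as np

windows = np.array([625, 480, 621, 633])
mac = np.array([647, 503, 559, 586])

# difference between the two condition means
diff = windows.mean() - mac.mean()

# sample standard deviations (ddof=1 divides by n - 1, as the t-test does)
sd_windows = windows.std(ddof=1)
sd_mac = mac.std(ddof=1)

print("difference of means:", diff)
print("SD windows: %.1f, SD mac: %.1f" % (sd_windows, sd_mac))
```

The difference of 16 is much smaller than either standard deviation (roughly 60 for Mac and 73 for Windows), which is why it looks unconvincing.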

Hmm... there seems to be a difference, but it's not very big. Let's use point plots to look at the individual data points.


In [ ]:
plot(windows, "*", label="windows")
plot(mac, "o", label="mac")
legend()

Let's use boxplots to explore the Windows and Mac data recorded so far. Use the boxplot command to plot them. To show both datasets side by side in one plot, combine them into a list first: data = [windows, mac].


In [ ]:
data = [windows,mac]
boxplot(data)
xticks([1,2],['windows','mac'])
#save the plot to a file
savefig("boxplot.pdf")
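To read the boxplot, it helps to know which numbers it draws. A sketch of those quantities with np.percentile, assuming matplotlib's default settings (box from the first to the third quartile, median line, and points more than 1.5 × IQR outside the box drawn separately, i.e. the default whis=1.5):

```python
import numpy as np

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

summary = {}
for name, values in [("windows", windows), ("mac", mac)]:
    # box = first to third quartile, line inside the box = median
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # values further than 1.5 * IQR from the box are drawn as outliers (fliers)
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    summary[name] = (q1, median, q3, outliers)
    print(name, "Q1:", q1, "median:", median, "Q3:", q3, "outliers:", outliers)
```

With only four values per group, the 480 in the Windows data lies far enough below the box to be drawn as a separate outlier point.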

Hmm... this doesn't look significant. Still, just to make sure, let's apply a t-test. Remember that a t-test compares exactly two means with each other, NOT more. The independent-samples t-test also assumes that the samples are independent, normally distributed, and have equal variances!


In [ ]:
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel
import scipy.stats as stats
# independent-samples t-test (two-sided by default):
# the right test for this between-subject design
ttest_ind(mac, windows)

In [ ]:
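# paired (related-samples) t-test, also two-sided by default:
# only valid for a within-subject design where each participant used both menu bars
ttest_rel(mac, windows)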
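To see what the two SciPy functions compute, here is a sketch of the textbook formulas: the pooled-variance statistic behind ttest_ind (with its default equal_var=True, df = n1 + n2 - 2) and the paired statistic behind ttest_rel, which is a one-sample t-test on the per-pair differences (df = n - 1). The pairing of values here is for illustration only; in a between-subject design the Windows and Mac values come from different people, so only the independent test applies.

```python
import numpy as np
from scipy import stats

windows = np.array([625, 480, 621, 633])
mac = np.array([647, 503, 559, 586])

def pooled_t(a, b):
    """Independent two-sample t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_t(a, b):
    """Paired t statistic: one-sample t-test on the differences a - b."""
    d = a - b
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

t_ind_manual = pooled_t(mac, windows)
t_rel_manual = paired_t(mac, windows)

# SciPy returns the statistic and the two-sided p-value
t_ind_scipy, p_ind = stats.ttest_ind(mac, windows)
t_rel_scipy, p_rel = stats.ttest_rel(mac, windows)
print("independent:", t_ind_manual, t_ind_scipy, p_ind)
print("paired:     ", t_rel_manual, t_rel_scipy, p_rel)
```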

Ok... doesn't look significant. So we should record more data ;)


In [25]:
more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

OK... let's calculate the means and plot the data.
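A minimal sketch of the numeric part in plain NumPy; the plots work exactly like the point plot and boxplot cells above, just with more_win and more_mac.

```python
import numpy as np

more_win = np.array([625, 480, 621, 633, 694, 599, 505, 527, 651, 505])
more_mac = np.array([647, 503, 559, 586, 458, 380, 477, 409, 589, 472])

mean_win, mean_mac = more_win.mean(), more_mac.mean()
print("mean windows:", mean_win)
print("mean mac:", mean_mac)
print("difference:", mean_win - mean_mac)

# sample standard deviations, to compare the difference against the spread
print("SD windows: %.1f, SD mac: %.1f" % (more_win.std(ddof=1), more_mac.std(ddof=1)))
```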


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:

Now perform the t-test (a two-sided test is the usual, more conservative choice). What do you think?


In [ ]:


In [ ]:
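One way to run the comparison on the larger samples, sketched with SciPy: the standard (pooled-variance) independent t-test, Welch's variant via equal_var=False, and two optional checks of the assumptions listed earlier, Shapiro-Wilk for normality (stats.shapiro) and Levene's test for equal variances (stats.levene). With only 10 values per group these checks have little power, so treat them as a rough sanity check rather than proof.

```python
from scipy import stats

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

# independent-samples t-test (between-subject design), two-sided by default
t, p = stats.ttest_ind(more_mac, more_win)
print("Student t-test: t = %.3f, p = %.4f" % (t, p))

# Welch's t-test drops the equal-variance assumption
t_w, p_w = stats.ttest_ind(more_mac, more_win, equal_var=False)
print("Welch t-test:   t = %.3f, p = %.4f" % (t_w, p_w))

# assumption checks: a small p-value suggests the assumption is violated
for name, values in [("windows", more_win), ("mac", more_mac)]:
    w_stat, p_norm = stats.shapiro(values)
    print("Shapiro-Wilk %s: p = %.3f" % (name, p_norm))
lev_stat, p_lev = stats.levene(more_win, more_mac)
print("Levene: p = %.3f" % p_lev)
```

With equal group sizes the Welch statistic equals the pooled one; only the degrees of freedom, and therefore the p-value, differ.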

What if we have more than two samples? Suppose we introduce a third experimental setup in which the menu bar is at the bottom of the screen. Then we need an analysis of variance (ANOVA), again assuming a between-subject design, normally distributed data, and equal variances. Use the function stats.f_oneway.


In [46]:
more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]
more_bottom = [485, 436, 512, 564, 560, 587, 391, 488, 555, 446]
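A sketch of the one-way ANOVA with stats.f_oneway, together with the same F statistic computed by hand as the ratio of between-group to within-group variance (mean squares with k - 1 and N - k degrees of freedom). A significant F only says that at least one mean differs; it does not say which one.

```python
import numpy as np
from scipy import stats

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]
more_bottom = [485, 436, 512, 564, 560, 587, 391, 488, 555, 446]
groups = [np.array(g, dtype=float) for g in (more_win, more_mac, more_bottom)]

# one-way ANOVA via SciPy
F, p = stats.f_oneway(*groups)
print("F = %.3f, p = %.4f" % (F, p))

# the same F statistic by hand
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, N = len(groups), len(all_values)
# between-group sum of squares: how far the group means lie from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# within-group sum of squares: spread of each group around its own mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
F_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
print("F (by hand) = %.3f" % F_manual)
```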

In [ ]:


In [ ]:

Usually we don't type all the data in by hand; instead we read it from files on disk.


In [49]:
import pandas as pd
menu_data=pd.read_csv("./data/menu_all.csv")

In [ ]:
menu_data.describe()

In [ ]:
menu_data.boxplot()
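The layout of data/menu_all.csv is not shown here. A sketch assuming, hypothetically, one column per condition named windows, mac and bottom (the layout for which menu_data.boxplot() draws one box per condition). To stay self-contained, it writes the measurements from above to an in-memory CSV instead of reading the real file, then runs the ANOVA on the columns.

```python
import io

import pandas as pd
from scipy import stats

# hypothetical file layout: one column per condition (column names are assumptions)
frame = pd.DataFrame({
    "windows": [625, 480, 621, 633, 694, 599, 505, 527, 651, 505],
    "mac": [647, 503, 559, 586, 458, 380, 477, 409, 589, 472],
    "bottom": [485, 436, 512, 564, 560, 587, 391, 488, 555, 446],
})
csv_text = frame.to_csv(index=False)

# read_csv accepts any file-like object, so StringIO stands in for the file on disk
menu_data = pd.read_csv(io.StringIO(csv_text))
print(menu_data.describe())

# one-way ANOVA across all condition columns
F, p = stats.f_oneway(*[menu_data[col].to_numpy() for col in menu_data.columns])
print("F = %.3f, p = %.4f" % (F, p))
```

Because the columns hold the same values as more_win, more_mac and more_bottom, this gives the same F as calling stats.f_oneway on those three lists.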