Let's take another look at our PC versus Mac menu bar example. Here are the measurements from the first experiment (between-subjects design, randomized). First we import a few things to make the plots look nicer; press Shift+Enter in the next cell to execute it.
In [60]:
%pylab inline
import matplotlib.pyplot as plt
# use a nicer plotting style
plt.style.use('fivethirtyeight')
# list all styles that are available
print(plt.style.available)
# change the default figure size (width, height in inches)
plt.rcParams['figure.figsize'] = (10, 6)
Now let's calculate the means. You can simply use mean().
In [41]:
windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]
print(mean(windows))
print(mean(mac))
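For reference, here is a minimal sketch of the same calculation without `%pylab`. Under `%pylab`, `mean()` is NumPy's `mean`, so `np.mean` gives the same numbers. The sample standard deviation is an extra we added to show the spread; `ddof=1` divides by n - 1.

```python
import numpy as np

# same measurements as in the cell above
windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

# np.mean is what mean() refers to under %pylab
print("windows mean:", np.mean(windows))  # 589.75
print("mac mean:", np.mean(mac))          # 573.75

# sample standard deviation (ddof=1 divides by n - 1)
print("windows sd:", np.std(windows, ddof=1))
print("mac sd:", np.std(mac, ddof=1))
```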
Hmm... there seems to be a difference, but it is not very large. Let's plot the individual data points to check.
In [ ]:
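# one marker per measurement; the x-axis is the measurement index
plot(windows, "*", label="windows")
plot(mac, "o", label="mac")
legend()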
Let's use boxplots to explore the windows and mac data we have recorded so far. Use the command boxplot to plot them. You can also combine the two datasets into one plot using data = [windows, mac].
In [ ]:
data = [windows, mac]
# one box per condition
boxplot(data)
xticks([1, 2], ['windows', 'mac'])
# save the plot to a file
savefig("boxplot.pdf")
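To see what the boxes actually encode, the sketch below computes the quartiles by hand with `np.percentile`. It assumes matplotlib's default boxplot settings (`whis=1.5`). With those settings, the box spans the first to third quartile, the line inside marks the median, and points more than 1.5 * IQR beyond the box are drawn individually as outliers.

```python
import numpy as np

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

summary = {}
for name, values in [("windows", windows), ("mac", mac)]:
    # the box spans q1..q3; the line inside it is the median
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # whiskers reach at most 1.5 * IQR beyond the box (whis=1.5)
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    summary[name] = {"q1": q1, "median": median, "q3": q3, "outliers": outliers}
    print(f"{name}: q1={q1}, median={median}, q3={q3}, outliers={outliers}")
```

With these four values, 480 lies more than 1.5 * IQR below the Windows box, so it should appear as a separate point in the boxplot above.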
Hmm... this doesn't look significant. Still, to make sure, let's apply a t-test. Remember: a t-test compares exactly two means with each other, NOT more. The independent-samples t-test also assumes that the samples are independent, normally distributed, and have equal variances!
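The normality and equal-variance assumptions can be checked roughly with `scipy.stats.shapiro` (Shapiro-Wilk) and `scipy.stats.levene`. This is only a sketch. With four measurements per group these tests have very little power, so a non-significant result does not prove that the assumptions hold.

```python
from scipy import stats

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

# Shapiro-Wilk test, H0: the sample comes from a normal distribution
normality = {name: stats.shapiro(values)
             for name, values in [("windows", windows), ("mac", mac)]}
for name, result in normality.items():
    print(f"Shapiro-Wilk {name}: W = {result.statistic:.3f}, p = {result.pvalue:.3f}")

# Levene's test, H0: both groups have the same variance
equal_var = stats.levene(windows, mac)
print(f"Levene: W = {equal_var.statistic:.3f}, p = {equal_var.pvalue:.3f}")
```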
In [ ]:
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel
import scipy.stats as stats
# independent two-sample t-test (two-sided by default);
# this is the right test for our between-subjects design
ttest_ind(mac, windows)
In [ ]:
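# paired (related-samples) t-test, also two-sided by default;
# only valid for a within-subjects design where the same participants
# used both menu bars, shown here for comparison
ttest_rel(mac, windows)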
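`ttest_ind` is two-sided by default. Suppose you hypothesized before the experiment that the Mac values would be lower. In that case, you can run a one-sided test through the `alternative` parameter, which requires SciPy 1.6 or newer. This sketch compares the two:

```python
from scipy import stats

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

# two-sided (default): H1 = the means differ in either direction
two_sided = stats.ttest_ind(mac, windows)
# one-sided: H1 = mean(mac) < mean(windows); needs SciPy >= 1.6
one_sided = stats.ttest_ind(mac, windows, alternative="less")

print("two-sided:", two_sided)
print("one-sided:", one_sided)
```

The t statistic is negative here because the Mac mean is lower, so the one-sided p-value is exactly half the two-sided one. Choose the direction before looking at the data; choosing it afterwards inflates the false-positive rate.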
OK... still not significant. So we should record more data ;)
In [25]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]
OK... let's calculate the means and plot the data.
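Here is one possible solution. This sketch uses the explicit NumPy and matplotlib interfaces instead of the `%pylab` namespace, and the two-panel layout is just one choice among many. In a plain script (outside the notebook), call `plt.show()` at the end to display the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

print("windows mean:", np.mean(more_win))  # 584.0
print("mac mean:", np.mean(more_mac))      # 508.0

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# left: individual measurements in recording order
ax1.plot(more_win, "*", label="windows")
ax1.plot(more_mac, "o", label="mac")
ax1.legend()
# right: boxplots of both conditions side by side
ax2.boxplot([more_win, more_mac])
ax2.set_xticks([1, 2])
ax2.set_xticklabels(["windows", "mac"])
fig.tight_layout()
```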
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
Now perform the t-test (two-sided is usually the best choice). What do you think?
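Here is a possible solution using `scipy.stats.ttest_ind` (two-sided by default). We include Welch's variant (`equal_var=False`) for comparison because it drops the equal-variance assumption. With equal group sizes, the t statistic is identical; only the degrees of freedom change, and with them the p-value changes slightly.

```python
from scipy import stats

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

# Student's t-test: independent samples, equal variances assumed, two-sided
student = stats.ttest_ind(more_mac, more_win)
# Welch's t-test: same comparison without the equal-variance assumption
welch = stats.ttest_ind(more_mac, more_win, equal_var=False)

print("Student:", student)
print("Welch:  ", welch)

alpha = 0.05
verdict = "significant" if student.pvalue < alpha else "not significant"
print(f"Student's t-test is {verdict} at alpha = {alpha}")
```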
In [ ]:
In [ ]:
What do we do if we have more than two samples, for example if we introduce a third experimental setup where the menu bar is at the bottom of the screen? Then we need an ANOVA (again assuming a between-subjects design, normal distributions, equal variances, etc.). Use the function stats.f_oneway.
In [46]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]
more_bottom = [485,436, 512, 564, 560, 587, 391, 488, 555, 446]
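Below is a sketch of the one-way ANOVA with `stats.f_oneway`. A significant F only tells us that at least one group mean differs from the others. To find out which pairs differ, we need a post-hoc test. Here we assume SciPy 1.8 or newer for `scipy.stats.tukey_hsd`.

```python
from scipy import stats

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]
more_bottom = [485, 436, 512, 564, 560, 587, 391, 488, 555, 446]

# one-way ANOVA, H0: all three group means are equal
f_stat, p_value = stats.f_oneway(more_win, more_mac, more_bottom)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# post-hoc pairwise comparisons (requires SciPy >= 1.8);
# group 0 = windows, 1 = mac, 2 = bottom
posthoc = stats.tukey_hsd(more_win, more_mac, more_bottom)
print(posthoc)
```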
In [ ]:
In [ ]:
Usually we don't type all the data in by hand; instead we read it from files on disk.
In [49]:
import pandas as pd
menu_data=pd.read_csv("./data/menu_all.csv")
In [ ]:
menu_data.describe()
In [ ]:
menu_data.boxplot()
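The file ./data/menu_all.csv is not reproduced here, so we don't know its exact layout. The sketch below assumes one column per condition, with hypothetical column names windows, mac and bottom. It replaces the file with an in-memory CSV built from the ten measurements per condition listed above, then runs the same ANOVA on every column.

```python
import io

import pandas as pd
from scipy import stats

# hypothetical stand-in for ./data/menu_all.csv (assumed layout:
# one column per condition, one row per measurement)
csv_text = """windows,mac,bottom
625,647,485
480,503,436
621,559,512
633,586,564
694,458,560
599,380,587
505,477,391
527,409,488
651,589,555
505,472,446
"""

menu_data = pd.read_csv(io.StringIO(csv_text))
print(menu_data.describe())

# one-way ANOVA across all condition columns; dropna() guards against
# columns of unequal length, which pandas pads with NaN
groups = [menu_data[col].dropna() for col in menu_data.columns]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```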