Let's take a closer look at our PC versus Mac menu bar example again. Here are the measurements from the first experiment (between-subject, randomized). We first import a few things to make the plots look nicer; press Shift+Enter in the next cell to execute it.
In [3]:
%pylab inline
import matplotlib.pyplot as plt
#use a nicer plotting style
plt.style.use(u'fivethirtyeight')
print(plt.style.available)
#change figure size
pylab.rcParams['figure.figsize'] = (10, 6)
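Now let's try to calculate the mean and the standard deviation ... you can just use mean() and std()
In [5]:
windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]
#mean() and std() are available through %pylab inline
print(mean(windows), std(windows))
print(mean(mac), std(mac))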
Hmm ... there seems to be a difference, but it's not very big. Let's use point plots to check the data.
In [6]:
plot(windows,"*")
plot(mac,"o")
Out[6]:
Let's use boxplots to explore the windows and mac data we recorded so far. Use the command boxplot to plot them. You can also combine the two datasets into one plot using data = [windows,mac]
In [8]:
data = [windows,mac]
boxplot(data)
xticks([1,2],['windows','mac'])
#save the plot to a file
savefig("boxplot.pdf")
Hmm ... doesn't look significant. Still, just to make sure, let's apply a t-test. Remember that t-tests compare only 2 means with each other, NOT more (the assumptions are that the samples are independent, normally distributed, and have the same variance!)
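The normality and equal-variance assumptions can be checked roughly before running the test. This is a minimal sketch, assuming SciPy's `shapiro` and `levene` are acceptable checks here; neither appears in the original notebook, and with only four values per group they have very little power.

```python
from scipy import stats

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
w_stat, w_p = stats.shapiro(windows)
m_stat, m_p = stats.shapiro(mac)

# Levene: the null hypothesis is that both groups have the same variance
lev_stat, lev_p = stats.levene(windows, mac)

print("Shapiro-Wilk p (windows):", w_p)
print("Shapiro-Wilk p (mac):", m_p)
print("Levene p:", lev_p)
```

A small p-value would speak against the respective assumption; a large one does not prove the assumption holds.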
In [9]:
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel
import scipy.stats as stats
#independent two-sample t-test (two-sided by default), matches our between-subject design
ttest_ind(mac,windows)
Out[9]:
In [10]:
#paired (related-samples) t-test, also two-sided; only appropriate for within-subject data, shown for comparison
ttest_rel(mac,windows)
Out[10]:
Ok... doesn't look significant. So we should record more data ;)
In [11]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]
Ok ... let's calculate the means and plot the data.
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
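One possible way to fill in the cells above, as a sketch rather than the intended solution. It uses explicit NumPy and Matplotlib imports instead of the `%pylab` namespace so that it runs on its own; the variable names `win_mean` and `mac_mean` are introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

# Means of the two groups
win_mean = np.mean(more_win)
mac_mean = np.mean(more_mac)
print("windows mean:", win_mean)
print("mac mean:", mac_mean)

# Boxplot of both groups side by side
fig, ax = plt.subplots()
ax.boxplot([more_win, more_mac])
ax.set_xticks([1, 2])
ax.set_xticklabels(["windows", "mac"])
```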
Now perform the t-test (two-sided is best). What do you think?
In [ ]:
In [ ]:
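A sketch of the test on the larger sample, assuming the independent two-sample `ttest_ind` is the right choice because the design is between-subject. The interpretation is left open on purpose.

```python
from scipy.stats import ttest_ind

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

# Two-sided by default; assumes equal variances in both groups
t_stat, p_value = ttest_ind(more_mac, more_win)
print("t =", t_stat)
print("p =", p_value)
```

If the equal-variance assumption looks doubtful, passing `equal_var=False` to `ttest_ind` runs Welch's t-test instead.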
What do we do if we have more than 2 samples? (Assuming we introduce a 3rd experimental setup where the menu bar is at the bottom of the screen.) We need to use ANOVA (again assuming a between-subject design, normal distributions etc.). Use the function stats.f_oneway
In [12]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]
more_bottom = [485,436, 512, 564, 560, 587, 391, 488, 555, 446]
In [13]:
stats.f_oneway(more_win, more_mac, more_bottom)
Out[13]:
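To see what `stats.f_oneway` computes, the F statistic can be rebuilt by hand: it is the variance between the group means divided by the variance within the groups. A sketch, with helper names such as `ss_between` and `f_manual` introduced here for illustration only:

```python
import numpy as np
from scipy import stats

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]
more_bottom = [485, 436, 512, 564, 560, 587, 391, 488, 555, 446]

groups = [np.array(g, dtype=float) for g in (more_win, more_mac, more_bottom)]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)        # number of groups
n = all_values.size    # total number of measurements

# Sum of squares between groups and within groups
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
# p-value from the F distribution with (k - 1, n - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
print(p_manual, p_scipy)
```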
In [15]:
boxplot([more_win, more_mac, more_bottom])
xticks([1,2,3],['windows','mac', 'bottom'])
Out[15]:
Usually we don't define all the data at the command prompt; we read it in from files on disk.
In [16]:
import pandas as pd
menu_data=pd.read_csv("./data/menu_all.csv")
In [17]:
menu_data.describe()
Out[17]:
In [18]:
menu_data.boxplot()
Out[18]:
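Data loaded with pandas can be passed to the same ANOVA function as before. The layout of menu_all.csv is not shown here, so this sketch assumes one column per condition; the column names and the in-memory CSV text are hypothetical stand-ins built from the lists used earlier.

```python
import io
import pandas as pd
from scipy import stats

# Hypothetical stand-in for a CSV file with one column per condition
csv_text = """windows,mac,bottom
625,647,485
480,503,436
621,559,512
633,586,564
694,458,560
599,380,587
505,477,391
527,409,488
651,589,555
505,472,446
"""
df = pd.read_csv(io.StringIO(csv_text))

# One-way ANOVA across all columns of the table
f_stat, p_value = stats.f_oneway(*(df[col] for col in df.columns))
print("F =", f_stat)
print("p =", p_value)
```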
In [ ]: