Exploration Example

Let's start by importing some plotting functions. (Ignore the warning below; we should really use something else, but this is easier for the time being.)


In [1]:
%pylab inline


/usr/local/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))
Populating the interactive namespace from numpy and matplotlib

In [5]:
windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

Now let's try to calculate the mean. You can just use mean().
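
One way to fill in the empty cell that follows, as a minimal sketch. %pylab already exposes NumPy's mean() in the notebook namespace; the explicit import is used here so the snippet also runs outside the notebook. The names win_mean and mac_mean are illustrative and not part of the original notebook.

```python
import numpy as np

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

# np.mean is the function that %pylab makes available as mean()
win_mean = np.mean(windows)
mac_mean = np.mean(mac)
print(win_mean, mac_mean)
```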


In [ ]:

We can also plot the raw data.


In [ ]:
figure()
plot(windows)
plot(mac,'r')
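
The same plot can be sketched without %pylab by importing matplotlib.pyplot explicitly. The axis labels and legend entries below are assumptions added for readability; the notebook does not say what the values measure or in which unit.

```python
import matplotlib.pyplot as plt

windows = [625, 480, 621, 633]
mac = [647, 503, 559, 586]

fig, ax = plt.subplots()
ax.plot(windows, label="windows")
ax.plot(mac, "r", label="mac")    # "r" draws the line in red, as in the cell above
ax.set_xlabel("sample index")     # assumed label
ax.set_ylabel("measured value")   # assumed label; the unit is not stated
ax.legend()
```

In a notebook with inline plotting the figure is displayed automatically; in a plain script you would call plt.show() afterwards.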

Apply a t-test to check for significance.


In [53]:
from scipy.stats import ttest_ind
from scipy.stats import ttest_rel
import scipy.stats as stats
# independent two-sample t-test (two-sided by default)
ttest_ind(mac,windows)


Out[53]:
Ttest_indResult(statistic=-0.33810258358163742, pvalue=0.74680239857459685)

In [54]:
# paired t-test: treats the i-th values of both lists as a matched pair
ttest_rel(mac,windows)


Out[54]:
Ttest_relResult(statistic=-0.71305042851453826, pvalue=0.52727545422603439)

Let's say we get more data.


In [41]:
more_win = [625, 480, 621, 633,694,599,505,527,651,505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589,472]
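
As a sketch, the two tests from above can be repeated on the larger samples. The names ind_result and rel_result are illustrative. Whether the paired test is appropriate depends on whether the i-th entries of both lists really belong together, which the notebook does not state. The paired result agrees with the Ttest_relResult recorded in Out[51] further down.

```python
from scipy.stats import ttest_ind, ttest_rel

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]

# independent-samples test: treats the two lists as unrelated groups
ind_result = ttest_ind(more_mac, more_win)
# paired test: assumes the i-th entries of both lists form a matched pair
rel_result = ttest_rel(more_mac, more_win)

print(ind_result)
print(rel_result)
```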

What do we do if we have more than two groups? Use an ANOVA; in Python, that is stats.f_oneway().


In [51]:
more_bottom = [485,436, 512, 564, 560, 587, 391, 488, 555, 446]


Out[51]:
Ttest_relResult(statistic=-2.6758056901941498, pvalue=0.025379874652221083)
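
The one-way ANOVA suggested above could be run on the three samples roughly as follows. This is a minimal sketch; the name anova is illustrative. The resulting statistic agrees with the F_onewayResult recorded in Out[38] near the end of this notebook.

```python
import scipy.stats as stats

more_win = [625, 480, 621, 633, 694, 599, 505, 527, 651, 505]
more_mac = [647, 503, 559, 586, 458, 380, 477, 409, 589, 472]
more_bottom = [485, 436, 512, 564, 560, 587, 391, 488, 555, 446]

# one-way ANOVA: tests whether the group means differ across all three groups
anova = stats.f_oneway(more_win, more_mac, more_bottom)
print(anova)
```

A significant ANOVA only says that at least one group mean differs; it does not say which one.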

Anscombe's quartet

Let's take a look at another data set (and actually import the data from a file).


In [25]:
import pandas as pd
aq=pd.read_csv('data/anscombesQuartet.csv')

In [28]:
aq


Out[28]:
     I_x    I_y  II_x  II_y  III_x  III_y  IV_x   IV_y
0   10.0   8.04  10.0  9.14   10.0   7.46   8.0   6.58
1    8.0   6.95   8.0  8.14    8.0   6.77   8.0   5.76
2   13.0   7.58  13.0  8.74   13.0  12.74   8.0   7.71
3    9.0   8.81   9.0  8.77    9.0   7.11   8.0   8.84
4   11.0   8.33  11.0  9.26   11.0   7.81   8.0   8.47
5   14.0   9.96  14.0  8.10   14.0   8.84   8.0   7.04
6    6.0   7.24   6.0  6.13    6.0   6.08   8.0   5.25
7    4.0   4.26   4.0  3.10    4.0   5.39  19.0  12.50
8   12.0  10.84  12.0  9.13   12.0   8.15   8.0   5.56
9    7.0   4.82   7.0  7.26    7.0   6.42   8.0   7.91
10   5.0   5.68   5.0  4.74    5.0   5.73   8.0   6.89

In [29]:
mean(aq['I_y'])


Out[29]:
7.5009090909090927

Again, calculate the means for all x.

Calculate the variance for x.

And the variance for y.
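
One possible way to work through these three steps, as a sketch. To keep the snippet self-contained, the DataFrame is rebuilt from the values printed in Out[28] instead of being read from the CSV file. The helper names (x_123, x_cols, y_cols, and the result variables) are illustrative. Note that pandas' var() returns the sample variance (ddof=1), whereas NumPy's var() defaults to the population variance (ddof=0).

```python
import pandas as pd

# x values shared by data sets I, II and III (copied from Out[28])
x_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

aq = pd.DataFrame({
    "I_x": x_123,
    "I_y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II_x": x_123,
    "II_y": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III_x": x_123,
    "III_y": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV_x": [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
    "IV_y": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

# split the columns into x and y by their suffix
x_cols = [c for c in aq.columns if c.endswith("_x")]
y_cols = [c for c in aq.columns if c.endswith("_y")]

x_means = aq[x_cols].mean()
x_vars = aq[x_cols].var()   # sample variance (ddof=1)
y_means = aq[y_cols].mean()
y_vars = aq[y_cols].var()   # sample variance (ddof=1)

print(x_means, x_vars, y_means, y_vars, sep="\n\n")
```

The four data sets share nearly identical means and variances, which is the point of Anscombe's quartet: summary statistics alone do not reveal how different the data look when plotted.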



In [35]:


In [38]:



Out[38]:
F_onewayResult(statistic=3.6909472287945335, pvalue=0.038278814395010151)

In [46]:


In [ ]: