This is a demonstration of how to calculate the F-statistic for one-way ANOVA. Most of the content, including the calculation chart, is taken from https://onlinecourses.science.psu.edu/stat200/node/66.
In [1]:
import pandas as pd
import numpy as np
%pylab inline
pylab.style.use('ggplot')
One-way analysis of variance ("one-way ANOVA") is a statistical method for comparing the means of more than two independent groups.
This is an improvement over running pairwise two-sample t-tests: with $n$ groups we would need $\binom {n}{2}$ separate tests (for example, 4 groups would require $\binom {4}{2} = 6$ tests), which inflates the overall chance of a false positive.
If a random variable X has an F-distribution with parameters $d_1$ and $d_2$, we write $X \sim F(d_1, d_2)$. The probability density function (pdf) of X is then
$f(x; d_1, d_2) = \dfrac {\sqrt {\dfrac {(d_1 x)^{d_1}\, d_2^{d_2}} {(d_1 x + d_2)^{d_1 + d_2}}}} {x\, B\!\left(\dfrac{d_1}{2}, \dfrac{d_2}{2}\right)}$ for $x > 0$, where $B$ is the beta function.
The figure below shows the shape of the pdf for several values of $d_1$ and $d_2$.
In [2]:
from scipy.stats import f
In [3]:
params = [(2, 4), (12, 12), (9, 9), (4, 6), (100, 100)]
x = np.linspace(0.25, 4, num=100)
data = []
for p in params:
    d1, d2 = p
    data.append(f.pdf(x, d1, d2))
data_df = pd.DataFrame(data).T
data_df.columns = ['d1={}, d2={}'.format(*p) for p in params]
data_df = data_df.assign(x=x)
In [4]:
data_df.set_index('x').plot()
Out[4]:
The assumptions for a one-way ANOVA are:
1. Each group is a random sample from its population, and the samples are independent of one another.
2. The response is approximately normally distributed within each group.
3. The population variances are roughly equal across groups.
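As a rough numeric check of the equal-variance assumption (this check is not part of the original walkthrough), Levene's test from scipy.stats can be applied to the group samples. The snippet below is a minimal sketch on synthetic data, since the diet data is only loaded further down:
# Minimal sketch on synthetic data: Levene's test for equal variances.
# Not part of the original notebook; with real data, pass one array per group.
import numpy as np
from scipy.stats import levene
rng = np.random.RandomState(0)
g1 = rng.normal(loc=0.0, scale=1.0, size=30)
g2 = rng.normal(loc=0.5, scale=1.0, size=30)
g3 = rng.normal(loc=1.0, scale=1.0, size=30)
stat, p = levene(g1, g2, g3)
# A small p-value (e.g. < 0.05) would cast doubt on the equal-variance assumption.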
Given that we are comparing k independent groups, the null and alternative hypotheses are:
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$
$H_{alt}$: Not all $\mu_i$ are equal
In other words, the null hypothesis is that all of the groups' population means are equal. The alternative is that they are not all equal; at least two population means differ from one another.
The test statistic is
$F = \dfrac {\text {between group variability}} {\text {within group variability}}$
Where,
k = Number of groups
n = Total sample size (all groups combined)
$n_k$ = Sample size of group k
$\bar {x_k}$ = Sample mean of group k
$\bar {x}$ = Grand mean (i.e., mean for all groups combined)
SS = Sum of squares
MS = Mean square
df = Degrees of freedom
F = F-ratio (the test statistic)
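In terms of this notation (a condensed restatement of the calculation chart from the source linked at the top, with $j = 1, \dots, k$ indexing the groups):
$SS_{\text{between}} = \sum_{j=1}^{k} n_j (\bar {x}_j - \bar {x})^2$, with $df_{\text{between}} = k - 1$ and $MS_{\text{between}} = SS_{\text{between}} / df_{\text{between}}$
$SS_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar {x}_j)^2$, with $df_{\text{within}} = n - k$ and $MS_{\text{within}} = SS_{\text{within}} / df_{\text{within}}$
$F = \dfrac {MS_{\text{between}}} {MS_{\text{within}}}$, which follows an $F(k-1,\, n-k)$ distribution under $H_0$.
These are exactly the quantities computed step by step in the code below.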
If the p-value is less than the chosen significance level, we reject the null hypothesis and conclude that not all group means are equal.
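Equivalently (this restatement is not in the source, but follows from the definition of the p-value): with observed statistic $F_{obs}$, degrees of freedom $d_1 = k - 1$ and $d_2 = n - k$, and significance level $\alpha$, the p-value is $P\left(F(d_1, d_2) > F_{obs}\right)$, and we reject $H_0$ exactly when $F_{obs}$ exceeds the upper critical value $F_{1-\alpha;\, d_1, d_2}$.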
In this example, we'll look at the well-known Diet dataset from the University of Sheffield.
In [5]:
diet_url = 'https://www.sheffield.ac.uk/polopoly_fs/1.570199!/file/stcp-Rdataset-Diet.csv'
diet_df = pd.read_csv(diet_url)
In [6]:
diet_df.head()
Out[6]:
First, we calculate the weight lost by each person in the study.
In [7]:
diet_df = diet_df.assign(weight_lost=diet_df['pre.weight'] - diet_df.weight6weeks)
In [8]:
weight_lost = diet_df.loc[:, ['Diet', 'weight_lost']]
In [9]:
weight_lost.Diet.value_counts().plot(kind='bar')
Out[9]:
In [10]:
import seaborn as sns
fg = sns.FacetGrid(data=weight_lost, col='Diet')
fg = fg.map(pylab.hist, 'weight_lost')
In [11]:
weight_lost.boxplot(column='weight_lost', by='Diet')
Out[11]:
In [12]:
sns.factorplot(x='weight_lost', col='Diet', data=weight_lost)
Out[12]:
In [13]:
total_mean = weight_lost.weight_lost.mean()
total_std = weight_lost.weight_lost.std()
In [14]:
group_means = weight_lost.groupby(by='Diet')['weight_lost'].mean()
In [15]:
group_means.plot(kind='bar')
Out[15]:
In [16]:
group_counts = weight_lost.Diet.value_counts()
In [17]:
SS_between = group_means.sub(total_mean).pow(2.0).mul(group_counts).sum()
In [18]:
SS_between
Out[18]:
In [19]:
MS_between = SS_between / (len(group_counts) - 1)
In [20]:
MS_between
Out[20]:
In [21]:
groups = weight_lost.groupby('Diet')['weight_lost'].groups
SS_within = 0
for k, v in groups.items():
    SS_in_group = weight_lost.loc[v, 'weight_lost'].sub(group_means[k]).pow(2.0).sum()
    SS_within += SS_in_group
In [22]:
SS_within
Out[22]:
In [23]:
MS_within = SS_within / (len(weight_lost) - len(groups))
In [24]:
MS_within
Out[24]:
In [25]:
f_stat = MS_between / MS_within
f_stat
Out[25]:
In [26]:
n = len(weight_lost)
k = len(groups)
d1, d2 = k-1, n-k
In [27]:
d1, d2
Out[27]:
In [28]:
p_val = f.sf(f_stat, d1, d2)
In [29]:
p_val
Out[29]:
At a significance level of 0.05, the p-value is below 0.05, so we reject the null hypothesis, i.e. we conclude that the mean weight loss of at least one of the diets differs from the others.
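As a cross-check (not part of the original walkthrough), the same F-statistic and p-value can be obtained in one call from scipy.stats.f_oneway, which performs a one-way ANOVA given the per-group samples:
# Cross-check with scipy's built-in one-way ANOVA (not in the original notebook).
from scipy.stats import f_oneway
samples = [grp['weight_lost'].values for _, grp in weight_lost.groupby('Diet')]
f_check, p_check = f_oneway(*samples)
# f_check and p_check should match the manually computed f_stat and p_val above.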