Comparing more than two means

Up until now we have only made inferences about one mean or two means, never more than that. This section shows how to make an inference-based comparison of more than two means.

Screenshot taken from Coursera 00:56

We will use an example from the 2010 GSS data, comparing vocabulary test scores (a discrete numerical variable) against self-reported social class (a categorical variable). The distribution of the sample scores is unimodal and slightly left-skewed. To explore the data, you can:

  • plot a histogram of all the scores,
  • make a bar plot of the frequency of each social class,
  • draw side-by-side boxplots for the categorical-numerical pair, and
  • calculate summary statistics for each social class (a code sketch follows).
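
A minimal sketch of that exploratory analysis in Python with pandas, assuming a toy data frame; the column names (`wordsum`, `class`) and values are made up for illustration, not the actual GSS variables:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the 2010 GSS data; "wordsum" and "class" are assumed
# column names, and the values are made up for illustration.
gss = pd.DataFrame({
    "wordsum": [6, 5, 8, 9, 4, 7, 6, 10, 5, 7],
    "class":   ["working", "middle", "middle", "upper", "lower",
                "working", "middle", "upper", "lower", "working"],
})

plt.figure()
gss["wordsum"].plot.hist(title="All vocabulary scores")          # histogram
plt.figure()
gss["class"].value_counts().plot.bar(title="Class frequencies")  # bar plot
gss.boxplot(column="wordsum", by="class")                        # side-by-side boxplots
print(gss.groupby("class")["wordsum"].describe())                # per-class summaries
plt.show()
```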

Screenshot taken from Coursera 04:42

If we look at the three sets of boxplots here, plot 1 is the most likely to show a significant difference. Why? Because each IQR is small, and none of the boxes captures the means of the other groups, so the means in plot 1 are most clearly different. The least likely is plot 2: the means are almost identical, and each box captures the means of the others.

So in this context, we have a question: "Is there a difference in vocabulary scores between (self-reported) social classes?"

  • To compare the means of two groups, we use a t or z statistic.
  • To compare means across more than two groups, we use analysis of variance (ANOVA) and a new statistic, F.

ANOVA

Just as before, for a hypothesis test involving more than two means we start by setting up the hypotheses. As always, the null hypothesis is the skeptical one: there is no difference among the means across groups,

$$\mu_1 = \mu_2 = \cdots = \mu_k$$

where $\mu_i$ is the mean of the outcome in category $i$, and $k$ is the number of groups.

The alternative hypothesis, however, is that at least one mean is different from the others. We don't know how many of them are different, and we don't know which ones; all we know is the probability that at least one of them is different. Recall from probability:

$$P(\text{at least 1 out of 5}) = 1 - P(\text{none})^5$$
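
For example (illustrative numbers, not from the lecture): if each of 5 independent comparisons has a 0.05 chance of flagging a difference by chance alone, then

$$P(\text{at least one}) = 1 - (1 - 0.05)^5 \approx 0.23,$$

which is one reason to prefer a single test across all groups over many pairwise tests.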

Screenshot taken from Coursera 08:42

So, once again: where a z/t test asks whether the difference between two means is due to sampling variability, in ANOVA we test all of the groups at once and obtain a single p-value for whether the observed differences are due to sampling variability.

Where before we calculated the difference between two group means, in ANOVA we compare the variability across groups to the variability within groups. Recall that the null hypothesis is rejected when the p-value is small (driven by a large test statistic), in which case we conclude there is evidence of a difference in the data.

Screenshot taken from Coursera 09:38

The F distribution is right-skewed, and we shade the area in the right (positive) tail. We reject for small p-values, which are produced by large F statistics. For that to happen, mathematically, the variability between groups must be large relative to the variability within groups.
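
As a small illustration of that tail area, here is how a p-value follows from an F statistic and its degrees of freedom using scipy; the numbers below are placeholders, not the lecture's:

```python
from scipy import stats

# Placeholder values (not the lecture's numbers): with k groups and n total
# observations, the F statistic has df_between = k - 1 and df_within = n - k.
F = 4.3
df_between = 3
df_within = 791

# The p-value is the right-tail area: P(F > observed value).
p_value = stats.f.sf(F, df_between, df_within)
print(f"F = {F}, p-value = {p_value:.4f}")
```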

ANOVA

We can use ANOVA when we want to detect a difference in means across several groups. For example, students' final exam scores could be affected by many factors: hours of study per week, quizzes, midterms, projects, assignments, and so on. Each factor has many categories, and we want to ask whether the scores really differ across the groups or whether the variability happens for other reasons.

Screenshot taken from Coursera 02:04

Recall our data: vocabulary scores paired with self-reported social class, along with summary statistics for each class. The null hypothesis is the skeptical one, that there is no difference; the alternative is that at least one mean is different. The explanatory variable is social class and the response variable is vocabulary score. The analysis first divides the data by the categorical variable (social class) and then examines the numerical response (vocabulary score): we want to see whether social class affects the score, so the summary statistics are taken on the vocabulary scores after separating by class.

Screenshot taken from Coursera 03:17

After that we analyze the variability attributable to the explanatory variable. What we actually do is check whether the vocabulary scores vary because of class or because of other factors. So, after separating by class: is the variability within groups (due to everything other than class) larger than the variability between classes? If it is, the F statistic will be small. Conversely, if the variability between groups is larger than the variability within groups, the result is a large F statistic. In short, the F statistic is large when the explanatory variable accounts for more of the variability than all other factors combined.

Screenshot taken from Coursera 03:40

Sum Sq

Sum of Squares Total (SST)

Screenshot taken from Coursera 07:29

The total value (3106.36) is called the sum of squares total (SST). It is calculated from the response variable (all vocabulary scores) before any separation by the explanatory variable. It measures the same thing as the variance, except that it is not scaled by the sample size; remember that the sample variance is SST divided by n − 1.
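
In symbols, with $y_i$ the individual scores, $\bar{y}$ the grand mean, and $n$ the sample size:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad s^2 = \frac{SST}{n - 1}$$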

Recall that, in the end, we are doing a hypothesis test based on a p-value. And the p-value is obtained from the F statistic, which is the ratio of the variability between groups to the variability within groups.
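
Concretely (with the degrees of freedom and mean squares worked out later in this section):

$$F = \frac{\text{variability between groups}}{\text{variability within groups}} = \frac{SSG / df_G}{SSE / df_E} = \frac{MSG}{MSE}$$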

SST alone doesn't tell us much; we're going to step through all the values in the ANOVA table.

Sum of Squares Groups (SSG)

Screenshot taken from Coursera 10:14

The value of 236.56 measures the variability between groups (the explanatory variable). We take the mean and the sample size of each group, compute each group mean's deviation from the overall (grand) mean, and weight it by the group's sample size.

SSG answers a question similar to SST's, except that it works with the group means, weighted by sample size. Where SST measures the total variability of the response variable, SSG measures the variability of the group means across the explanatory variable's categories; this gives 236.56. What matters is the ratio, not the magnitude: don't pay attention to how large the value is, but to its ratio against SST. Note that sample size also comes into play: a larger group contributes more weight relative to the grand mean.
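
In symbols, with $n_j$ and $\bar{y}_j$ the sample size and mean of group $j$:

$$SSG = \sum_{j=1}^{k} n_j\,(\bar{y}_j - \bar{y})^2$$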

Sum of Squares Error (SSE)

Screenshot taken from Coursera 12:23

Lastly, SSE is just the complement of SSG within SST: the variability in the data not attributable to the explanatory variable, in other words, the variability within groups, due to other reasons. Looking at the ratio, it is large compared to SST. This makes sense, since background factors such as education, IQ, and so on could be stronger determinants of vocabulary score than social class.
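
So, using the values above:

$$SSE = SST - SSG = 3106.36 - 236.56 = 2869.80$$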

Mean Sq

After we get the sums of squares, we also want a way to convert the total variability into average variability. That requires scaling by a measure that incorporates the sample size and the number of groups, hence degrees of freedom.

Screenshot taken from Coursera 14:02

The degrees of freedom are then easy to calculate: we already know the sample size and the number of groups, so we just plug them into the table. The mean square column is then filled in by scaling each sum of squares by its degrees of freedom.
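
With $n$ the total sample size and $k$ the number of groups:

$$df_T = n - 1, \qquad df_G = k - 1, \qquad df_E = df_T - df_G = n - k$$

$$MSG = \frac{SSG}{df_G}, \qquad MSE = \frac{SSE}{df_E}, \qquad F = \frac{MSG}{MSE}$$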

Screenshot taken from Coursera 15:19

IMPORTANT: for both the df and Sum Sq columns, the residual (error) row is calculated as the complement of the total and class rows. For Mean Sq, however, each row is calculated by dividing its Sum Sq by its df.
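
Putting the whole table together, here is a sketch of the full computation in Python, using made-up groups (not the GSS data) and cross-checked against scipy's built-in one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Made-up vocabulary scores split by social class (toy numbers, not GSS data).
groups = {
    "lower":   np.array([4.0, 5, 5, 6]),
    "working": np.array([5.0, 6, 6, 7, 7]),
    "middle":  np.array([6.0, 7, 8, 8]),
    "upper":   np.array([7.0, 8, 9]),
}

scores = np.concatenate(list(groups.values()))
grand_mean = scores.mean()
n, k = len(scores), len(groups)

# Sums of squares: total, between groups, and error (the complement).
sst = ((scores - grand_mean) ** 2).sum()
ssg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
sse = sst - ssg

# Mean squares: each sum of squares scaled by its degrees of freedom.
msg = ssg / (k - 1)   # df_G = k - 1
mse = sse / (n - k)   # df_E = n - k
F = msg / mse

# p-value: area in the right tail of the F distribution.
p = stats.f.sf(F, k - 1, n - k)
print(f"F = {F:.3f}, p-value = {p:.4f}")

# Cross-check against scipy's built-in one-way ANOVA.
F_check, p_check = stats.f_oneway(*groups.values())
print(f"scipy: F = {F_check:.3f}, p-value = {p_check:.4f}")
```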