Last lecture I mentioned the so-called t-test, also called Student's t-test. We used it to test whether the mean number of symbionts differed after a treatment (water warmer than usual) was applied to a sample of corals.
The question we had was: does the increase in water temperature affect the number of symbionts per coral cell?
To answer it, we said we would set up six tanks with corals. Three tanks are randomly selected, and the water in those tanks is warmed to a given temperature. The remaining three tanks serve as controls. At the end of the experiment we have a set of treated corals and a set of control corals. For every one of these corals we isolate the symbionts and calculate the number of symbionts per coral cell.
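The random selection of the three warmed tanks can be done with R's `sample()`. This is a minimal sketch; labelling the tanks 1 to 6 is an assumption made only for illustration.

```r
# Hypothetical tank labels: the six tanks are simply numbered 1 to 6
set.seed(42)                                   # make the random assignment reproducible
tanks <- 1:6
warmed_tanks <- sort(sample(tanks, size = 3))  # three tanks chosen at random for warming
control_tanks <- setdiff(tanks, warmed_tanks)  # the remaining three tanks are the controls
warmed_tanks
control_tanks
```

Because every tank has the same chance of being warmed, any pre-existing difference between tanks is spread at random across the two groups rather than systematically favouring one of them.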
Remember that whenever we take a sample, we assume it comes from a population and that the random variable we measure follows a given distribution. If we assume this random variable is normally distributed, we can simulate our experiment by drawing random samples from a normal distribution.
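As a quick sanity check of this idea, a very large sample drawn with `rnorm()` should have a mean and standard deviation close to the parameters we asked for. The sketch uses a mean of 26 and a standard deviation of 6, the control values used below; the sample size of 100000 is an arbitrary choice.

```r
# Draw a very large sample from a normal distribution with mean 26 and sd 6
set.seed(1)
big_sample <- rnorm(100000, mean = 26, sd = 6)
mean(big_sample)  # should be close to 26
sd(big_sample)    # should be close to 6
```

With only nine corals per group, as in our experiment, the sample mean and standard deviation will wander much further from 26 and 6, which is exactly why we need a formal test.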
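Let's say that our control corals have, on average, 26 symbionts per cell, with a standard deviation of 6. Our treatment corals, in contrast, have fewer symbionts: 22 per cell on average, also with a standard deviation of 6. Equal standard deviations are rare in practice, but for the sake of the example we will keep them equal.
To simulate sampling from these two normal distributions in R, we first draw the control corals:
In [1]:
set.seed(123)      # fix the random seed so the results are reproducible
rnorm(3*3, 26, 6)  # 3*3 = 9 control corals (e.g. three corals from each of three tanks): mean 26, sd 6
and then the treatment corals:
In [2]:
# no new set.seed() here: running cells [1] and [2] in order gives the same values as cell [3]
rnorm(3*3, 22, 6)  # 9 treatment corals: mean 22, sd 6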
We just did our experiment!
Now we can test whether these two means are different. But first, let's put the data into a data.frame:
In [3]:
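set.seed(123)                                         # same seed as above, so we get the same values
control <- rnorm(3*3, 26, 6)                          # 9 control corals: mean 26, sd 6
treatment <- rnorm(3*3, 22, 6)                        # 9 treatment corals: mean 22, sd 6
symbionts_per_coral_cell <- c(control, treatment)     # all 18 measurements in one vector
treatment_column <- c(rep("C", 3*3), rep("T", 3*3))   # group label (C = control, T = treatment) for each value
coral_dataset <- data.frame(treatment_column, symbionts_per_coral_cell)
coral_dataset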
Whenever you have data, you should first visualize it and compute summary statistics such as the mean and the standard deviation. So:
In [4]:
print("mean by treatment")
tapply(coral_dataset$symbionts_per_coral_cell, coral_dataset$treatment_column, mean)
print("sd by treatment")
tapply(coral_dataset$symbionts_per_coral_cell, coral_dataset$treatment_column, sd)
In [5]:
boxplot(symbionts_per_coral_cell ~ treatment_column, data = coral_dataset,
        xlab = "Group (C = control, T = warmed)", ylab = "Symbionts per coral cell")
OK, so we can kind of see that the two groups are different. The purpose of hypothesis testing is precisely to make this kind of statement more objective. Instead of arguing about whether we "kind of" see a difference, hypothesis testing gives us a standardized criterion for what we will consider to be different.
In [6]:
# Welch's t-test by default (var.equal = FALSE); add var.equal = TRUE for the classic Student's t-test
t.test(symbionts_per_coral_cell ~ treatment_column, data = coral_dataset)
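Since both groups were simulated with the same standard deviation, the classic Student's t-test, which pools the two sample variances, fits our assumptions. Note that `t.test()` uses Welch's version unless `var.equal = TRUE` is given. The sketch below regenerates the same simulated data as cell [3], computes Student's t statistic and its two-sided p-value by hand, and compares them with `t.test(..., var.equal = TRUE)`.

```r
# Regenerate the same simulated data as cell [3]
set.seed(123)
control <- rnorm(3*3, 26, 6)
treatment <- rnorm(3*3, 22, 6)

n1 <- length(control)
n2 <- length(treatment)
# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n1 - 1) * var(control) + (n2 - 1) * var(treatment)) / (n1 + n2 - 2)
# Student's t statistic: difference in means divided by its standard error
t_manual <- (mean(control) - mean(treatment)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
deg_freedom <- n1 + n2 - 2
# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), deg_freedom)

# Compare with R's built-in Student's t-test
student <- t.test(control, treatment, var.equal = TRUE)
c(t_manual = t_manual, t_builtin = unname(student$statistic))
c(p_manual = p_manual, p_builtin = student$p.value)
```

The t statistic is just the observed difference in means expressed in units of its standard error; the p-value tells us how often a difference at least that large would arise if the two population means were actually equal.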