Testing hypotheses about the mean

Last lecture I mentioned the so-called t-test, also known as Student's t-test. We used it to test for a difference in the mean number of symbionts after a treatment (water warmer than usual) was applied to a sample of corals.

The question we had was: does the increase in water temperature have an effect on the number of symbionts per coral cell?

For this, we said we would like to have six tanks with corals. Three are randomly selected and the water in those tanks is warmed to a certain temperature; the remaining three tanks serve as controls. At the end of the experiment we have a number of treated corals and a number of control corals. For each of these corals we apply a method to isolate the symbionts and calculate the number of symbionts per coral cell.

Remember that whenever we take a sample, we assume that the sample comes from a population and that the random variable we measure follows a given distribution. If we assume the random variable follows a normal distribution, we can simulate our experiment by randomly sampling from a normal distribution.

Let's say that our control corals have 26 symbionts per cell on average, with a standard deviation of 6. Our treatment corals, in contrast, have only 22 symbionts per cell on average, also with a standard deviation of 6. Equal standard deviations are rare in practice, but for the sake of the example we will keep them equal.

To simulate sampling from a normal distribution in R, here nine corals from the control distribution:



In [1]:

set.seed(123)
rnorm(3*3, 26, 6)

	22.6371461206867
	24.6189350631003
	35.3522498848947
	26.4230503485475
	26.7757264109657
	36.2903899212997
	28.7654972359352
	18.4096325923608
	21.8788828886388




We just did our experiment!

Now we can test whether these two means are different. But first, let's create a data.frame:



In [3]:

set.seed(123)
# Simulate nine control and nine treatment corals (different means, same standard deviation)
control <- rnorm(3 * 3, 26, 6)
treatment <- rnorm(3 * 3, 22, 6)

# Combine both groups into one data.frame, with a column of group labels
symbionts_per_coral_cell <- c(control, treatment)
treatment_column <- c(rep("C", 3 * 3), rep("T", 3 * 3))
coral_dataset <- data.frame(treatment_column, symbionts_per_coral_cell)

coral_dataset

treatment_column symbionts_per_coral_cell

	C       22.63715
	C       24.61894
	C       35.35225
	C       26.42305
	C       26.77573
	C       36.29039
	C       28.76550
	C       18.40963
	C       21.87888
	T       19.32603
	T       29.34449
	T       24.15888
	T       24.40463
	T       22.66410
	T       18.66495
	T       32.72148
	T       24.98710
	T       10.20030

Whenever you have data, you first want to visualize it and estimate the mean, the standard deviation, and so on:



In [4]:

print("mean by treatment")
tapply(coral_dataset$symbionts_per_coral_cell, coral_dataset$treatment_column, mean)
print("sd by treatment")
tapply(coral_dataset$symbionts_per_coral_cell, coral_dataset$treatment_column, sd)

[1] "mean by treatment"

	C
		26.7946122740477
	T
		22.9413287630564

[1] "sd by treatment"

	C
		5.95730783233277
	T
		6.5022037485199


In [5]:

    
boxplot(coral_dataset$symbionts_per_coral_cell~coral_dataset$treatment_column)

OK, so we kind of see that the treatments are different. The purpose of hypothesis testing is precisely to make this kind of statement more objective. It spares us from arguing about whether we "kind of" see a difference, because it gives us a standard for what we will consider to be different.
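To get a feeling for what such a standard looks like, here is a minimal sketch that simulates many experiments in which the treatment has no effect at all, so both groups come from the same normal distribution. The seed, the number of repetitions (2000) and the cutoff alpha = 0.05 are assumptions chosen for illustration; they are not part of the experiment described above.

```r
set.seed(1)            # arbitrary seed, only for reproducibility (assumption)
n_per_group   <- 9     # same group size as in our simulated experiment
n_experiments <- 2000  # number of repetitions (assumption)

# Each simulated experiment draws BOTH groups from the same distribution,
# so any difference between the two sample means is due to chance alone.
p_values <- replicate(n_experiments, {
  control   <- rnorm(n_per_group, mean = 26, sd = 6)
  treatment <- rnorm(n_per_group, mean = 26, sd = 6)
  t.test(control, treatment)$p.value
})

alpha <- 0.05          # significance level (assumption)

# Fraction of no-effect experiments that the test would still call "different"
false_alarm_rate <- mean(p_values < alpha)
false_alarm_rate
```

The fraction should come out close to alpha. That is the sense in which the test standardizes the decision: we fix in advance how often we are willing to call two groups different when they are not.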



In [6]:

t.test(coral_dataset$symbionts_per_coral_cell~coral_dataset$treatment_column)

	Welch Two Sample t-test

data:  coral_dataset$symbionts_per_coral_cell by coral_dataset$treatment_column
t = 1.3108, df = 15.879, p-value = 0.2086
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.382129 10.088696
sample estimates:
mean in group C mean in group T 
       26.79461        22.94133
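To see where the numbers in this output come from, the following sketch recomputes the Welch statistic by hand. It divides the difference between the two sample means by its estimated standard error, uses the Welch-Satterthwaite approximation for the degrees of freedom, and takes the two-sided p-value from the t distribution. The variable names and the cutoff alpha = 0.05 are assumptions for illustration, and the data are regenerated with the same seed and parameters as above.

```r
# Recreate the simulated data with the same seed and parameters as above
set.seed(123)
control   <- rnorm(3 * 3, 26, 6)
treatment <- rnorm(3 * 3, 22, 6)

# Squared standard error of each group mean: s^2 / n
se2_control   <- var(control) / length(control)
se2_treatment <- var(treatment) / length(treatment)

# Welch t statistic: difference in means divided by its standard error
t_manual <- (mean(control) - mean(treatment)) / sqrt(se2_control + se2_treatment)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (se2_control + se2_treatment)^2 /
  (se2_control^2 / (length(control) - 1) +
   se2_treatment^2 / (length(treatment) - 1))

# Two-sided p-value: probability of a |t| at least this large if the means are equal
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

alpha <- 0.05                     # significance level (assumption)
reject_null <- p_manual < alpha   # TRUE would mean "the means differ"

c(t = t_manual, df = df_manual, p = p_manual)
reject_null
```

If the seed and parameters match the cells above, these values should agree with the t.test output (t = 1.3108, df = 15.879, p-value = 0.2086). Since 0.2086 is larger than 0.05, we would not reject the hypothesis of equal means. Note that this is not evidence that the means are equal: we simulated the data ourselves with true means of 26 and 22, so with nine corals per group the test simply did not detect the difference.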
