```
In [1]:
```%matplotlib inline
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import pandas as pd

The `scipy.stats` module has three functions for carrying out t-tests:

- `ttest_1samp(a, popmean)`: carries out a one-sample t-test, comparing the mean of `a` to the given `popmean`.
- `ttest_ind(a, b)`: carries out a t-test for the means of two independent samples `a` and `b`.
- `ttest_rel(a, b)`: carries out a paired t-test for related samples `a` and `b`.
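As a quick orientation, here is a minimal sketch that calls all three functions on synthetic data. The arrays `a` and `b`, the random seed, and the null value of 10 are made up for illustration and are not part of the data sets used in this notebook.

```python
import numpy as np
import scipy.stats as stats

# Synthetic data for illustration only (not from the notebook's data sets)
rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = a + rng.normal(loc=0.5, scale=1.0, size=30)  # paired with a, element by element

one_sample = stats.ttest_1samp(a, 10.0)  # H0: the mean of a equals 10
independent = stats.ttest_ind(a, b)      # treats a and b as unrelated samples
paired = stats.ttest_rel(a, b)           # uses the element-wise pairing of a and b

for name, res in [("one-sample", one_sample),
                  ("independent", independent),
                  ("paired", paired)]:
    print("{}: t = {:0.3f}, p = {:0.4f}".format(name, res.statistic, res.pvalue))
```

Each call returns a result object with `statistic` and `pvalue` attributes, which is what the rest of this notebook relies on.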

We'll start with a one-sample t-test. To illustrate this we'll use a data set on brushtail possums that we used previously (see the previous notebook).

Previous studies of brushtail possums in Australia have established that the mean tail length of adult possums is 37.86 cm. I am studying an isolated population of possums in the state of Victoria, and I am interested in whether the mean tail length of Victorian possums is shorter than that of possums in the rest of Australia.

- $H_0$: The average tail length of Victorian possums is the same as that of possums in the rest of Australia ($\mu = 37.86$)
- $H_A$: The average tail length of Victorian possums is less than that of possums in the rest of Australia ($\mu < 37.86$)

Since I have an a priori reason to expect tail length to be shorter in the Victorian population, this is a "one-tailed" hypothesis test.
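The one-tailed p-value for a "less than" alternative can be sketched as follows. The tail lengths in `sample` are hypothetical values made up for illustration; only the null mean of 37.86 comes from the text above. The sketch computes the t-statistic by hand and takes the lower-tail area of a t distribution with $n - 1$ degrees of freedom.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical tail lengths (cm), made up for illustration
sample = np.array([36.0, 37.5, 35.5, 38.0, 36.5, 34.0, 36.0, 37.0])
mu0 = 37.86  # mean tail length under the null hypothesis

n = len(sample)
# t = (sample mean - mu0) / (sample standard deviation / sqrt(n))
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# One-tailed p-value for H_A: mu < mu0 is the lower-tail area of a
# t distribution with n - 1 degrees of freedom
p_less = stats.t.cdf(t_stat, df=n - 1)

# ttest_1samp reports a two-sided p-value by default; when t is negative,
# halving it gives the same lower-tail p-value
res = stats.ttest_1samp(sample, mu0)
print("t = {:0.3f}, one-tailed p = {:0.4f}".format(t_stat, p_less))
print("two-sided p / 2 = {:0.4f}".format(res.pvalue / 2))
```

Halving the two-sided p-value is only valid when the observed statistic falls on the side predicted by the alternative hypothesis. Recent SciPy releases also accept an `alternative='less'` argument to `ttest_1samp`, which avoids the manual step if your installed version supports it.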

```
In [4]:
```possums = pd.read_table("http://roybatty.org/possum.txt")
# rename the 'pop' column because DataFrame.pop is a pandas method name
possums.rename(columns={'pop':'popn'}, inplace=True)
# get the victoria possums
vic = possums[possums.popn == 'Vic']

```
In [12]:
```stats.ttest_1samp(vic.tailL, 37.86)

```
Out[12]:
```

`ttest_1samp` returns the test statistic and a p-value; by default the p-value is two-sided.

```
In [15]:
```vicT = stats.ttest_1samp(vic.tailL, 37.86)
print("The t-statistic for our test is: {:0.2f}".format(vicT.statistic))
print("The p-value for our test is: {:0.10f}".format(vicT.pvalue))

```
```

We'll use the book price example from your textbook (see section 5.2) to illustrate a paired t-test. The data are prices at the UCLA bookstore and on Amazon.com for 73 textbooks used in classes at UCLA.

Our null and alternative hypotheses are:

$H_0$: the mean price of textbooks at the UCLA bookstore and on Amazon.com is the same

$H_A$: the mean price of textbooks at the UCLA bookstore and on Amazon.com is different
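A paired t-test is equivalent to a one-sample t-test on the per-pair differences against a mean of zero. The sketch below checks this on a handful of hypothetical prices; `store_a` and `store_b` are made-up values, not rows from the textbook data.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical prices (USD) for the same five books at two sellers
store_a = np.array([27.67, 40.59, 31.68, 16.00, 18.95])
store_b = np.array([27.95, 31.14, 32.00, 11.52, 14.21])

# Per-book price difference; pairing means each book is compared with itself
diff = store_a - store_b

paired = stats.ttest_rel(store_a, store_b)
on_diffs = stats.ttest_1samp(diff, 0.0)  # H0: mean difference is zero

print("paired:         t = {:0.4f}, p = {:0.4f}".format(paired.statistic, paired.pvalue))
print("on differences: t = {:0.4f}, p = {:0.4f}".format(on_diffs.statistic, on_diffs.pvalue))
```

Both lines should print the same statistic and p-value, which is why the pairing must be preserved: the two arrays need to be in the same book order.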

```
In [16]:
```books = pd.read_table("https://github.com/Bio204-class/bio204-datasets/raw/master/textbooks.txt")

```
In [17]:
```books.columns

```
Out[17]:
```

```
In [20]:
```books.head()

```
Out[20]:
```

```
In [21]:
```books.shape

```
Out[21]:
```

```
In [22]:
```booksT = stats.ttest_rel(books.uclaNew, books.amazNew)

```
In [23]:
``````
booksT
```

```
Out[23]:
```

To illustrate t-tests for independent samples we'll use the smoking and birth weight example from section 5.3 of your textbook.

This data set includes 150 cases of mothers and their newborns in North Carolina. As per the textbook (Diez et al. 2015), the null and alternative hypotheses we want to test are:

$H_0$: There is no difference in average birth weight for newborns from mothers who did and did not smoke. In statistical notation: $\mu_n - \mu_s = 0$, where $\mu_n$ is the mean birth weight of newborns from non-smoking mothers and $\mu_s$ is the mean birth weight of newborns from mothers who smoked.

$H_A$: There is some difference in average newborn weights from mothers who did and did not smoke ($\mu_n - \mu_s \neq 0$).
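By default `ttest_ind` pools the two sample variances (`equal_var=True`). If you would rather not assume equal variances, passing `equal_var=False` gives Welch's t-test. The sketch below compares the two on synthetic data and reproduces Welch's statistic by hand; the group sizes, means, and standard deviations are arbitrary choices for illustration, not values from the births data.

```python
import numpy as np
import scipy.stats as stats

# Synthetic birth weights for illustration only
rng = np.random.default_rng(1)
group_n = rng.normal(loc=7.1, scale=1.5, size=100)  # stands in for non-smokers
group_s = rng.normal(loc=6.8, scale=1.4, size=50)   # stands in for smokers

pooled = stats.ttest_ind(group_n, group_s)                  # assumes equal variances
welch = stats.ttest_ind(group_n, group_s, equal_var=False)  # Welch's t-test

# Welch's statistic by hand: difference in means over the unpooled standard error
se = np.sqrt(group_n.var(ddof=1) / len(group_n) + group_s.var(ddof=1) / len(group_s))
t_by_hand = (group_n.mean() - group_s.mean()) / se

print("pooled: t = {:0.3f}, p = {:0.4f}".format(pooled.statistic, pooled.pvalue))
print("Welch:  t = {:0.3f}, p = {:0.4f}".format(welch.statistic, welch.pvalue))
print("Welch t by hand = {:0.3f}".format(t_by_hand))
```

When the two groups have similar variances the two versions usually agree closely; they can diverge when both the variances and the sample sizes differ.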

```
In [24]:
```births = pd.read_table("https://github.com/Bio204-class/bio204-datasets/raw/master/births.txt")

```
In [25]:
```births.head()

```
Out[25]:
```

```
In [26]:
```births.shape

```
Out[26]:
```

We'll use the `groupby` and `describe` methods to generate some useful summary statistics on baby weight, grouped by whether the mother smoked or not.
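To see what this combination produces, here is a minimal sketch on a toy data frame. The `toy` frame and its values are made up; only the column names `smoke` and `weight` mirror the births data. It assumes a reasonably recent pandas, where `describe` on a grouped column returns one row per group.

```python
import pandas as pd

# Toy data for illustration; column names mirror the births data set
toy = pd.DataFrame({
    "smoke": ["nonsmoker", "smoker", "nonsmoker", "smoker", "nonsmoker"],
    "weight": [7.5, 6.8, 8.1, 6.2, 7.0],
})

# Split rows by smoking status, then summarize the weight column per group
summary = toy.groupby("smoke").weight.describe()
print(summary)

# Individual statistics can be looked up by group label and column name
print(summary.loc["smoker", "mean"])
```

The group sizes and standard deviations in this table are worth checking before running an independent-samples t-test, since they feed directly into the standard error.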

```
In [35]:
```births.groupby('smoke').weight.describe()

```
Out[35]:
```

```
In [27]:
```# subset data based on smoke column
nonsmokers = births[births.smoke == 'nonsmoker']
smokers = births[births.smoke == "smoker"]

```
In [28]:
```birthT = stats.ttest_ind(nonsmokers.weight, smokers.weight)

```
In [29]:
``````
birthT
```

```
Out[29]:
```
