The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F.

In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**.

- Is the distribution of body temperatures normal?
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.

- Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?

- At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.

- Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown:

- In the control panel at the top, choose Cell > Cell Type > Markdown
- Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

- Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

```python
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as st
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)

# Optional notebook styling; requires the two CSS files next to the notebook
from IPython.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
```

```python
bodytemp_df = pd.read_csv('data/human_body_temperature.csv')
bodytemp_df
```

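```python
# Histogram with a kernel density estimate
# (sns.distplot is deprecated in recent seaborn; histplot replaces it)
sns.histplot(bodytemp_df['temperature'], bins=25, kde=True, stat='density')
```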

```python
# D'Agostino-Pearson test; null hypothesis: the sample comes from a normal distribution
st.normaltest(bodytemp_df['temperature'])
```

We see that our sample distribution does look like a normal distribution, albeit slightly left-skewed. The normaltest p-value is fairly high (about 0.25), so we cannot reject the null hypothesis that the sample comes from a normal distribution. Both the visual inspection and the formal test are therefore consistent with normality. A large p-value does not prove that the population is normal, but it gives no evidence against it, and with a sample of this size the CLT makes the sampling distribution of the mean approximately normal in any case. It is therefore reasonable to proceed with the tests below.
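A slightly more thorough normality check could pair `scipy.stats.normaltest` (the D'Agostino-Pearson test used above) with `scipy.stats.shapiro` (Shapiro-Wilk). The sketch below defines a hypothetical helper, `normality_pvalues`, that is not part of the original notebook; in the notebook it would be called as `normality_pvalues(bodytemp_df['temperature'])`. The usage example runs on simulated uniform data, not on the body-temperature dataset.

```python
import numpy as np
import scipy.stats as st


def normality_pvalues(sample):
    """Return p-values from two normality tests on a 1-D sample.

    Hypothetical helper: the null hypothesis of both tests is that the
    sample was drawn from a normal distribution, so small p-values are
    evidence against normality.
    """
    values = np.asarray(sample, dtype=float)
    # D'Agostino-Pearson omnibus test based on skewness and kurtosis
    _, p_dagostino = st.normaltest(values)
    # Shapiro-Wilk test
    _, p_shapiro = st.shapiro(values)
    return {"normaltest": float(p_dagostino), "shapiro": float(p_shapiro)}


# Usage on simulated data (NOT the body-temperature dataset):
rng = np.random.default_rng(0)
uniform_sample = rng.uniform(96.0, 101.0, size=2000)
print(normality_pvalues(uniform_sample))  # both p-values should be very small
```

Running both tests is a cheap way to see whether the conclusion depends on which test is chosen.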

Now we test the hypothesis that the true population mean is 98.6 degrees F. For this we first need the sample mean and the sample standard deviation. Note that pandas' `Series.std` (like `DataFrame.std`) normalizes by N-1 by default, so it returns the sample standard deviation.
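The N-1 versus N distinction is easy to trip over because NumPy's `np.std` defaults to `ddof=0` (dividing by N), while pandas' `Series.std` defaults to `ddof=1` (dividing by N-1). A tiny example with made-up values:

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0, 4.0]  # illustrative numbers, not body temperatures

pandas_std = pd.Series(values).std()       # ddof=1: divides by N - 1
numpy_std = np.std(values)                 # ddof=0: divides by N
numpy_std_sample = np.std(values, ddof=1)  # matches pandas

print(pandas_std, numpy_std, numpy_std_sample)
# pandas and the ddof=1 NumPy call agree (about 1.291); the ddof=0 value is about 1.118
```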

```python
hyp_mean = 98.6
sample_meantemp = bodytemp_df['temperature'].mean()
sample_std = bodytemp_df['temperature'].std()  # ddof=1 by default
print('The sample mean is:', sample_meantemp, 'degrees Fahrenheit')
print('The sample standard deviation is:', sample_std, 'degrees Fahrenheit')
```

```python
# Standard error of the mean: s / sqrt(n)
sem_temp = sample_std / np.sqrt(len(bodytemp_df))
sem_temp
```
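As a cross-check, `scipy.stats.sem` computes the same standard error of the mean, s/√n, and like pandas it uses `ddof=1` by default. The snippet below uses simulated values in place of the notebook's data; in the notebook, `st.sem(bodytemp_df['temperature'])` should reproduce `sem_temp`.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Simulated sample standing in for bodytemp_df['temperature']
rng = np.random.default_rng(42)
temps = pd.Series(rng.normal(loc=98.0, scale=0.7, size=130))

# Manual standard error: sample std (ddof=1) divided by sqrt(n)
manual_sem = temps.std() / np.sqrt(len(temps))
# SciPy's helper, ddof=1 by default
scipy_sem = st.sem(temps)

print(manual_sem, scipy_sem)
```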

```python
# Sanity check: the dataset has 130 rows, so this should equal sem_temp
sample_std / np.sqrt(130)
```

```python
# Test statistic for H0: population mean == 98.6
z_score = (sample_meantemp - hyp_mean) / sem_temp
z_score
```
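For reference, the one-sample statistic computed in the cell above and the two-sided p-value that goes with it are shown below, where $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized mean (98.6 or 98.2), $s$ the sample standard deviation, $n$ the sample size and $\Phi$ the standard normal CDF:

$$
z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
p_{\text{two-sided}} = 2\left[1 - \Phi(|z|)\right]
$$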

```python
# Two-sided p-value: probability of a |z| at least this large under H0
p_value = 2 * st.norm.sf(abs(z_score))
p_value
```

```python
# Repeat the test against the revised value of 98.2 degrees F
new_hyp = 98.2
z_score_new = (sample_meantemp - new_hyp) / sem_temp
print(z_score_new)
p_value_new = 2 * st.norm.sf(abs(z_score_new))  # two-sided
p_value_new
```
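On the z-test versus t-test question: because the population standard deviation is unknown and has to be estimated from the sample, the one-sample t-test is the textbook choice. With n = 130 the t distribution has 129 degrees of freedom and is very close to the standard normal. Both tests therefore use the same statistic and reach nearly the same conclusion, with the t-test's p-value slightly larger because the t distribution has heavier tails. The hypothetical helper below computes both with `scipy.stats.ttest_1samp`; the data are simulated.

```python
import numpy as np
import scipy.stats as st


def one_sample_tests(sample, hyp_mean):
    """Compare a one-sample z-test and t-test against hyp_mean (two-sided).

    Hypothetical helper: the z-test plugs the sample std into the normal
    approximation, the t-test uses the t distribution with n - 1 df.
    """
    values = np.asarray(sample, dtype=float)
    n = values.size
    sem = values.std(ddof=1) / np.sqrt(n)
    z = (values.mean() - hyp_mean) / sem
    p_z = 2 * st.norm.sf(abs(z))
    t_result = st.ttest_1samp(values, popmean=hyp_mean)
    return {"z": z, "p_z": p_z, "t": t_result.statistic, "p_t": t_result.pvalue}


rng = np.random.default_rng(7)
simulated = rng.normal(loc=98.0, scale=0.7, size=130)  # simulated, not real data
for mu0 in (98.6, 98.2):
    print(mu0, one_sample_tests(simulated, mu0))
```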

```python
z_critical = st.norm.ppf(0.975)  # two-sided 95% critical value
margin_of_error = z_critical * sem_temp
print('margin of error:', margin_of_error)
print('upper limit of the 95% CI for the mean:', sample_meantemp + margin_of_error)
print('lower limit of the 95% CI for the mean:', sample_meantemp - margin_of_error)
```

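Strictly speaking, the range above is a 95% confidence interval for the *population mean* temperature: it tells us where the true mean likely lies, not where 95% of individual temperatures fall. Because its width shrinks with $\sqrt{n}$, it is far too narrow to serve as a "normal range" for a single person. To call an individual's temperature abnormal, we should instead use a range that covers about 95% of individuals, which (assuming normality) is roughly the sample mean plus or minus 1.96 sample standard deviations, using `sample_std` rather than `sem_temp`.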
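A confidence interval for the mean and a range for individual temperatures are different quantities. The sketch below, which assumes individual temperatures are roughly normal, computes three 95% intervals from one sample: a t-based CI for the mean, a normal reference range (mean ± 1.96 s) meant to cover about 95% of individuals, and a prediction interval for one new observation, mean ± t·s·√(1 + 1/n). The function name `intervals_95` and the simulated data are hypothetical.

```python
import numpy as np
import scipy.stats as st


def intervals_95(sample):
    """Return a 95% CI for the mean and two 95% ranges for individuals."""
    values = np.asarray(sample, dtype=float)
    n = values.size
    mean = values.mean()
    s = values.std(ddof=1)

    # Confidence interval for the population mean (t-based)
    t_crit = st.t.ppf(0.975, df=n - 1)
    ci_half = t_crit * s / np.sqrt(n)

    # Normal reference range: covers ~95% of individuals if data are normal
    z_crit = st.norm.ppf(0.975)
    ref_half = z_crit * s

    # Prediction interval for a single new observation
    pred_half = t_crit * s * np.sqrt(1 + 1 / n)

    return {
        "mean_ci": (mean - ci_half, mean + ci_half),
        "reference_range": (mean - ref_half, mean + ref_half),
        "prediction_interval": (mean - pred_half, mean + pred_half),
    }


rng = np.random.default_rng(3)
simulated = rng.normal(loc=98.0, scale=0.7, size=130)  # simulated, not real data
for name, (low, high) in intervals_95(simulated).items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

With n = 130 the CI for the mean is roughly an order of magnitude narrower than the two individual-level ranges, which are nearly identical to each other.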

Now we move on to testing whether there is a significant difference between male and female body temperatures.

```python
female_df = bodytemp_df[bodytemp_df.gender == 'F'].copy()
male_df = bodytemp_df[bodytemp_df.gender == 'M'].copy()
```
```python
male_mean = male_df['temperature'].mean()
print('Male mean is: ', male_mean)
female_mean = female_df['temperature'].mean()
print('Female mean is: ', female_mean)
male_std = male_df['temperature'].std()
print('Male standard deviation is: ', male_std)
female_std = female_df['temperature'].std()
print('Female standard deviation is: ', female_std)
```
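The per-gender summary can also be produced in one step with `DataFrame.groupby`, without creating separate copies of the data. The sketch assumes the `gender` and `temperature` column names used above; the tiny DataFrame is made up for illustration. In the notebook the equivalent call would be `bodytemp_df.groupby('gender')['temperature'].agg(['mean', 'std', 'count'])`.

```python
import pandas as pd

# Made-up rows using the notebook's column names
toy_df = pd.DataFrame({
    "temperature": [98.0, 98.4, 97.8, 98.9, 98.6, 98.3],
    "gender": ["M", "F", "M", "F", "M", "F"],
})

# Mean, sample std (ddof=1) and count for each gender
summary = toy_df.groupby("gender")["temperature"].agg(["mean", "std", "count"])
print(summary)
```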

```python
difference_mean = female_mean - male_mean
print('Mean difference between two populations:', difference_mean)
# Unpooled standard error of the difference between two independent means
difference_sem = np.sqrt(male_std**2 / len(male_df) + female_std**2 / len(female_df))
print('Standard error of the difference:', difference_sem)
```
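The standard error computed here is the usual unpooled standard error for a difference of two independent means. Writing F and M for the female and male groups, the standard error and the test statistic for the null hypothesis of no difference are:

$$
\mathrm{SE}(\bar{x}_F - \bar{x}_M) = \sqrt{\frac{s_F^2}{n_F} + \frac{s_M^2}{n_M}}, \qquad
z = \frac{(\bar{x}_F - \bar{x}_M) - 0}{\mathrm{SE}(\bar{x}_F - \bar{x}_M)}
$$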

```python
z_score_diff = (difference_mean - 0) / difference_sem
z_score_diff
```

```python
# Two-sided p-value for H0: no difference between female and male means
p_value_diff = 2 * st.norm.sf(abs(z_score_diff))
p_value_diff
```
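As a cross-check on the hand-computed statistic, `scipy.stats.ttest_ind` with `equal_var=False` runs Welch's two-sample t-test. It uses the same unpooled standard error and returns a two-sided p-value by default. The simulated groups below are placeholders; in the notebook the call would be `st.ttest_ind(female_df['temperature'], male_df['temperature'], equal_var=False)`.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(11)
# Simulated groups standing in for the female_df / male_df temperatures
female = rng.normal(loc=98.0, scale=0.7, size=65)
male = rng.normal(loc=98.0, scale=0.7, size=65)

# Hand-computed statistic, mirroring the notebook's approach
diff = female.mean() - male.mean()
se = np.sqrt(female.var(ddof=1) / female.size + male.var(ddof=1) / male.size)
z = diff / se
p_z = 2 * st.norm.sf(abs(z))

# Welch's t-test: same statistic, t distribution with Welch-Satterthwaite df
welch = st.ttest_ind(female, male, equal_var=False)
print(z, p_z)
print(welch.statistic, welch.pvalue)
```

The statistics should match exactly; the Welch p-value is slightly larger because it uses the t distribution rather than the normal.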
