What is the True Normal Human Body Temperature?

Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But is this value statistically correct?

Exercises

In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions in this notebook and submit it to your GitHub account.

  1. Is the distribution of body temperatures normal?
    • Although this is not a requirement for the CLT to hold (read the CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
  2. Is the sample size large? Are the observations independent?
    • Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
  3. Is the true population mean really 98.6 degrees F?
    • Would you use a one-sample or two-sample test? Why?
    • In this situation, is it appropriate to use the $t$ or $z$ statistic?
    • Now try using the other test. How is the result different? Why?
  4. At what temperature should we consider someone's temperature to be "abnormal"?
    • Start by computing the margin of error and confidence interval.
  5. Is there a significant difference between males and females in normal temperature?
    • What test did you use and why?
    • Write a story with your conclusion in the context of the original problem.

You can include written notes in notebook cells using Markdown.

In [69]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')
df.head()


Out[69]:
   temperature  gender  heart_rate
0         99.3       F        68.0
1         98.4       F        81.0
2         97.8       M        73.0
3         99.2       F        66.0
4         98.0       F        73.0

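(1) The histogram and the normal probability plot show that the distribution of body temperatures approximately follows a normal distribution. In a normal probability plot, approximately normal data fall close to a straight reference line.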


In [70]:
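import math
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Histogram of the observed temperatures
plt.hist(df.temperature)
plt.xlabel('Body temperature (degrees F)')
plt.ylabel('Count')
plt.show()

# Normal probability (Q-Q) plot: approximately normal data fall close to the reference line
stats.probplot(df.temperature, dist="norm", plot=plt)
plt.show()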
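The visual check above can be complemented by formal normality tests. The sketch below is a minimal illustration and not the author's method. It runs SciPy's D'Agostino-Pearson test (`stats.normaltest`) and Shapiro-Wilk test (`stats.shapiro`). The helper name `normality_report` is hypothetical. So that the sketch runs without the CSV, its input is a synthetic normal sample (mean 98.25, SD 0.73, n = 130, chosen only for illustration). In the notebook you would pass `df.temperature` instead. A p value above 0.05 only means the test found no evidence against normality; it does not prove that the data are normal.

```python
import numpy as np
import scipy.stats as stats


def normality_report(values):
    """Return the p values of two common normality tests (hypothetical helper).

    A large p value means the test found no evidence against normality;
    it does not prove the data are normal.
    """
    values = np.asarray(values, dtype=float)
    _, p_dagostino = stats.normaltest(values)  # D'Agostino-Pearson (skewness + kurtosis)
    _, p_shapiro = stats.shapiro(values)       # Shapiro-Wilk
    return {"dagostino_p": float(p_dagostino), "shapiro_p": float(p_shapiro)}


# Hypothetical synthetic sample so the sketch runs without the CSV;
# in the notebook, pass df.temperature instead.
rng = np.random.default_rng(0)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)
print(normality_report(synthetic_temps))
```

Running two tests is only a cross-check. They are sensitive to different departures from normality (tail weight and skewness versus overall shape), so a clearly skewed sample should fail both.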


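(2) The sample size is 130. That is large enough (greater than 30, a common rule of thumb) for the CLT to make the sampling distribution of the mean approximately normal. In addition, 130 people is far less than 10% of the human population. Provided the individuals were sampled randomly, we can therefore treat the observations as independent.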


In [71]:
sample_size = df.temperature.count()
print('sample size is ' + str(sample_size))


sample size is 130
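The CLT condition in (2) concerns the sampling distribution of the mean, not the raw measurements. A small simulation makes this concrete. The sketch draws many samples of size 130 from a deliberately skewed exponential population. This population is a hypothetical stand-in, not the temperature data. The sketch then checks two things: the sample means are close to symmetric, and their spread is close to $\sigma/\sqrt{n}$. The function name `simulate_sample_means` and the repetition count are illustrative choices.

```python
import numpy as np
import scipy.stats as stats


def simulate_sample_means(n=130, n_reps=5000, seed=0):
    """Return the means of n_reps samples of size n from an exponential population.

    The exponential distribution with scale 1 has mean 1, SD 1 and skewness 2,
    so it is an unfavourable case for the normal approximation.
    """
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=1.0, size=(n_reps, n))
    return samples.mean(axis=1)


means = simulate_sample_means()
print("skewness of individual values: 2 (theoretical, exponential)")
print("skewness of sample means:", round(float(stats.skew(means)), 3))
print("SD of sample means:", round(float(means.std(ddof=1)), 4),
      "vs sigma/sqrt(n) =", round(float(1 / np.sqrt(130)), 4))
```

For this population, theory predicts that the skewness of the mean shrinks to $2/\sqrt{n} \approx 0.18$ at $n = 130$. This is why a moderately large sample lets us use normal-based tests even when the individual values are not normal.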

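(3) We use a one-sample test, because a single sample mean is compared with a fixed reference value (98.6 °F) rather than with the mean of a second sample. The population standard deviation is unknown, so strictly the $t$ statistic is the appropriate choice. With a sample size much larger than 30, however, the $z$ statistic (using the sample standard deviation in place of the population value) is a close approximation. We start with the one-sample z test: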

$H_0: T = 98.6$
$H_A: T \neq 98.6$
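For reference, the z statistic computed in the cell below measures how many standard errors the sample mean $\bar{x}$ lies from 98.6. Here $s$ is the standard deviation of the $n = 130$ temperatures and $\Phi$ is the standard normal CDF:

$$z = \frac{\lvert \bar{x} - 98.6 \rvert}{s/\sqrt{n}}, \qquad p = 2\bigl(1 - \Phi(z)\bigr)$$

The t test uses the same statistic but takes its p value from a $t$ distribution with $n - 1$ degrees of freedom instead of $\Phi$.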

The p value is 4.35e-08, much smaller than 0.05, so we reject $H_0$ at the 5% significance level. The data provide strong evidence that the true mean body temperature is not 98.6.
Using a t test instead gives a p value of 2.19e-07. This is larger than the z-test p value because the t distribution has heavier tails than the normal distribution. It is still far below 0.05, however, so the conclusion is unchanged.


In [72]:
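mean = np.mean(df.temperature)
# Standard error of the mean. np.std defaults to ddof=0 (population formula);
# with n = 130, ddof=1 would make se larger by only about 0.4%.
se = np.std(df.temperature)/math.sqrt(sample_size)

# Test statistic; abs() keeps the two-sided p value correct on either side of 98.6
z = abs(mean - 98.6)/se
p_z = (1 - stats.norm.cdf(z))*2
print('p value for z test is ' + str(p_z))

# Same statistic referred to a t distribution with n - 1 degrees of freedom
dgf = sample_size - 1
p_t = 2*(1 - stats.t.cdf(z, dgf))
print('p value for t test is ' + str(p_t))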


p value for z test is 4.35231517493e-08
p value for t test is 2.18874646407e-07
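As a cross-check, SciPy's `stats.ttest_1samp` runs the one-sample t test in a single call. It uses the sample standard deviation with ddof=1, whereas the cell above uses `np.std` with its default ddof=0. Its p value on the real data would therefore differ slightly from the one printed above. The sketch below computes both the z and the t p values with ddof=1 and compares them with SciPy's result. The helper name `one_sample_tests` is hypothetical, and the input is synthetic data (chosen only for illustration) so that the sketch runs on its own. In the notebook, pass `df.temperature`.

```python
import math

import numpy as np
import scipy.stats as stats


def one_sample_tests(values, mu0=98.6):
    """Two-sided one-sample z and t tests of H0: mean == mu0 (hypothetical helper).

    Both tests use the sample SD (ddof=1), as stats.ttest_1samp does.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    se = values.std(ddof=1) / math.sqrt(n)
    stat = (values.mean() - mu0) / se
    p_z = 2 * stats.norm.sf(abs(stat))         # standard normal reference distribution
    p_t = 2 * stats.t.sf(abs(stat), df=n - 1)  # t reference distribution, n - 1 df
    return stat, p_z, p_t


rng = np.random.default_rng(42)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)  # hypothetical data

stat, p_z, p_t = one_sample_tests(synthetic_temps)
scipy_result = stats.ttest_1samp(synthetic_temps, popmean=98.6)
print(f"statistic = {stat:.4f}, z-test p = {p_z:.3g}, t-test p = {p_t:.3g}")
print(f"scipy ttest_1samp: statistic = {scipy_result.statistic:.4f}, p = {scipy_result.pvalue:.3g}")
```

Using `sf` (the survival function) instead of `1 - cdf` avoids losing precision when the p value is very small.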

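(4) The 95% confidence interval for the mean body temperature is [98.12, 98.37]. That is, the sample mean of 98.25 plus or minus a margin of error of about 0.13 (1.96 times the standard error). This interval describes where the population mean is likely to lie, not where individual temperatures fall, so a single reading outside it is not by itself abnormal. To decide whether an individual's temperature is "abnormal", compare it with an interval based on the spread of individual values, such as a 95% prediction interval (roughly the mean plus or minus 1.96 sample standard deviations).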


In [73]:
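# 95% confidence interval for the mean, using the normal critical value 1.96
margin_of_error = 1.96*se
ub = mean + margin_of_error
lb = mean - margin_of_error
print('Mean: ' + str(mean))
print('95 % Confidence Interval: [' + str(lb) + ', ' + str(ub) + ']')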


Mean: 98.24923076923078
95 % Confidence Interval: [98.12367980442819, 98.37478173403336]
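The confidence interval above expresses uncertainty about the population mean, and it narrows as $n$ grows. A cutoff for an individual reading instead needs an interval that reflects person-to-person variation. The sketch below contrasts the two intervals:

- a t-based 95% confidence interval for the mean;
- a 95% prediction interval for one new observation, $\bar{x} \pm t_{0.975,\,n-1}\, s\sqrt{1 + 1/n}$.

The prediction interval assumes the individual temperatures are approximately normal. The function name is hypothetical, and the data are synthetic (illustrative parameters only). In the notebook, pass `df.temperature`.

```python
import math

import numpy as np
import scipy.stats as stats


def mean_ci_and_prediction_interval(values, level=0.95):
    """Return (ci, pi) as (low, high) tuples (hypothetical helper).

    ci: t-based confidence interval for the population mean.
    pi: prediction interval for one new observation, assuming the
        individual values are approximately normally distributed.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    s = values.std(ddof=1)                           # sample standard deviation
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)  # two-sided critical value

    ci_half = t_crit * s / math.sqrt(n)          # margin of error for the mean
    pi_half = t_crit * s * math.sqrt(1 + 1 / n)  # margin for a single new reading
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


rng = np.random.default_rng(7)
synthetic_temps = rng.normal(loc=98.25, scale=0.73, size=130)  # hypothetical data

ci, pi = mean_ci_and_prediction_interval(synthetic_temps)
print("95%% CI for the mean:     [%.2f, %.2f]" % ci)
print("95%% prediction interval: [%.2f, %.2f]" % pi)
```

The prediction interval is wider than the confidence interval by a factor of $\sqrt{n + 1}$ (about 11.4 for $n = 130$). This is why the confidence interval alone is far too narrow to serve as a normal range for individuals.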

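(5) We use a two-sample z test. We are comparing the means of two independent groups (males and females) rather than one sample against a fixed value, and each group is reasonably large, so the normal approximation applies: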

$H_0: T_M = T_F$
$H_A: T_M \neq T_F$
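For reference, the statistic computed in the cell below divides the difference in group means by the standard error of that difference. Here $\bar{x}_M, \bar{x}_F$ are the group means, $s_M^2, s_F^2$ the group variances (computed with `np.var`, i.e. ddof=0), $n_M, n_F$ the group sizes and $\Phi$ the standard normal CDF:

$$z = \frac{\lvert \bar{x}_M - \bar{x}_F \rvert}{\sqrt{s_M^2/n_M + s_F^2/n_F}}, \qquad p = 2\bigl(1 - \Phi(z)\bigr)$$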

The p value is 0.02, which is smaller than 0.05, so we reject $H_0$ at the 5% significance level. There is a statistically significant difference in normal body temperature between males and females.


In [74]:
male_temp = df[df.gender == 'M'].temperature
female_temp = df[df.gender == 'F'].temperature

# Absolute difference between the two group means
mean_diff = abs(np.mean(male_temp) - np.mean(female_temp))
# Standard error of the difference for two independent samples (np.var uses ddof=0 by default)
se = math.sqrt(np.var(male_temp)/male_temp.count() + np.var(female_temp)/female_temp.count())
z = mean_diff/se
# Two-sided p value from the standard normal distribution
p_z = (1 - stats.norm.cdf(z))*2
print('mean for male is ' + str(np.mean(male_temp)))
print('mean for female is ' + str(np.mean(female_temp)))
print('p value for z test is ' + str(p_z))


mean for male is 98.1046153846154
mean for female is 98.39384615384613
p value for z test is 0.0212664518301
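Two alternatives can cross-check the two-sample z test.

- Welch's t test (`stats.ttest_ind` with `equal_var=False`) does not assume equal variances and uses the t distribution.
- A permutation test, a different technique from the one used above, shuffles the gender labels to build the null distribution of the absolute difference in means. It therefore does not rely on normality.

The sketch below is an illustration under these assumptions. The helper name and the synthetic groups are hypothetical: the means match the printed values, but the SDs and group sizes are chosen only for illustration. In the notebook, pass `male_temp` and `female_temp`.

```python
import numpy as np
import scipy.stats as stats


def welch_and_permutation(group_a, group_b, n_perm=10_000, seed=0):
    """Compare two group means with Welch's t test and a permutation test.

    Returns (welch_statistic, welch_p, permutation_p). Hypothetical helper.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    welch = stats.ttest_ind(a, b, equal_var=False)  # unequal-variance t test

    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # random relabelling of the groups
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p value strictly positive
    p_perm = (count + 1) / (n_perm + 1)
    return welch.statistic, welch.pvalue, p_perm


rng = np.random.default_rng(3)
# Hypothetical synthetic groups; in the notebook, pass male_temp and female_temp
synthetic_male = rng.normal(loc=98.10, scale=0.70, size=65)
synthetic_female = rng.normal(loc=98.39, scale=0.74, size=65)

t_stat, p_welch, p_perm = welch_and_permutation(synthetic_male, synthetic_female)
print(f"Welch t = {t_stat:.3f}, p = {p_welch:.4f}; permutation p = {p_perm:.4f}")
```

If the z, Welch and permutation p values broadly agree on the real data, the conclusion does not hinge on the choice of test.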

In summary, human body temperature approximately follows a normal distribution. The mean temperature measured in 1992 (98.25) is significantly different from the value of 98.6 reported in 1868, so the data do not support 98.6 as the true mean normal body temperature. In addition, there is a significant difference in body temperature between males (mean = 98.10) and females (mean = 98.39).