What is the true normal human body temperature?

Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F.

Exercise

In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions in this notebook and submit it to your GitHub account.

Is the distribution of body temperatures normal?
- Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
Is the true population mean really 98.6 degrees F?
- Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
At what temperature should we consider someone's temperature to be "abnormal"?
- Start by computing the margin of error and confidence interval.
Is there a significant difference between males and females in normal temperature?
- Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown:

In the control panel at the top, choose Cell > Cell Type > Markdown
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

Resources

Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm



In [1]:

    
import pandas as pd



In [2]:

    
%matplotlib inline



In [3]:

    
df = pd.read_csv('data/human_body_temperature.csv')

Is the distribution of body temperatures normal?

The histogram is slightly skewed, but the distribution is close to normal. Because the sample size (n = 130) is larger than 30, we can apply the CLT for hypothesis testing.
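As a rough numeric complement to the histogram, the empirical rule can be checked: for a normal distribution, about 68% of values fall within one standard deviation of the mean and about 95% within two. The sketch below uses only the standard library; `fraction_within` is a hypothetical helper, and the sample is simulated as a stand-in for `df['temperature']`, using roughly the mean and standard deviation reported by `df.describe()`.

```python
import random
import statistics

def fraction_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

# Simulated stand-in for df['temperature'] (assumption: not the real data)
rng = random.Random(0)
sample = [rng.gauss(98.25, 0.73) for _ in range(130)]

within_1sd = fraction_within(sample, 1)
within_2sd = fraction_within(sample, 2)
print(f"within 1 sd: {within_1sd:.2f} (normal: ~0.68)")
print(f"within 2 sd: {within_2sd:.2f} (normal: ~0.95)")
```

With the real data, `df['temperature'].tolist()` could be passed in place of `sample`. Fractions far from 0.68 and 0.95 would suggest a departure from normality.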



In [4]:

    
df.hist()









    Out[4]:





array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f6d511668d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f6d50ec0748>]], dtype=object)



In [5]:

    
df.describe()









    Out[5]:






       temperature  heart_rate
count   130.000000  130.000000
mean     98.249231   73.761538
std       0.733183    7.062077
min      96.300000   57.000000
25%      97.800000   69.000000
50%      98.300000   74.000000
75%      98.700000   79.000000
max     100.800000   89.000000

Is the true population mean really 98.6 degrees F?

Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?

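z-test vs. t-test:

A t-test is needed when the sample is small (n < 30) and the population standard deviation is unknown. Here the sample size is 130, so a z-test is a reasonable choice. Strictly speaking, the standard deviation is estimated from the sample, so the t-test is the exact one, but the two results will be almost the same because the data are not extremely skewed and the sample is large.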

Sample mean = 98.249231, sample standard deviation = 0.733183, n = 130

One-sample hypothesis test:

Step 1:

- Null Hypothesis : Mean = 98.6
- Alternative Hypothesis : Mean != 98.6

Step 2:

- Point estimate: sample mean = 98.249231 (the hypothesized mean is 98.6)
- Calculate Standard Error (SE)

Step 3:

- Check conditions
    - Independence ==> True
    - If the sample is skewed, sample size > 30 ==> True

Step 4:

- Calculate the z-score and p-value

Step 5:

- Based on p-value check if Null can be rejected.
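
The steps above can be written out with the sample statistics from `df.describe()`. The values are rounded, and $\mu_0$ (the hypothesized mean) and $s$ (the sample standard deviation) are notation introduced here for illustration:

$$
SE = \frac{s}{\sqrt{n}} = \frac{0.733183}{\sqrt{130}} \approx 0.0643,
\qquad
z = \frac{\bar{x} - \mu_0}{SE} = \frac{98.249231 - 98.6}{0.0643} \approx -5.45
$$

A z-score this far from zero corresponds to a very small two-sided p-value.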



In [16]:

    
import scipy.special



In [19]:

    
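n = df['temperature'].count()
sigma = df['temperature'].std()      # sample standard deviation (ddof=1)
x_bar = df['temperature'].mean()
standard_error = sigma / n**0.5
z_score = (x_bar - 98.6) / standard_error
# Two-sided p-value; -abs() keeps the formula valid for either sign of z
p_values = 2 * scipy.special.ndtr(-abs(z_score))
p_values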









    Out[19]:





4.9021570141133797e-08

Since the p-value is far below 5%, we can reject the null hypothesis that the true population mean is 98.6 degrees Fahrenheit.

Repeating the test with a t-test:



In [22]:

    
import scipy.stats as stats



In [23]:

    
stats.ttest_1samp(df.temperature,98.6)









    Out[23]:





(-5.4548232923645195, 2.4106320415561276e-07)

The t-test p-value differs slightly from the z-test p-value, but the evidence is still strong enough to reject the null hypothesis.
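
To see how the two tests differ, both two-sided p-values can be recomputed from the summary statistics alone. This is a sketch, assuming `scipy` is installed; the variable names are illustrative, and the inputs are the rounded values from `df.describe()`, so the results match the cell outputs only approximately.

```python
import math
from scipy import stats

# Rounded summary statistics from df.describe()
n = 130
x_bar = 98.249231
s = 0.733183
mu_0 = 98.6  # hypothesized population mean

standard_error = s / math.sqrt(n)
statistic = (x_bar - mu_0) / standard_error  # same value for the z- and t-test

# Two-sided p-values: standard normal vs. Student's t with n - 1 degrees of freedom
p_z = 2 * stats.norm.sf(abs(statistic))
p_t = 2 * stats.t.sf(abs(statistic), n - 1)

print(f"statistic = {statistic:.4f}")
print(f"z-test p-value = {p_z:.3e}")
print(f"t-test p-value = {p_t:.3e}")
```

The t distribution has heavier tails than the normal, so its p-value is somewhat larger, but with 129 degrees of freedom both are far below 0.05.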

==================================================================================================================

At what temperature should we consider someone's temperature to be "abnormal"?

A 95% confidence interval is adequate for this assessment. Margin of error (ME) = critical value × standard error

Critical value for a 95% confidence interval = 1.96

Confidence interval = (Mean - Margin of Error, Mean + Margin of Error)
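
The critical value 1.96 is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail. A small sketch, using the standard library's `statistics.NormalDist` and the rounded summary statistics from `df.describe()` (variable names are illustrative):

```python
import math
from statistics import NormalDist

confidence = 0.95
# Two-sided interval: put (1 - confidence) / 2 in each tail
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Rounded summary statistics from df.describe()
n, x_bar, s = 130, 98.249231, 0.733183
margin = z_crit * s / math.sqrt(n)

ci_low, ci_high = x_bar - margin, x_bar + margin
print(f"critical value = {z_crit:.4f}")
print(f"{confidence:.0%} CI for the mean = ({ci_low:.4f}, {ci_high:.4f})")
```

Changing `confidence` to another level, such as 0.99, gives the corresponding critical value and a wider interval.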



In [24]:

    
margin_of_error = 1.96*standard_error



In [25]:

    
confidence_interval = [x_bar - margin_of_error, x_bar + margin_of_error]
confidence_interval









    Out[25]:





[98.123194112228518, 98.375267426233037]

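If a temperature falls outside the above range, it might be considered abnormal. Note that this interval describes the uncertainty in the population mean, so it is narrow; individual readings vary more widely than this.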
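
The interval above bounds the population mean, so it is much narrower than the spread of individual readings. As a rough alternative for judging one person's temperature, a range of about 1.96 sample standard deviations around the mean covers roughly 95% of individuals if temperatures are approximately normal. This is a sketch under that assumption, not a clinical threshold; the inputs are the rounded values from `df.describe()`.

```python
from statistics import NormalDist

# Rounded summary statistics from df.describe()
x_bar, s = 98.249231, 0.733183

# Assumption: individual temperatures are approximately normally distributed
z_crit = NormalDist().inv_cdf(0.975)
low, high = x_bar - z_crit * s, x_bar + z_crit * s

print(f"approximate 95% range for individuals: ({low:.2f}, {high:.2f})")
```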

=======================================================================================================================

Is there a significant difference between males and females in normal temperature?



In [26]:

    
import numpy as np



In [27]:

    
female_temprature = np.array(df.temperature[df.gender=='F'])
len(female_temprature)









    Out[27]:





65



In [28]:

    
male_temprature = np.array(df.temperature[df.gender=='M'])
len(male_temprature)









    Out[28]:





65

Again, the sample sizes (65 in each group) are large enough to use a z-test.
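
For reference, the two-sample z statistic can be computed by hand. The sketch below uses only the standard library; `two_sample_z` is a hypothetical helper, not part of `statsmodels`. It uses the unpooled standard error, which equals the pooled one when both groups have the same size, as they do here.

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_z(a, b):
    """Two-sided z-test for equal means; returns (z, p_value)."""
    # Standard error of the difference in means (sample variances, n - 1 denominator)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Toy data for illustration only (not the temperature dataset)
z, p = two_sample_z([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"z = {z:.3f}, p = {p:.3f}")
```

With the real data, the two arrays of female and male temperatures would be passed as lists in place of the toy data.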



In [36]:

    
from statsmodels.stats.weightstats import ztest

# Two-sample z-test; null hypothesis: female and male mean temperatures are equal
z_stat, p_val = ztest(female_temprature, male_temprature)
if p_val < 0.05:
    print("p-value is less than 5%, so the null hypothesis should be rejected.\n"
          "There is a significant difference between males and females in normal temperature.")
else:
    print("p-value is at least 5%, so the null hypothesis cannot be rejected.")



p-value is less than 5%, so the null hypothesis should be rejected.
There is a significant difference between males and females in normal temperature.


