The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F.
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.
Answer the following questions in this notebook and submit it to your GitHub account.
You can include written notes in notebook cells using Markdown:
In [1]:
import pandas as pd
In [2]:
%matplotlib inline
In [3]:
df = pd.read_csv('data/human_body_temperature.csv')
In [4]:
df.hist()
Out[4]:
In [5]:
df.describe()
Out[5]:
Sample mean = 98.249231, sample standard deviation = 0.733183, n = 130
Step 1:
- Null Hypothesis : Mean = 98.6
- Alternative Hypothesis : Mean != 98.6
Step 2:
- Point estimate: sample mean = 98.249231
- Calculate Standard Error (SE)
Step 3:
- Check conditions
-- Independence ==> True
-- If the sample is skewed, the sample size must exceed 30 (here n = 130) ==> True
Step 4:
- Calculate the z-score and p-value
Step 5:
- Based on the p-value, check whether the null hypothesis can be rejected.
In [16]:
import scipy.special
In [19]:
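n = df.count()['temperature']      # sample size
sigma = df['temperature'].std()    # sample standard deviation
x_bar = df['temperature'].mean()   # sample mean
standard_error = sigma / n**0.5
z_score = (x_bar - 98.6) / standard_error
# Two-sided p-value; -abs() keeps it valid whichever side of 98.6 the sample mean falls on
p_values = 2*scipy.special.ndtr(-abs(z_score))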
p_values
Out[19]:
Since the p-value is far below 5%, we can reject the null hypothesis that the true population mean is 98.6 degrees Fahrenheit.
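As a cross-check, the z-test can be reproduced from the summary statistics reported above (mean 98.249231, standard deviation 0.733183, n = 130) without loading the data. This is a minimal sketch: it uses the standard library's `statistics.NormalDist` in place of `scipy.special.ndtr`, and the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics taken from df.describe() above
sample_mean = 98.249231
sample_std = 0.733183
sample_size = 130
null_mean = 98.6  # value of the mean under the null hypothesis

# Standard error of the mean
se = sample_std / sqrt(sample_size)

# z-score: distance of the sample mean from the null value, in standard errors
z = (sample_mean - null_mean) / se

# Two-sided p-value from the standard normal distribution
p = 2 * NormalDist().cdf(-abs(z))

print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```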
In [22]:
import scipy.stats as stats
In [23]:
stats.ttest_1samp(df.temperature,98.6)
Out[23]:
For the t-test, the p-value differs slightly from the z-test p-value, but the evidence is still strong enough to reject the null hypothesis.
==================================================================================================================
A 95% confidence interval is adequate for this assessment. Margin of Error (M.E.) = critical value * standard error
Critical value for a 95% confidence interval = 1.96
Confidence interval = (Mean - Margin of Error, Mean + Margin of Error)
In [24]:
margin_of_error = 1.96*standard_error
In [25]:
confidence_interval = [x_bar - margin_of_error, x_bar + margin_of_error]
confidence_interval
Out[25]:
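This interval estimates the population mean temperature; it is not a range for individual readings. A single temperature might be considered abnormal only if it falls well outside the spread of individual values, roughly x_bar ± 1.96 * sigma.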
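The sketch below compares the 95% confidence interval for the mean with the range expected to cover roughly 95% of individual readings. It uses the summary statistics reported earlier (mean 98.249231, standard deviation 0.733183, n = 130) and assumes temperatures are approximately normally distributed; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sample_std, sample_size = 98.249231, 0.733183, 130

# Critical value for a two-sided 95% interval (about 1.96)
z_crit = NormalDist().inv_cdf(0.975)

# Confidence interval for the population mean: built from the standard error
se = sample_std / sqrt(sample_size)
ci_mean = (sample_mean - z_crit * se, sample_mean + z_crit * se)

# Approximate range for individual readings: built from the standard deviation itself
individual_range = (sample_mean - z_crit * sample_std,
                    sample_mean + z_crit * sample_std)

print("95% CI for the mean:      ", ci_mean)
print("~95% of individual values:", individual_range)
```

The second range is much wider because it is not divided by the square root of the sample size.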
=======================================================================================================================
In [26]:
import numpy as np
In [27]:
female_temprature = np.array(df.temperature[df.gender=='F'])
len(female_temprature)
Out[27]:
In [28]:
male_temprature = np.array(df.temperature[df.gender=='M'])
len(male_temprature)
Out[28]:
Again, each sample is large enough to use a z-test. The null hypothesis is that the mean temperatures of females and males are equal.
In [36]:
from statsmodels.stats.weightstats import ztest
zstat, p_val = ztest(female_temprature, male_temprature)  # two-sample z-test
p_val_percent = p_val*100
if p_val_percent < 5:
    print("p-value is less than 5%, so the null hypothesis should be rejected.\n"
          "There is a significant difference between males and females in normal temperature.")
else:
    print("p-value is at least 5%, so the null hypothesis cannot be rejected.")
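To show what a two-sample z-test computes, here is a hand-rolled sketch using only the standard library. It uses a pooled variance, which is believed to correspond to the default `usevar='pooled'` setting of statsmodels' `ztest`, though that correspondence is an assumption here. The function name `two_sample_ztest` and the two short lists are hypothetical and are not taken from the dataset; real use needs samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_ztest(a, b):
    """Return (z, two-sided p-value) for H0: mean(a) == mean(b), using pooled variance."""
    n_a, n_b = len(a), len(b)
    # Pooled sample variance (each group's variance uses n - 1 in the denominator)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    z = (mean(a) - mean(b)) / se_diff
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

# Hypothetical readings, for illustration only
group_a = [98.4, 98.8, 98.2, 98.6, 98.9, 98.1]
group_b = [98.0, 98.3, 97.9, 98.5, 98.2, 98.1]

z, p = two_sample_ztest(group_a, group_b)
print(f"z = {z:.3f}, p = {p:.4f}")
```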