In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import numpy as np
import pandas as pd
Heart attacks in rabbits. When heart muscle is deprived of oxygen, the tissue dies, resulting in a heart attack ("myocardial infarction"). Cooling the heart appears to reduce the size of the heart attack. It is not known, however, whether cooling is effective only if it takes place before the blood flow to the heart becomes restricted. Some researchers (Hale et al., 1997) hypothesized that cooling the heart would reduce the size of the heart attack even if it takes place after the blood flow becomes restricted.
To investigate their hypothesis, the researchers conducted an experiment on 32 anesthetized rabbits that were subjected to a heart attack. The researchers established three experimental groups:
Rabbits whose hearts were cooled to 6 °C within 5 minutes of the artery being blocked ("early cooling")
Rabbits whose hearts were cooled to 6 °C within 25 minutes of the artery being blocked ("late cooling")
Rabbits whose hearts were not cooled at all ("no cooling")
At the end of the experiment, the researchers measured the size of the infarcted (i.e., damaged) area (in grams) in each of the 32 rabbits. But, as you can imagine, there is great variability in the size of hearts. The size of a rabbit's infarcted area may be large only because it has a larger heart. Therefore, in order to adjust for differences in heart sizes, the researchers also measured the size of the region at risk for infarction (in grams) in each of the 32 rabbits.
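One common way to bring three treatment groups into a regression alongside a covariate such as the region at risk is to code the groups as 0/1 indicator variables, with one group serving as the reference. The sketch below illustrates that idea only; it is not taken from the researchers' procedure. The helper name `indicator_code` and the choice of "no cooling" as the reference group are assumptions made for illustration. The `X2` and `X3` columns used later in this notebook presumably follow a similar scheme, but the data file itself should be checked to confirm.

```python
# Illustrative sketch: GROUPS and indicator_code are hypothetical names,
# and "no cooling" as the reference level is an assumption.
GROUPS = ("early cooling", "late cooling", "no cooling")

def indicator_code(group, reference="no cooling"):
    """Return one 0/1 indicator per non-reference group, in GROUPS order."""
    if group not in GROUPS:
        raise ValueError(f"unknown group: {group!r}")
    return {g: int(g == group) for g in GROUPS if g != reference}

# The reference group is the one whose indicators are all zero.
for g in GROUPS:
    print(g, indicator_code(g))
```

With two indicators for three groups, each regression coefficient on an indicator can be read as that group's difference from the reference group at a fixed size of the region at risk.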
In [21]:
import requests
url = 'https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/coolhearts.txt'
response = requests.get(url)
response.status_code
Out[21]:
In [74]:
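# Decode each raw line, dropping non-ASCII bytes and stray NUL characters.
lines = [line.decode('ascii', 'ignore') for line in response.iter_lines()]
lines = [line.strip().replace('\x00', '').split('\t') for line in lines]
# Keep only lines that split into more than one tab-separated field.
lines = [line for line in lines if len(line) > 1]
# The first row is the header. np.float was removed in NumPy 1.24, so cast with the builtin float.
data = pd.DataFrame(data=lines[1:], columns=lines[0]).astype(float)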
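The parsing steps in the cell above can be tried offline on a few hand-written byte strings. The following is a minimal sketch under that assumption: the function name `parse_tab_table` is hypothetical, and the sample bytes are made up for illustration and are not the contents of the real file.

```python
def parse_tab_table(raw_lines):
    """Turn raw byte lines into (header, rows of floats)."""
    cleaned = []
    for raw in raw_lines:
        # Drop non-ASCII bytes and NUL characters, then trim whitespace.
        text = raw.decode('ascii', 'ignore').replace('\x00', '').strip()
        fields = text.split('\t')
        if len(fields) > 1:  # skip blank or single-field lines
            cleaned.append(fields)
    header, body = cleaned[0], cleaned[1:]
    return header, [[float(value) for value in row] for row in body]

# Made-up sample input, for illustration only.
sample = [b'a\tb', b'', b'1\t2.5', b'\x003\t4\x00']
header, rows = parse_tab_table(sample)
print(header, rows)
```

Keeping the cleaning logic in a small function like this makes it easy to check the handling of blank lines and NUL bytes before pointing it at the downloaded response.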
In [76]:
data.head()
Out[76]:
In [78]:
pd.unique(data.X2)
Out[78]:
In [79]:
pd.unique(data.X3)
Out[79]:
In [80]:
# Dict-based renaming in SeriesGroupBy.agg was removed in pandas 1.0; pass a list of function names instead.
data['Inf'].groupby(data['X2']).agg(['min', 'max', 'mean'])
Out[80]:
In [81]:
# Dict-based renaming in SeriesGroupBy.agg was removed in pandas 1.0; pass a list of function names instead.
data['Inf'].groupby(data['X3']).agg(['min', 'max', 'mean'])
Out[81]:
In [82]:
data.corr()
Out[82]:
In [83]:
data.plot(kind='scatter', x='Area', y='Inf')
Out[83]:
In [89]:
import statsmodels.formula.api as sm
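# 'Inf' holds the infarct size, so give it a descriptive name for the formulas.
data = data.rename(columns={'Inf': 'Infarct'})
result = sm.ols(formula='Infarct ~ Area', data=data).fit()
result.summary()
Out[89]:
In [92]:
# Add the X2 indicator and the numeric Group code as further predictors.
result = sm.ols(formula='Infarct ~ Area + X2 + Group', data=data).fit()
result.summary()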
Out[92]:
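Written out, the second formula above corresponds to a linear model of roughly the following form, where $y_i$ is the size of the infarcted area for rabbit $i$. The intercept $\beta_0$ is included by the formula interface by default. The error assumptions (independent $\varepsilon_i$ with mean zero and constant variance) are the standard ones for ordinary least squares and are stated here as an assumption, not something the notebook verifies.

$$
y_i = \beta_0 + \beta_1\,\mathrm{Area}_i + \beta_2\,\mathrm{X2}_i + \beta_3\,\mathrm{Group}_i + \varepsilon_i
$$

Because `Group` was cast to a float column, it enters as a numeric code, not as a categorical factor, so $\beta_3$ describes the change in the response per unit step in that code.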
In [94]:
result.resid.plot(kind='hist', bins=20)
Out[94]: