Applied Linear Regression

The dataset below records the average weight of the brain and body for a number of mammal species.

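There are 62 rows of data, each with 3 columns:

  I,  the index;
  A1, the brain weight;
  B,  the body weight.

We seek a linear model relating the two weights, written here as B = A1 * X1. The regression below fits brain weight as a linear function of body weight, with an intercept term: brain_weight = Intercept + X1 * body_weight.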
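With a single predictor, ordinary least squares has a closed form. The slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x. The intercept forces the line through the point of means. This is the same estimate the statsmodels OLS fit in this notebook computes. Below is a minimal NumPy sketch, assuming x holds body weights and y holds brain weights as 1-D arrays. The helper name `fit_line` and the toy data are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Hypothetical helper: closed-form OLS fit of y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared x-deviations
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: makes the fitted line pass through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data lying exactly on y = 2 + 3x, so the fit recovers those values
b0, b1 = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)
```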

In [1]:
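%matplotlib inline
import pandas as pd
import requests
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Location of the brain/body weight data file (set this before running)
URL = ''

result = requests.get(URL)
result.raise_for_status()  # fail early on HTTP errors
# Keep every line of the response except '#' comment lines
data = [line.strip() for line in result.text.split('\n') if not line.startswith('#')]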

In [3]:
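# The first two non-comment lines start with the column and row counts
ncols = int(data[0].split()[0])
nrows = int(data[1].split()[0])

# The next ncols lines hold the column names
col_slice = slice(2, ncols + 2)
columns = data[col_slice]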
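The slicing above assumes a fixed layout once comment lines are dropped. Line 0 starts with the column count and line 1 with the row count. The next `ncols` lines hold the column names, and the following `nrows` lines hold the data. Below is a stdlib-only sketch of that assumption. The helper `split_sections` is hypothetical, and the sample lines are made up to mirror the layout, reusing two rows from the table in this notebook.

```python
def split_sections(lines):
    """Hypothetical helper: split comment-free lines into column names and data rows."""
    # The first token of each of the first two lines is a count, e.g. "3 columns", "2 rows"
    ncols = int(lines[0].split()[0])
    nrows = int(lines[1].split()[0])
    # Column names follow the two count lines ...
    names = lines[2:2 + ncols]
    # ... and exactly nrows data rows follow the names (trailing blanks are ignored)
    body = lines[2 + ncols:2 + ncols + nrows]
    return names, body


sample = [
    "3 columns",
    "2 rows",
    "Index",
    "Brain Weight",
    "Body Weight",
    "1  3.385  44.5",
    "2  0.480  15.5",
    "",  # blank line left behind by a final newline
]
sample_columns, sample_rows = split_sections(sample)
print(sample_columns)
print(sample_rows)
```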

In [4]:
columns

['Index', 'Brain Weight', 'Body Weight']

In [5]:
# The data rows follow the column names
row_slice = slice(ncols + 2, ncols + nrows + 2)
rows = data[row_slice]

In [7]:
from io import StringIO

# Each row is a run of whitespace-separated numbers, so split on any whitespace
data_df = pd.read_csv(StringIO('\n'.join(rows)), sep=r'\s+', header=None, names=columns)
data_df.head()

   Index  Brain Weight  Body Weight
0      1         3.385         44.5
1      2         0.480         15.5
2      3         1.350          8.1
3      4       465.000        423.0
4      5        36.330        119.5
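Each data row is a run of whitespace-separated numbers, so splitting on runs of whitespace is enough to parse it. Below is a pandas-free sketch of the same step. The helper `parse_rows` is hypothetical and assumes exactly three fields per row (index, brain weight, body weight). `str.split()` with no argument treats any run of spaces or tabs as a single separator.

```python
def parse_rows(rows):
    """Hypothetical helper: turn whitespace-separated rows into (index, brain, body) tuples."""
    parsed = []
    for line in rows:
        index, brain, body = line.split()  # any run of spaces/tabs is one separator
        parsed.append((int(index), float(brain), float(body)))
    return parsed


records = parse_rows(["1     3.385    44.500", "2\t0.480\t15.500"])
print(records)
```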

In [9]:
data_df = data_df.rename(columns={'Body Weight': 'body_weight', 'Brain Weight': 'brain_weight'})
data_df.plot(kind='scatter', x='body_weight', y='brain_weight')

[Figure: scatter plot of brain_weight (y-axis) against body_weight (x-axis)]

In [11]:
import statsmodels.formula.api as smf

# Ordinary least squares: brain_weight = Intercept + slope * body_weight
results = smf.ols('brain_weight ~ body_weight', data=data_df).fit()
results.summary()

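                            OLS Regression Results
==============================================================================
Dep. Variable:           brain_weight   R-squared:                       0.873
Model:                            OLS   Adj. R-squared:                  0.871
Method:                 Least Squares   F-statistic:                     411.2
Date:                Mon, 27 Mar 2017   Prob (F-statistic):           1.54e-28
Time:                        23:29:16   Log-Likelihood:                -445.27
No. Observations:                  62   AIC:                             894.5
Df Residuals:                      60   BIC:                             898.8
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                  coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -56.8555     42.978     -1.323      0.191    -142.824     29.113
body_weight     0.9029      0.045     20.278      0.000       0.814      0.992
==============================================================================
Omnibus:                       35.627   Durbin-Watson:                   2.548
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              784.333
Skew:                          -0.627   Prob(JB):                    4.83e-171
Kurtosis:                      20.379   Cond. No.                     1.01e+03
==============================================================================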
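Two of the headline numbers in the summary can be reproduced by hand. R-squared is 1 - SS_res/SS_tot. Adjusted R-squared penalises it for the number of predictors p: 1 - (1 - R²)(n - 1)/(n - p - 1). With R² = 0.873, n = 62 and p = 1, this gives about 0.871, matching the table. Below is a NumPy sketch. Both function names are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Hypothetical helper: coefficient of determination from actual and fitted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Hypothetical helper: R-squared adjusted for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Values reported in the summary table: R-squared 0.873, 62 observations, 1 predictor
print(round(adjusted_r_squared(0.873, 62, 1), 3))
```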

In [13]:
results.params

Intercept     -56.855545
body_weight     0.902913
dtype: float64
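The two parameters define the fitted line, so a prediction for any body weight is Intercept + slope × body_weight. Below is a minimal sketch that plugs in the values printed above. The helper name and the example body weights are illustrative. Predictions outside the range of the observed data are extrapolations. Because the intercept is negative, the line also gives a negative brain weight for very small body weights.

```python
import numpy as np

# Parameter values as printed by results.params above
INTERCEPT = -56.855545
SLOPE = 0.902913


def predict_brain_weight(body_weight):
    """Hypothetical helper: evaluate the fitted line for one or more body weights."""
    return INTERCEPT + SLOPE * np.asarray(body_weight, dtype=float)


print(predict_brain_weight([1.0, 100.0, 1000.0]))
```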

In [17]:
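# Predicted values from the fitted line: Intercept + slope * body_weight
# (the same numbers statsmodels exposes as results.fittedvalues)
fitted_model = results.params['Intercept'] + results.params['body_weight'] * data_df['body_weight']
predicted_df = pd.DataFrame({
    'body_weight': data_df['body_weight'],
    'actual_brain_weight': data_df['brain_weight'],
    'predicted_brain_weight': fitted_model,
})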
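Because the model includes an intercept, the least-squares residuals (actual minus predicted) support a quick sanity check. Up to floating-point error, they sum to zero and are orthogonal to the predictor. Below is a NumPy sketch on made-up synthetic data, using `np.polyfit` as the fitting routine in place of statsmodels. With the notebook's data, the same checks would apply to `predicted_df['actual_brain_weight'] - predicted_df['predicted_brain_weight']`.

```python
import numpy as np

# Synthetic data (made up for illustration): a noisy straight line
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=62)
y = 5.0 + 0.9 * x + rng.normal(scale=10.0, size=62)

# np.polyfit with deg=1 returns the coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# OLS with an intercept: residuals sum to ~0 and are orthogonal to x
print(np.isclose(residuals.sum(), 0.0, atol=1e-6))
print(np.isclose(np.dot(residuals, x), 0.0, atol=1e-4))
```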

In [22]:
# Actual data in the default colour; fitted values in red (they all lie on the regression line)
ax = predicted_df.plot(kind='scatter', x='body_weight', y='actual_brain_weight')
predicted_df.plot(ax=ax, kind='scatter', x='body_weight', y='predicted_brain_weight', color='red')

[Figure: actual (default colour) and predicted (red) brain weight plotted against body_weight]
