Exercise 01.2 - Solution

Using the given dataset of employment and GNP figures, estimate a linear regression of Employed on GNP:

$$\text{Employed} = b_0 + b_1 \cdot \text{GNP}$$
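For reference, these are the standard ordinary least squares (OLS) estimates, which minimize $\sum_i (y_i - b_0 - b_1 x_i)^2$. The notation is introduced here for the derivation: $x_i$ is GNP and $y_i$ is Employed in year $i$ of the $n$ observations.

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Stacking a column of ones and the GNP column into a design matrix $X$, and the Employed values into $Y$, the same estimates solve the normal equations:

$$X^\top X \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = X^\top Y \quad\Longrightarrow\quad \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = (X^\top X)^{-1} X^\top Y$$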


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



In [2]:
# Dataset: yearly Employed and GNP figures, 1947-1962, as CSV text
raw_data = """
Year,Employed,GNP
1947,60.323,234.289
1948,61.122,259.426
1949,60.171,258.054
1950,61.187,284.599
1951,63.221,328.975
1952,63.639,346.999
1953,64.989,365.385
1954,63.761,363.112
1955,66.019,397.469
1956,67.857,419.18
1957,68.169,442.769
1958,66.513,444.546
1959,68.655,482.704
1960,69.564,502.601
1961,69.331,518.173
1962,70.551,554.894"""

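data = []
# Skip the leading blank line and the header row
for line in raw_data.splitlines()[2:]:
    words = line.split(',')
    data.append(words)
# np.float was removed in NumPy 1.24; use the builtin float instead
data = np.array(data, dtype=float)
n_obs = data.shape[0]  # number of observations (rows)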
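The manual split-and-append loop can also be replaced by a single NumPy call. This is a minimal sketch, assuming the same CSV text as the cell above; `np.loadtxt` reading from `io.StringIO` is a swapped-in alternative, not the notebook's own method.

```python
import io

import numpy as np

# Same CSV text as the notebook cell: a header row plus 16 yearly rows
raw_data = """
Year,Employed,GNP
1947,60.323,234.289
1948,61.122,259.426
1949,60.171,258.054
1950,61.187,284.599
1951,63.221,328.975
1952,63.639,346.999
1953,64.989,365.385
1954,63.761,363.112
1955,66.019,397.469
1956,67.857,419.18
1957,68.169,442.769
1958,66.513,444.546
1959,68.655,482.704
1960,69.564,502.601
1961,69.331,518.173
1962,70.551,554.894"""

# strip() drops the leading newline, so skiprows=1 skips only the header
data = np.loadtxt(io.StringIO(raw_data.strip()), delimiter=",", skiprows=1)
n_obs = data.shape[0]

print(data.shape)  # (16, 3)
```

Calling `strip()` first avoids depending on whether `skiprows` counts the blank first line.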

In [3]:
plt.plot(data[:, 2], data[:, 1], 'bo')
plt.xlabel("GNP")
plt.ylabel("Employed")


[Figure: scatter plot of Employed (y-axis) against GNP (x-axis)]

In [4]:
# Design matrix: a column of ones for the intercept b_0, then GNP
X = np.c_[np.ones(n_obs), data[:, 2]]
Y = data[:, 1]  # response: Employed

In [5]:
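# OLS normal equations: solve (X^T X) beta = X^T Y for beta = [b_0, b_1]
# (np.linalg.solve avoids forming an explicit matrix inverse)
beta = np.linalg.solve(X.T @ X, X.T @ Y)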
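As a cross-check, the coefficients can be computed two other ways: with `np.linalg.lstsq`, which works on the design matrix directly instead of forming X^T X, and with the scalar closed-form formulas for a single regressor. This is a sketch; the `gnp` and `employed` arrays are copied by hand from the dataset above.

```python
import numpy as np

# Employed and GNP columns from the dataset above (1947-1962)
gnp = np.array([234.289, 259.426, 258.054, 284.599, 328.975, 346.999,
                365.385, 363.112, 397.469, 419.18, 442.769, 444.546,
                482.704, 502.601, 518.173, 554.894])
employed = np.array([60.323, 61.122, 60.171, 61.187, 63.221, 63.639,
                     64.989, 63.761, 66.019, 67.857, 68.169, 66.513,
                     68.655, 69.564, 69.331, 70.551])

# Design matrix with an intercept column, built as in the notebook
X = np.c_[np.ones(len(gnp)), gnp]

# Least-squares solution computed from X itself (SVD-based)
beta_lstsq, ssr, rank, sing_vals = np.linalg.lstsq(X, employed, rcond=None)

# Closed-form slope and intercept for a single regressor
x_c = gnp - gnp.mean()
y_c = employed - employed.mean()
b1 = np.sum(x_c * y_c) / np.sum(x_c ** 2)
b0 = employed.mean() - b1 * gnp.mean()

print(beta_lstsq)
print(b0, b1)
```

Both approaches should agree with the `beta` printed below to within floating-point rounding.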

In [8]:
print(beta)


[  5.18435898e+01   3.47522943e-02]
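The output reads as b_0 ≈ 51.84 and b_1 ≈ 0.0348: each additional unit of GNP is associated with about 0.0348 more units of Employed. The sketch below uses the fit to compute fitted values and residuals and to make one prediction. The GNP value of 500 is a hypothetical input chosen for illustration, and the arrays are copied by hand from the dataset above.

```python
import numpy as np

# Employed and GNP columns from the dataset above (1947-1962)
gnp = np.array([234.289, 259.426, 258.054, 284.599, 328.975, 346.999,
                365.385, 363.112, 397.469, 419.18, 442.769, 444.546,
                482.704, 502.601, 518.173, 554.894])
employed = np.array([60.323, 61.122, 60.171, 61.187, 63.221, 63.639,
                     64.989, 63.761, 66.019, 67.857, 68.169, 66.513,
                     68.655, 69.564, 69.331, 70.551])

X = np.c_[np.ones(len(gnp)), gnp]
beta = np.linalg.solve(X.T @ X, X.T @ employed)

# Fitted values and residuals of the regression
fitted = X @ beta
resid = employed - fitted

# With an intercept, OLS residuals sum to (numerically) zero
# and are orthogonal to the regressor
print(resid.sum(), resid @ gnp)

# Prediction at a hypothetical GNP of 500 (same units as the data)
pred_500 = beta[0] + beta[1] * 500
print(pred_500)  # about 69.22
```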

In [7]:
# Bonus: overlay the fitted regression line on the scatter plot
x = np.linspace(200, 600)
plt.plot(x, beta[0] + beta[1]*x, 'b-')
plt.plot(data[:, 2], data[:, 1], 'bo')
plt.xlabel("GNP")
plt.ylabel("Employed")


[Figure: scatter plot of Employed (y-axis) against GNP (x-axis) with the fitted regression line overlaid]
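For a straight-line fit like this one, `np.polyfit` with degree 1 gives the same coefficients. This is a swapped-in shortcut rather than the matrix approach the exercise asks for. Note that it returns them highest degree first, i.e. `[slope, intercept]`, the reverse of the `beta` ordering above. The arrays are copied by hand from the dataset.

```python
import numpy as np

# Employed and GNP columns from the dataset above (1947-1962)
gnp = np.array([234.289, 259.426, 258.054, 284.599, 328.975, 346.999,
                365.385, 363.112, 397.469, 419.18, 442.769, 444.546,
                482.704, 502.601, 518.173, 554.894])
employed = np.array([60.323, 61.122, 60.171, 61.187, 63.221, 63.639,
                     64.989, 63.761, 66.019, 67.857, 68.169, 66.513,
                     68.655, 69.564, 69.331, 70.551])

# Degree-1 polynomial fit: coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(gnp, employed, deg=1)

# Evaluate the fitted line on the same grid used for the bonus plot
x = np.linspace(200, 600)
line = np.polyval([slope, intercept], x)

print(intercept, slope)
```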