Lab Session 2: Linear Regression

Neil D. Lawrence COM4509/6509 Machine Learning and Adaptive Intelligence

The aim of this lab is to have you code linear regression in Python. We will do it in two ways, once using iterative updates (coordinate ascent) and then using linear algebra. First we will import the relevant libraries.


In [ ]:
import numpy as np
import matplotlib.pyplot as plt

Next we want to make sure our plots appear inline rather than in separate windows, so we use the IPython magic command:


In [ ]:
%matplotlib inline

Remember that to check what a command does you can simply type:


In [ ]:
np.random.randn?

Linear Regression

For this part we are going to load in some real data; we will use the example from the Olympics: the pace of the marathon winners. To load the data (which is in comma separated values [csv] format) we first need to download it from this link, and then load it as follows:


In [ ]:
olympics = np.genfromtxt('olympicMarathonTimes.csv', delimiter=',')

This loads the data into a numpy array. You can extract the years and the pace of the winners, respectively, into column vectors as follows:


In [ ]:
x = olympics[:, 0:1]
y = olympics[:, 1:2]

You can see what these values are by typing:


In [ ]:
print(x)
print(y)
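
It is also worth checking the shapes of these arrays, since the matrix operations later in the lab assume column vectors. A minimal check:


In [ ]:
print(x.shape)  # expect (N, 1): one column of years
print(y.shape)  # expect (N, 1): one column of winning paces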

Plotting the Data

And you can make a plot of $y$ vs $x$ with the following command:


In [ ]:
plt.plot(x, y, 'rx')
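
You may also want to label the axes so the plot is self-explanatory; a small sketch (assuming, as in the lecture, that the pace is recorded in minutes per kilometre):


In [ ]:
plt.plot(x, y, 'rx')
plt.xlabel('year')
plt.ylabel('pace (min/km)')  # assumed units; check against the lecture notes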

Maximum Likelihood: Iterative Solution

Now we are going to fit a line, $y_i=mx_i + c$, to the data you've plotted. We are trying to minimize the error function:

$$E(m, c, \sigma^2) = \frac{N}{2} \log \sigma^2 + \frac{1}{2\sigma^2} \sum_{i=1}^N(y_i-mx_i-c)^2$$

with respect to $m$, $c$ and $\sigma^2$. We can start with an initial guess for $m$ and $c$,


In [ ]:
m = -0.4
c = 80
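
Before updating the parameters it is useful to see how poor this initial guess is. Here is a minimal sketch that evaluates the sum of squared residuals and, plugging in the maximum likelihood estimate $\sigma^2 = \frac{1}{N}\sum_{i=1}^N(y_i - mx_i - c)^2$, the full objective $E(m, c, \sigma^2)$ defined above:


In [ ]:
residuals = y - m*x - c
sum_squares = (residuals**2).sum()
N = y.shape[0]
sigma2 = sum_squares/N  # maximum likelihood estimate of the noise variance
E = N/2.*np.log(sigma2) + sum_squares/(2*sigma2)  # the objective E(m, c, sigma^2) from above
print(sum_squares)
print(E)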

Then we use the maximum likelihood update, derived in this week's lecture, to find an estimate for the offset, $c$,

$$c^* = \frac{\sum_{i=1}^N(y_i-m^*x_i)}{N}$$

In [ ]:
c = (y - m*x).mean()
print(c)

And now we can make an estimate for the gradient of the line,

$$m^* = \frac{\sum_{i=1}^N (y_i - c)x_i}{\sum_{i=1}^N x_i^2}$$

In [ ]:
m = ((y - c)*x).sum()/(x**2).sum()
print(m)

We can have a look at how good our fit is by computing the prediction across the input space. First create a vector of 'test points',


In [ ]:
x_test = np.linspace(1890, 2020, 130)[:, None]

Now use this vector to compute some test predictions,


In [ ]:
f_test = m*x_test + c

Now plot those test predictions with a blue line on the same plot as the data,


In [ ]:
plt.plot(x_test, f_test, 'b-')
plt.plot(x, y, 'rx')

The fit isn't very good. To improve it we need to iterate between these parameter updates in a loop, repeating them several times,


In [ ]:
for i in np.arange(10):
    # alternate between the two maximum likelihood updates
    m = ((y - c)*x).sum()/(x*x).sum()
    c = (y - m*x).sum()/y.shape[0]
print(m)
print(c)

And let's try plotting the result again


In [ ]:
f_test = m*x_test + c
plt.plot(x_test, f_test, 'b-')
plt.plot(x, y, 'rx')

Clearly we need more than 10 iterations! Try adding more iterations above to get closer to the solution, or use a convergence check as sketched below.
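
Rather than guessing a number of iterations, you can keep updating until the parameters stop changing. Here is one possible sketch; the tolerance of $10^{-6}$ and the cap of 100000 iterations are arbitrary choices, and you may find the loop still hits the cap because the updates converge very slowly on this data:


In [ ]:
m = -0.4
c = 80
for i in range(100000):
    m_old, c_old = m, c
    m = ((y - c)*x).sum()/(x*x).sum()
    c = (y - m*x).sum()/y.shape[0]
    # stop once both parameters have (approximately) stopped moving
    if abs(m - m_old) < 1e-6 and abs(c - c_old) < 1e-6:
        break
print(i)
print(m)
print(c)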

Direct Solution with Linear Algebra

Hopefully, you are now persuaded of the merits of solving the entire system simultaneously using linear algebra. To do that, we need to make a design matrix of the data, which includes the $x_0=1$ column to represent the bias. Remember (from the lecture notes) that we are now moving to a system where our prediction is given by an inner product:

$$f(\mathbf{x}_i) = \mathbf{x}_i^\top\mathbf{w}$$

where each vector $\mathbf{x}_i$ is given by appending a 1 onto the original vector

$$\mathbf{x}_i = \begin{bmatrix} 1 \\\ x_i \end{bmatrix}$$

We can do this for the entire data set to form a design matrix $\mathbf{X}$,

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\\ \mathbf{x}_2^\top \\\ \vdots \\\ \mathbf{x}_N^\top \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\\ 1 & x_2 \\\ \vdots & \vdots \\\ 1 & x_N \end{bmatrix},$$

which in numpy is done with the following commands:


In [ ]:
X = np.hstack((np.ones_like(x), x))
print(X)

From the multivariate regression solution we derived in this week's lecture, the maximum likelihood solution for $\mathbf{w}^*$ is given by

$$\mathbf{w}^* = \left[\mathbf{X}^\top \mathbf{X}\right]^{-1} \mathbf{X}^\top \mathbf{y}$$

First let's persuade ourselves of a few things. We suggested in the lecture that

$$\sum_{i=1}^N \mathbf{x}_i\mathbf{x}_i^\top = \mathbf{X}^\top\mathbf{X}$$

We can show that this is indeed the case for our data. First we need to know how to do matrix multiplication and transposition using numpy; this is done as follows:


In [ ]:
np.dot(X.T, X)

Now we will compute the same thing with a for loop


In [ ]:
store = np.zeros((2, 2))
for i in range(X.shape[0]):
    # accumulate the outer products x_i x_i^T
    store += np.outer(X[i, :], X[i, :])
print(store)

I hope that you agree that the first version is a little more compact.
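
You can also confirm that the two computations agree to within floating point precision:


In [ ]:
print(np.allclose(np.dot(X.T, X), store))  # should print True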

Solving the System

The solution for $\mathbf{w}$ is given in terms of a matrix inverse, but numerically this isn't the best way to compute it. What we actually want Python to do is to solve the system of linear equations given by

$$\mathbf{X}^\top\mathbf{X} \mathbf{w} = \mathbf{X}^\top\mathbf{y}$$

for $\mathbf{w}$. This can be done in numpy using the command


In [ ]:
np.linalg.solve?

so we can obtain the solution using


In [ ]:
w = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))
print(w)

Allowing us to plot the fit as follows


In [ ]:
m = w[1]; c = w[0]
f_test = m*x_test + c
print(m)
print(c)
plt.plot(x_test, f_test, 'b-')
plt.plot(x, y, 'rx')
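
As a quick sanity check, you could also compute the sum-of-squares error of this fit and compare it with the value the iterative updates were approaching; a minimal sketch:


In [ ]:
f = np.dot(X, w)           # predictions at the training inputs
print(((y - f)**2).sum())  # sum of squared residuals for the linear fit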

Quadratic Fit

Now we will fit a quadratic model using basis functions. Given everything we've learnt above, this is now quite easy to do. Firstly, we need to create a design matrix that contains the quadratic basis,

$$\mathbf{\Phi} = \left[ \mathbf{1} \quad \mathbf{x} \quad \mathbf{x}^2\right]$$

where this notation means that each column of $\mathbf{\Phi}$ is derived from the entire set of input years.


In [ ]:
Phi = np.hstack([np.ones(x.shape), x, x**2])

Now we can solve this system for $\mathbf{w}$ just as we did for the linear case, so we have,


In [ ]:
w = np.linalg.solve(np.dot(Phi.T, Phi), np.dot(Phi.T, y))
print(w)

We can plot the solution in two different ways. Either we compute the predictions directly,


In [ ]:
f_test = w[2]*x_test**2 + w[1]*x_test + w[0]
plt.plot(x_test, f_test, 'b-')
plt.plot(x, y, 'rx')

or we can use the matrix form of the equation, which first involves creating a design matrix for the test points,


In [ ]:
Phi_test = np.hstack((np.ones_like(x_test), x_test, x_test**2))

and then computing the value of the function using a matrix multiply


In [ ]:
f_test = np.dot(Phi_test, w)
plt.plot(x_test, f_test, 'b-')
plt.plot(x, y, 'rx')

Note the value of the coefficient $w_2$ in particular. It is relatively small, because it is multiplying a large number (the square of 2000 is 4 million). This need for small coefficients becomes more severe as we increase the order of the fit. As an exercise for this week, try fitting higher order polynomials to the data and see what happens as you increase the polynomial order.
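
To get you started on the exercise, here is one possible sketch of a helper that builds a polynomial design matrix of arbitrary order; the function name and the choice of orders are just suggestions, not part of the lab:


In [ ]:
def polynomial_basis(x, order):
    """Design matrix with columns 1, x, x**2, ..., x**order."""
    return np.hstack([x**i for i in range(order + 1)])

for order in [1, 2, 3, 4]:
    Phi = polynomial_basis(x, order)
    w = np.linalg.solve(np.dot(Phi.T, Phi), np.dot(Phi.T, y))
    Phi_test = polynomial_basis(x_test, order)
    plt.plot(x_test, np.dot(Phi_test, w), label='order %d' % order)
plt.plot(x, y, 'rx')
plt.legend()

As you push the order higher you may find the system $\mathbf{\Phi}^\top\mathbf{\Phi}\mathbf{w} = \mathbf{\Phi}^\top\mathbf{y}$ becomes badly conditioned; one common remedy is to rescale the inputs (for example, subtract the mean year) before building the basis.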