Homework #2: Linear Regression

In this homework you will learn the concepts of linear regression by implementing it.

Implement the body of each function and verify your implementation by running the tests. Each function has a test code block just below its definition.

  • Remember: m = number of data items (size of the data set) and n = number of features

Good luck!


In [ ]:
# import what we need

import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

Hypothesis Function

For a single data item $ x_{1 \times n} $, the hypothesis is the dot product of the weights $ w_{n \times 1} $ and $ x_{1 \times n} $ plus the bias $ b_{1 \times 1} $. The result is a single number:

$$ h_{w,b}(x) = w^Tx + b $$

For a data set $ X_{m \times n} $ which contains multiple data items stacked vertically (one row per item), the result is a vector $ h_{m \times 1} $ containing predictions for all data items:

$$ h_{w,b}(X) = Xw + b $$
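
As a hint, the vectorized formula above maps almost directly to NumPy: a matrix product followed by a broadcasted addition of the scalar bias. A minimal sketch of one possible way to write it (h_sketch is only an illustrative name):

In [ ]:
# a minimal sketch of the vectorized hypothesis (for illustration only)

def h_sketch(X, w, b):
    # (m x n) @ (n x 1) -> (m x 1); the scalar bias b is broadcast over all rows
    return X @ w + b

# quick check with the same example as the test cell below
print(h_sketch(np.array([[1, 2, 3], [4, 5, 6]]), np.array([[3], [4], [5]]), 5))  # -> [[31], [67]]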

In [ ]:
def h(X, w, b):
    """
    The hypothesis function
    
    X: the data set matrix (m x n)
    w: the weights vector (n x 1)
    b: the bias (1 x 1)
    
    Return: a vector stacking predictions for all data items (m x 1)
    """
    
    # YOUR CODE GOES HERE (~ 1 line)

In [ ]:
# test the hypothesis

X, y, w, b = (np.array([[1, 2, 3], [4, 5, 6]]), np.array([[2], [3]]), np.array([[3], [4], [5]]), 5)

hyp = h(X, w, b)
true = np.array([[31], [67]])

assert hyp.shape == (X.shape[0], 1), \
       'The result should be in shape ({}, 1). Currently is {}'.format(X.shape[0], hyp.shape)

if np.allclose(hyp, true):
    print('Hypothesis ok.')
else:
    print('Hypothesis does not work properly.')

Cost Function

The cost function for a linear regression model is the Mean Squared Error over the data set, scaled by $ \frac{1}{2} $ to simplify the derivatives:

$$ \begin{equation} \begin{split} J_{w,b}(X) &= \frac{1}{2m}\sum_{i=1}^m (h_{w,b}(x^{(i)}) - y^{(i)})^2 \\ &= \frac{1}{2m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2 \end{split} \end{equation} $$

The goal of linear regression is to minimize this cost.
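
As a hint, the sum above becomes a couple of NumPy operations on the residual vector $ \hat{y} - y $. A minimal sketch (cost_sketch is only an illustrative name; note the extra factor $ \frac{1}{2} $ compared to plain MSE):

In [ ]:
# a minimal sketch of the cost function (for illustration only)

def cost_sketch(y_true, y_pred):
    m = y_true.shape[0]
    # sum of squared residuals, divided by 2m
    return np.sum((y_pred - y_true) ** 2) / (2 * m)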


In [ ]:
def cost(y_true, y_pred):
    """
    Mean squared error cost function
    
    y_true: the vector of true labels of data items (m x 1)
    y_pred: the vector of predictions of data items (m x 1)
    
    Return: a single number representing the cost
    """
    
    # YOUR CODE GOES HERE (~ 2 lines)

In [ ]:
# test cost function

from sklearn.metrics import mean_squared_error

X, y, w, b = (np.array([[1, 2, 3], [4, 5, 6]]), np.array([[2], [3]]), np.array([[3], [4], [5]]), 5)

mse = mean_squared_error(y, h(X, w, b))
cst = cost(y, h(X, w, b))

if np.isclose(mse / 2, cst):
    print('Cost function ok.')
else:
    print('Cost function does not work properly.')
    print('Should\'ve returned:', mse / 2)
    print('Returned:', cst)

Gradient Descent

The gradient descent algorithm tries to find the minimum of a function by starting at some point and repeatedly taking small steps in the direction opposite to the gradient of the function.

In linear regression, the function we are trying to minimize is the cost function $ J_{w,b}(X) $. The partial derivatives are:

$$ \begin{equation} \begin{split} \frac{\partial J_{w,b}(X)}{\partial w_j} &= \frac{\partial}{\partial w_j} \frac{1}{2m}\sum_{i=1}^m (h_{w,b}(x^{(i)}) - y^{(i)})^2 \\ &= \frac{1}{2m}\sum_{i=1}^m 2 (h_{w,b}(x^{(i)}) - y^{(i)}) \frac{\partial}{\partial w_j} h_{w,b}(x^{(i)}) \\ &= \frac{1}{m}\sum_{i=1}^m (h_{w,b}(x^{(i)}) - y^{(i)}) x_{j}^{(i)} \\ &= \frac{1}{m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)}) x_{j}^{(i)} \end{split} \end{equation} $$

$$ \begin{equation} \begin{split} \frac{\partial J_{w,b}(X)}{\partial b} &= \frac{1}{m}\sum_{i=1}^m (h_{w,b}(x^{(i)}) - y^{(i)}) \frac{\partial}{\partial b} h_{w,b}(x^{(i)}) \\ &= \frac{1}{m}\sum_{i=1}^m (h_{w,b}(x^{(i)}) - y^{(i)}) \\ &= \frac{1}{m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)}) \end{split} \end{equation} $$
  • Note that these two derivatives have the same form; in the second one, the term multiplying the residual is simply $ x_{0}^{(i)} = 1 $. A vectorized sketch follows below.
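
In vectorized form both derivatives can be computed at once from the residual vector $ \hat{y} - y $: the weight gradient is $ \frac{1}{m} X^T (\hat{y} - y) $ and the bias gradient is the mean residual. A minimal sketch (gradient_sketch is only an illustrative name):

In [ ]:
# a minimal sketch of the vectorized gradient (for illustration only)

def gradient_sketch(X, y_true, y_pred):
    m = X.shape[0]
    residual = y_pred - y_true       # (m x 1)
    dw = (X.T @ residual) / m        # (n x 1): row j sums residual * x_j over the data set
    db = np.sum(residual) / m        # a single number
    return dw, db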

In [ ]:
def gradient(X, y_true, y_pred):
    """
    The gradient of cost function
    
    X: the data set matrix (m x n)
    y_true: the vector of true labels of data items (m x 1)
    y_pred: the vector of predictions of data items (m x 1)
    
    Return: vector dJ/dw (n x 1) and number dJ/db (1 x 1)
    """
    
    # YOUR CODE GOES HERE (~ 4 lines)

In [ ]:
X, y, w, b = (np.array([[1, 2, 3], [4, 5, 6]]), np.array([[2], [3]]), np.array([[3], [4], [5]]), 5)

true = (np.array([[142.5], [189], [235.5]]), 46.5)
res = gradient(X, y, h(X, w, b))

if np.allclose(res[0], true[0]) and np.isclose(res[1], true[1]):
    print('Gradient function ok.')
else:
    print('Gradient function is not working properly.')
    print('should output:', true)
    print('Outputted:', res)

In [ ]:
def update_parameters(X, y_true, y_pred, w, b, alpha):
    """
    This function updates the parameters w and b using the derivatives of the cost function.
    It should compute the derivatives of the cost with respect to w and b first,
        then take a step for each parameter in w and b.
    
    X: the data set matrix (m x n)
    y_true: the vector of true labels of data items (m x 1)
    y_pred: the vector of predictions of data items (m x 1)
    w: the weights vector (n x 1)
    b: the bias (1 x 1)
    alpha: the learning rate
    
    Returns: the updated parameters w and b
    """
    
    # YOUR CODE GOES HERE (~ 4 lines)
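
The step being described is the usual gradient descent update $ w := w - \alpha \frac{\partial J}{\partial w} $ and $ b := b - \alpha \frac{\partial J}{\partial b} $. A minimal sketch of one way to express it, assuming your gradient function above is implemented:

In [ ]:
# a minimal sketch of a single update step (for illustration only)

def update_parameters_sketch(X, y_true, y_pred, w, b, alpha):
    dw, db = gradient(X, y_true, y_pred)  # derivatives of the cost w.r.t. w and b
    w = w - alpha * dw                    # step against the gradient
    b = b - alpha * db
    return w, b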

In [ ]:
# test update_parameters function

X, y, w, b = (np.array([[1, 2, 3], [4, 5, 6]]), np.array([[2], [3]]), np.array([[3], [4], [5]]), 5)

res = update_parameters(X, y, h(X, w, b), w, b, 0.01)
true = (np.array([[1.575], [2.11], [2.645]]), 4.535)

if np.allclose(res[0], true[0]) and np.isclose(res[1], true[1]):
    print('Update parameters function ok.')
else:
    print('Update parameters function is not working properly.')
    print('should output:', true)
    print('Outputted:', res)

In [ ]:
def gradient_descent(X, y, alpha, n_iterations):
    """
    The gradient descent algorithm:
        1. initialize parameters w and b with zeros (not random values)
        for i in n_iterations:
            2. compute the hypothesis h(X, w, b)
            3. update the parameters using function update_parameters(X, y_true, y_pred, w, b, alpha)
            4. compute the cost and check that it is decreasing at each step (optional)
            
    X: the data set matrix (m x n)
    y: the vector of true labels of data items (m x 1)
    alpha: the learning rate
    n_iterations: number of steps gradient descent should take to converge
    
    Returns: the final parameters w and b found by gradient descent
    """
    
    # YOUR CODE GOES HERE (~ 7 lines)
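
Putting the pieces together, the loop can look roughly like the sketch below (assuming h and update_parameters above are implemented; the commented-out printout corresponds to the optional step 4):

In [ ]:
# a minimal sketch of the gradient descent loop (for illustration only)

def gradient_descent_sketch(X, y, alpha, n_iterations):
    w = np.zeros((X.shape[1], 1))   # step 1: zero-initialized weights (n x 1)
    b = 0                           # step 1: zero-initialized bias
    for i in range(n_iterations):
        y_pred = h(X, w, b)                                   # step 2: predictions
        w, b = update_parameters(X, y, y_pred, w, b, alpha)   # step 3: update w and b
        # print(i, cost(y, h(X, w, b)))                       # step 4 (optional)
    return w, b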

In [ ]:
# test gradient_descent function

true = (np.array([[0.11532006], [0.19906458], [0.2828091]]), 0.083744520421231317)
res = gradient_descent(X, y, 0.01, 20)

if np.allclose(res[0], true[0]) and np.isclose(res[1], true[1]):
    print('Gradient descent function ok.')
else:
    print('Gradient descent function is not working properly.')
    print('should output:', true)
    print('Outputted:', res)

Test on Real Data


In [ ]:
# load the data

from sklearn.datasets import load_diabetes
from sklearn.preprocessing import scale
X, y = load_diabetes(return_X_y=True)
X = X[:, 3].reshape(X.shape[0], 1)
X = scale(X)
y = y.reshape((y.shape[0], 1))

In [ ]:
# train a linear regression model from sklearn

from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X, y)

In [ ]:
# train our linear regression model

w, b = gradient_descent(X, y, 0.1, 25)

In [ ]:
# plot the result

plt.scatter(X, y, s=10, color='gray')
plt.plot(X, h(X, w, b), color='red', label='Ours');
plt.plot(X, model.predict(X), color='blue', label='Best');
plt.legend();

Our hypothesis should be close to the best hypothesis. If the two lines are not close, increase the n_iterations argument of our gradient_descent function and tune the learning rate alpha.
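
For example, a longer run such as the one below (the exact numbers are only a suggestion) usually brings the two lines closer together:

In [ ]:
# retrain with more iterations and plot again (numbers are only a suggestion)

w, b = gradient_descent(X, y, 0.1, 500)

plt.scatter(X, y, s=10, color='gray')
plt.plot(X, h(X, w, b), color='red', label='Ours');
plt.plot(X, model.predict(X), color='blue', label='Best');
plt.legend();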

How well does your linear regression model perform? How happy are you with it? :D