Exercise 5 | Regularized Linear Regression and Bias-Variance


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
import scipy.io

In [ ]:
%matplotlib inline

Part 1: Loading and Visualizing Data

We start the exercise by loading and visualizing the dataset. The following code loads the dataset into your environment and plots the data.


In [ ]:
ex5data1 = scipy.io.loadmat('ex5data1.mat')
X = ex5data1['X']
y = ex5data1['y'][:,0]
Xtest = ex5data1['Xtest']
ytest = ex5data1['ytest'][:,0]
Xval = ex5data1['Xval']
yval = ex5data1['yval'][:,0]
print(X.shape, Xtest.shape, Xval.shape)

In [ ]:
m = X.shape[0]

Plot training data:


In [ ]:
fig, ax = plt.subplots(figsize=(5,5))
ax.scatter(X, y, marker='x', c='r')
ax.set_xlabel('Change in water level')
ax.set_ylabel('Water flowing out of the dam')

Part 2: Regularized Linear Regression Cost

You should now implement the cost function for regularized linear regression. Recall that the regularized cost is J(theta) = (1/(2*m)) * sum((X.dot(theta) - y)**2) + (lambda_/(2*m)) * sum(theta[1:]**2); the bias term theta[0] is not regularized.


In [ ]:
def linearRegCostFunction(X, y, theta, lambda_):
    #LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear 
    #regression with multiple variables
    #   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the 
    #   cost of using theta as the parameter for linear regression to fit the 
    #   data points in X and y. Returns the cost in J and the gradient in grad
    
    m = len(y) # number of training examples
    
    # You need to return the following variables correctly 
    J = 0
    grad = np.zeros(theta.shape)
    
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Compute the cost and gradient of regularized linear 
    #               regression for a particular choice of theta.
    #
    #               You should set J to the cost and grad to the gradient.
    #
    
    
    
    # ============================================================
    
    return J, grad

In [ ]:
theta = np.array([1, 1])
J, grad = linearRegCostFunction(np.hstack([np.ones((m, 1)), X]), y, theta, 1)

The cost at theta = [1, 1] should be about 303.993192.

The gradient at theta = [1, 1] should be about [-15.303016, 598.250744].

Part 3: Regularized Linear Regression Gradient

You should now implement the gradient for regularized linear regression: grad = (1/m) * X.T.dot(X.dot(theta) - y), with an extra (lambda_/m) * theta[j] added for every j >= 1 (the bias term is again not regularized).


In [ ]:
J

In [ ]:
grad
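
One possible vectorized completion of the YOUR CODE HERE block is sketched below. This is illustrative only (the _sketch suffix just keeps it separate from your own implementation); the expected values above can be used to check it.


In [ ]:
def linearRegCostFunction_sketch(X, y, theta, lambda_):
    # Sketch only: one possible vectorized implementation.
    m = len(y)
    err = X.dot(theta) - y          # residuals, shape (m,)
    reg = theta.copy()
    reg[0] = 0                      # the bias term is not regularized
    J = (err @ err) / (2 * m) + lambda_ * (reg @ reg) / (2 * m)
    grad = X.T.dot(err) / m + lambda_ * reg / m
    return J, grad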

Part 4: Train Linear Regression

Once you have implemented the cost and gradient correctly, the trainLinearRegression function will use your cost function to train regularized linear regression.

Write Up Note: The relationship in this data is non-linear, so a straight-line fit will not be great.


In [ ]:
def trainLinearRegression(X, y, lambda_):
    initial_theta = np.zeros(X.shape[1])
    
    # linearRegCostFunction returns (cost, gradient), so jac=True lets
    # scipy.optimize.minimize use the same callable for both.
    def costFunction(t):
        return linearRegCostFunction(X, y, t, lambda_)
    
    res = scipy.optimize.minimize(costFunction, initial_theta, jac=True, method='L-BFGS-B',
                                  options=dict(maxiter=200))
    return res

Train linear regression:


In [ ]:
res = trainLinearRegression(np.hstack([np.ones((m, 1)), X]), y, 0)
print(res)
optimal_theta = res.x

Plot fit over the data:


In [ ]:
fig, ax = plt.subplots(figsize=(5,5))
ax.scatter(X, y, marker='x', c='r')
ax.set_xlabel('Change in water level')
ax.set_ylabel('Water flowing out of the dam')
ax.plot(X, np.hstack([np.ones((m, 1)), X]).dot(optimal_theta))

Part 5: Learning Curve for Linear Regression

Next, you should implement the learning_curve function.

Write Up Note: Since the model is underfitting the data, we expect the learning curve to show the "high bias" pattern (slide 8 in ML-advice.pdf).


In [ ]:
def learning_curve(X, y, Xval, yval, lambda_):
    #LEARNINGCURVE Generates the train and cross validation set errors needed 
    #to plot a learning curve
    #   [error_train, error_val] = ...
    #       LEARNINGCURVE(X, y, Xval, yval, lambda) returns the train and
    #       cross validation set errors for a learning curve. In particular, 
    #       it returns two vectors of the same length - error_train and 
    #       error_val. Then, error_train(i) contains the training error for
    #       i examples (and similarly for error_val(i)).
    #
    #   In this function, you will compute the train and cross validation
    #   errors for dataset sizes from 1 up to m-1. In practice, when working
    #   with larger datasets, you might want to do this in larger intervals.
    #
    
    # Number of training examples
    m = X.shape[0]
    
    # You need to return these values correctly
    error_train = np.zeros(m-1)
    error_val   = np.zeros(m-1)
    
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Fill in this function to return training errors in 
    #               error_train and the cross validation errors in error_val. 
    #               i.e., error_train[i] and 
    #               error_val[i] should give you the errors
    #               obtained after training on i+1 examples.
    #
    # Note: You should evaluate the training error on the first i training
    #       examples (i.e., X[:i, :] and y[:i]).
    #
    #       For the cross-validation error, you should instead evaluate on
    #       the _entire_ cross validation set (Xval and yval).
    #
    # Note: If you are using your cost function (linearRegCostFunction)
    #       to compute the training and cross validation error, you should 
    #       call the function with the lambda argument set to 0. 
    #       Do note that you will still need to use lambda when running
    #       the training to obtain the theta parameters.
    #
    # Hint: You can loop over the examples with the following:
    #
    #       for i in range(1, m):
    #           # Compute the train/cross validation errors using the first i
    #           # training examples X[:i, :] and y[:i], storing the results in
    #           # error_train[i-1] and error_val[i-1]
    #           ...
    #
    
    
        
        
    # =========================================================================
    
    return error_train, error_val
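
One possible way to fill in the loop above is sketched here; it reuses trainLinearRegression and linearRegCostFunction from earlier, and the _sketch name is only to keep it separate from your own version.


In [ ]:
def learning_curve_sketch(X, y, Xval, yval, lambda_):
    m = X.shape[0]
    error_train = np.zeros(m - 1)
    error_val = np.zeros(m - 1)
    for i in range(1, m):
        # Train on the first i examples with regularization...
        theta = trainLinearRegression(X[:i, :], y[:i], lambda_).x
        # ...but evaluate both errors without regularization (lambda = 0)
        error_train[i - 1], _ = linearRegCostFunction(X[:i, :], y[:i], theta, 0)
        error_val[i - 1], _ = linearRegCostFunction(Xval, yval, theta, 0)
    return error_train, error_val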

In [ ]:
error_train, error_val = learning_curve(np.hstack([np.ones((m, 1)), X]), y, 
                                        np.hstack([np.ones((Xval.shape[0], 1)), Xval]), yval, 
                                        0)

In [ ]:
print('\n'.join(str(x) for x in zip(error_train, error_val)))

In [ ]:
plt.plot(range(1, m), error_train, range(1, m), error_val)
plt.legend(['Train', 'Cross Validation'])

Part 6: Feature Mapping for Polynomial Regression

One solution to this underfitting is to use polynomial regression. You should now complete poly_features to map each example into its polynomial powers.


In [ ]:
def poly_features(X_orig, p):
    #POLYFEATURES Maps X (1D vector) into the p-th power
    #   [X_poly] = POLYFEATURES(X, p) takes a vector X (size m) and
    #   maps each example into its polynomial features where
    #   X_poly[i, :] = [X[i], X[i]**2, X[i]**3, ..., X[i]**p]
    #
    
    # You need to return the following variables correctly.
    X_poly = np.zeros((len(X_orig), p))
    
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Given a vector X, return a matrix X_poly whose j-th 
    #               column contains the values of X raised to the j-th power,
    #               for j = 1..p.
    #
    # 
    
    
    
    
    
    # =============================================================
    
    
    return X_poly
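
A compact way to fill this in is sketched below; the _sketch name is illustrative, and the input is flattened so the same function works for the (m, 1) arrays used here and the 1-D array used later in plot_fit.


In [ ]:
def poly_features_sketch(X_orig, p):
    # Sketch only. Column j (0-indexed) holds the input raised to the power j + 1.
    x = np.asarray(X_orig).ravel()
    return np.column_stack([x ** (j + 1) for j in range(p)])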

Map X onto Polynomial Features:


In [ ]:
p = 8
X_poly = poly_features(X, p)
X_poly.shape

In [ ]:
def feature_normalize(X):
    # Normalize each column to zero mean and unit standard deviation.
    mu = np.mean(X, axis=0)
    X_norm = X - mu
    sigma = np.std(X_norm, axis=0)
    sigma[sigma == 0] = 1  # avoid division by zero for constant columns
    X_norm = X_norm / sigma
    
    return X_norm, mu, sigma

Normalize X:


In [ ]:
X_poly, mu, sigma = feature_normalize(X_poly)
X_poly.shape

Add a column of ones:


In [ ]:
X_poly = np.hstack([np.ones((m, 1)), X_poly])

Map and normalize Xtest and Xval, using the mu and sigma computed from the training set:


In [ ]:
X_poly_test = (poly_features(Xtest, p) - mu) / sigma
X_poly_test = np.hstack([np.ones((X_poly_test.shape[0], 1)), X_poly_test])
X_poly_test.shape

In [ ]:
X_poly_val = (poly_features(Xval, p) - mu) / sigma
X_poly_val = np.hstack([np.ones((X_poly_val.shape[0], 1)), X_poly_val])
X_poly_val.shape

Part 7: Learning Curve for Polynomial Regression

Now, you will get to experiment with polynomial regression with multiple values of lambda. The code below runs polynomial regression with lambda = 0. You should try running the code with different values of lambda to see how the fit and learning curve change.


In [ ]:
def plot_fit(ax, min_x, max_x, mu, sigma, theta, p):
    # Plot the learned polynomial over a range slightly wider than the data,
    # applying the same feature mapping and normalization used for training.
    x = np.linspace(min_x - 15, max_x + 15)
    X_poly = (poly_features(x, p) - mu) / sigma
    X_poly = np.c_[np.ones(len(x)), X_poly]
    ax.plot(x, X_poly.dot(theta))

Plot training data and fit:


In [ ]:
lambda_ = 0
res = trainLinearRegression(X_poly, y, lambda_)
print(res)
theta = res.x

fig, ax = plt.subplots(figsize=(5,5))
ax.scatter(X, y, marker='x', c='r')
ax.set_xlabel('Change in water level')
ax.set_ylabel('Water flowing out of the dam')
plot_fit(ax, np.min(X), np.max(X), mu, sigma, theta, p)
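
To compare fits for several values of lambda without editing the cell above repeatedly, you could loop over a few candidate values, for example (a sketch reusing the functions above; the specific values 0, 1, 100 are just illustrative):


In [ ]:
# Sketch: overlay the polynomial fits for a few regularization strengths.
fig, ax = plt.subplots(figsize=(5,5))
for lam in (0, 1, 100):
    theta_lam = trainLinearRegression(X_poly, y, lam).x
    plot_fit(ax, np.min(X), np.max(X), mu, sigma, theta_lam, p)
ax.scatter(X, y, marker='x', c='r')
ax.legend(['lambda = 0', 'lambda = 1', 'lambda = 100'])
ax.set_xlabel('Change in water level')
ax.set_ylabel('Water flowing out of the dam')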

Polynomial Regression Learning Curve:


In [ ]:
error_train, error_val = learning_curve(X_poly, y, X_poly_val, yval, lambda_)
plt.plot(range(1, m), error_train)
plt.plot(range(1, m), error_val)
plt.legend(['Train', 'Cross Validation'])

Part 8: Validation for Selecting Lambda

You will now implement validation_curve to test various values of lambda on a validation set. You will then use this to select the "best" lambda value.


In [ ]:
def validation_curve(X, y, Xval, yval):
    #VALIDATIONCURVE Generate the train and validation errors needed to
    #plot a validation curve that we can use to select lambda
    #   [lambda_vec, error_train, error_val] = ...
    #       VALIDATIONCURVE(X, y, Xval, yval) returns the train
    #       and validation errors (in error_train, error_val)
    #       for different values of lambda. You are given the training set (X,
    #       y) and validation set (Xval, yval).
    #

    # Selected values of lambda (you should not change this)
    lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
    
    # You need to return these variables correctly.
    error_train = np.zeros(len(lambda_vec))
    error_val = np.zeros(len(lambda_vec))
    
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Fill in this function to return training errors in 
    #               error_train and the validation errors in error_val. The 
    #               vector lambda_vec contains the different lambda parameters 
    #               to use for each calculation of the errors, i.e., 
    #               error_train[i] and error_val[i] should give 
    #               you the errors obtained after training with 
    #               lambda_ = lambda_vec[i]
    #
    # Note: You can loop over lambda_vec with the following:
    #
    #       for i in range(len(lambda_vec)):
    #           lambda_ = lambda_vec[i]
    #           # Compute the train / val errors when training linear 
    #           # regression with regularization parameter lambda_,
    #           # storing the results in error_train[i] and error_val[i]
    #           ...
    #
    #
    
    
    
    
    
    # =========================================================================
    
    return lambda_vec, error_train, error_val
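
One possible implementation of the loop above is sketched here (training is done with each candidate lambda, but both errors are evaluated with lambda = 0; the _sketch name is only illustrative):


In [ ]:
def validation_curve_sketch(X, y, Xval, yval):
    lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
    error_train = np.zeros(len(lambda_vec))
    error_val = np.zeros(len(lambda_vec))
    for i, lam in enumerate(lambda_vec):
        theta = trainLinearRegression(X, y, lam).x
        error_train[i], _ = linearRegCostFunction(X, y, theta, 0)
        error_val[i], _ = linearRegCostFunction(Xval, yval, theta, 0)
    return lambda_vec, error_train, error_val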

In [ ]:
lambda_vec, error_train, error_val = validation_curve(X_poly, y, X_poly_val, yval)
plt.plot(lambda_vec, error_train, lambda_vec, error_val)
plt.ylim([0, 20])
plt.legend(['Training error', 'Cross validation error'])
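
With the curves above, one way to pick the "best" lambda programmatically (a sketch) is to take the value with the lowest cross validation error:


In [ ]:
# Sketch: choose the lambda with the smallest cross validation error.
best_lambda = lambda_vec[int(np.argmin(error_val))]
print(best_lambda)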

In [ ]:
lambda_ = 3
res = trainLinearRegression(X_poly, y, lambda_)
theta = res.x

fig, ax = plt.subplots(figsize=(5,5))
ax.scatter(X, y, marker='x', c='r')
ax.set_xlabel('Change in water level')
ax.set_ylabel('Water flowing out of the dam')
plot_fit(ax, np.min(X), np.max(X), mu, sigma, theta, p)
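
Finally, you can estimate how well the selected model generalizes by computing the unregularized error on the held-out test set, for example:


In [ ]:
# Sketch: test set error for the theta trained above with lambda_ = 3.
test_error, _ = linearRegCostFunction(X_poly_test, ytest, theta, 0)
print(test_error)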