In [1]:
%matplotlib inline

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize

Preparation (Based on Linear Regression)

Prepare train and test data.


In [3]:
data_original = np.loadtxt('stanford_dl_ex/ex1/housing.data')
data = np.insert(data_original, 0, 1, axis=1)  # prepend a column of ones (intercept term)
np.random.shuffle(data)                        # shuffle rows before splitting
train_X = data[:400, :-1]                      # first 400 examples: features
train_y = data[:400, -1]                       # first 400 examples: target values

m, n = train_X.shape                           # m examples, n features (incl. intercept)
theta = np.random.rand(n)                      # random initial parameters
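The remaining rows can serve as held-out test data. A minimal sketch, assuming the same 400-row split point as above (test_X and test_y are illustrative names, not used in the rest of this section):

test_X = data[400:, :-1]  # remaining examples: features
test_y = data[400:, -1]   # remaining examples: target values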

Define the cost function for linear least squares and its gradient.


In [4]:
def cost_function(theta, X, y):
    # squared-error cost: J(theta) = 0.5 * sum((X.theta - y)^2)
    squared_errors = (X.dot(theta) - y) ** 2
    J = 0.5 * squared_errors.sum()
    return J

def gradient(theta, X, y):
    # analytical gradient: X^T (X.theta - y)
    errors = X.dot(theta) - y
    return errors.dot(X)
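In math notation, the two functions above compute

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2},
\qquad
\nabla_{\theta}J(\theta) = X^{T}\left(X\theta - y\right).$$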

Gradient Checking

Define "step size" (don't set it too low to avoid numerical precision issues).


In [5]:
epsilon = 1e-4
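The choice of epsilon balances two error sources in the central difference used below: truncation error shrinks with epsilon, while round-off error grows as epsilon shrinks. Roughly (a standard rule of thumb, not specific to this exercise),

$$\text{error} \;\approx\; \underbrace{O(\epsilon^{2})}_{\text{truncation}} \;+\; \underbrace{O\!\left(\frac{\varepsilon_{\text{mach}}}{\epsilon}\right)}_{\text{round-off}},$$

so a moderate value such as 1e-4 keeps both contributions small.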

Prepare the perturbed theta values (making use of NumPy broadcasting): row i of theta_plus and theta_minus is theta with epsilon added to, respectively subtracted from, component i.


In [6]:
mask = np.identity(theta.size)
theta_plus = theta + epsilon * mask
theta_minus = theta - epsilon * mask
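A quick sanity check on the broadcasting result (illustrative, not part of the exercise): each row should differ from theta in exactly one component.

i = 0
assert np.allclose(theta_plus[i] - theta, epsilon * mask[i])
assert np.allclose(theta_minus[i] - theta, -epsilon * mask[i])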

Compute the differences between the numerical gradient of the cost, as given by its mathematical (central-difference) definition, and the analytical gradient computed by our gradient function above.
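For each component $i$, the numerical estimate is the central difference

$$\frac{\partial J(\theta)}{\partial \theta_i} \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon},$$

where $e_i$ is the $i$-th standard basis vector, i.e. the $i$-th row of the identity mask above.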


In [8]:
diffs = np.empty_like(theta)
gradient_lin_reg = gradient(theta, train_X, train_y)  # analytical gradient (same for every i)
for i in range(theta_plus.shape[0]):
    # central-difference estimate of the partial derivative w.r.t. theta[i]
    gradient_def = (
        (cost_function(theta_plus[i], train_X, train_y) - cost_function(theta_minus[i], train_X, train_y)) /
        (2 * epsilon)
    )
    diffs[i] = np.absolute(gradient_def - gradient_lin_reg[i])

In [9]:
diffs


Out[9]:
array([  3.31414049e-06,   1.44055812e-05,   9.00169834e-06,
         5.53415157e-06,   6.84440965e-06,   1.88233535e-05,
         1.28877582e-05,   1.83098018e-06,   4.86033969e-06,
         2.71014869e-06,   3.72529030e-07,   1.79391354e-05,
         1.09821558e-05,   1.00492034e-05])

Lookin' good! The smaller the values, the better.
(Any value significantly larger than 1e-4 indicates a problem.)


In [10]:
assert np.all(diffs < 1e-4)

Quality check: passed with distinction.
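For reference, gradient checks are often reported as a relative rather than an absolute error, which makes the threshold independent of the gradient's scale. A minimal sketch of that variant, reusing the quantities above (the normalization and whatever threshold you would pick are common conventions, not part of this exercise):

numerical_grad = np.array([
    (cost_function(theta_plus[i], train_X, train_y) -
     cost_function(theta_minus[i], train_X, train_y)) / (2 * epsilon)
    for i in range(theta.size)
])
analytical_grad = gradient(theta, train_X, train_y)
# relative error: scale-free comparison of the two gradients
relative_error = (np.abs(numerical_grad - analytical_grad) /
                  np.maximum(np.abs(numerical_grad), np.abs(analytical_grad)))
print(relative_error.max())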