Suppose you are selling your house, and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.
The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# import data
data = pd.read_csv('mlclass-ex1/ex1data2.txt', header=None)
Features: $x_{1}, x_{2}, x_{3}, ..., x_{n}$
Target (output variable): $y$
$m =$ number of training examples
$n =$ number of features
$x^{(i)} =$ input (features) of the $i$-th training example (the superscript is the index into the training set)
$x_{j}^{(i)} =$ value of feature $j$ in the $i$-th training example
For this data set:
$x_{1}$: size (feet$^{2}$)
$x_{2}$: # of bedrooms
$y$: price ($)
Feature scaling (mean normalization), where $\bar{x}_{j}$ is the mean and $s_{j}$ the standard deviation of feature $j$:
$x_{j}^{(i)} := \frac{x_{j}^{(i)} - \bar{x}_{j}}{s_{j}}$
The hypothesis for multivariate linear regression:
$h_{\theta}(x) = \theta_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + ... + \theta_{n}x_{n}$
For convenience of notation, define $x_{0} = 1$, so that $x_{0}^{(i)} = 1$.
Now, the feature vector can be zero-indexed:
$x^{(i)} = \begin{bmatrix} x_{0} \\ x_{1} \\ x_{2} \\ ... \\ x_{n} \end{bmatrix} \in \mathbb{R}^{n+1}$
The parameter vector is likewise zero-indexed:
$\theta = \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \theta_{2} \\ ... \\ \theta_{n} \end{bmatrix} \in \mathbb{R}^{n+1}$
The new hypothesis form is: $h_{\theta}(x) = \theta_{0}x_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + ... + \theta_{n}x_{n}$
Construct the following $m\times(n+1)$ design matrix:
$X = \begin{bmatrix} x_{0}^{(0)} & x_{1}^{(0)} & ... & x_{n}^{(0)} \\ x_{0}^{(1)} & x_{1}^{(1)} & ... & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & ... & x_{n}^{(2)} \\ ... & ... & ... & ... \\ x_{0}^{(m-1)} & x_{1}^{(m-1)} & ... & x_{n}^{(m-1)} \end{bmatrix}$
To create the ($m\times1$) vector of hypothesis values, perform the matrix-vector multiplication:
$X\theta = \begin{bmatrix} x_{0}^{(0)}\theta_{0} + x_{1}^{(0)}\theta_{1} + ... + x_{n}^{(0)}\theta_{n} \\ x_{0}^{(1)}\theta_{0} + x_{1}^{(1)}\theta_{1} + ... + x_{n}^{(1)}\theta_{n} \\ x_{0}^{(2)}\theta_{0} + x_{1}^{(2)}\theta_{1} + ... + x_{n}^{(2)}\theta_{n} \\ ... \\ x_{0}^{(m-1)}\theta_{0} + x_{1}^{(m-1)}\theta_{1} + ... + x_{n}^{(m-1)}\theta_{n} \end{bmatrix}$
Alternatively, to compute the hypothesis value for a single training example, take the inner (dot) product of $\theta$-transpose and $x^{(i)}$:
$\theta^{T} = \begin{bmatrix}\theta_{0}&\theta_{1}&\theta_{2}&...&\theta_{n}\end{bmatrix}$
$h_{\theta}(x^{(i)}) = \theta^{T}x^{(i)} = \begin{bmatrix}\theta_{0}&\theta_{1}&\theta_{2}&...&\theta_{n}\end{bmatrix} \begin{bmatrix} x_{0}^{(i)} \\ x_{1}^{(i)} \\ x_{2}^{(i)} \\ ... \\ x_{n}^{(i)} \end{bmatrix}$
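As a quick sanity check on the vectorized form (not part of the original exercise), the toy cell below uses made-up numbers to confirm that a single matrix-vector product $X\theta$ reproduces the per-example inner products $\theta^{T}x^{(i)}$; it assumes the imports cell above has run.
In [ ]:
# toy check: X.dot(theta) equals the per-example dot products theta.T.dot(x_i)
# (illustrative values only; m = 3 examples, n = 2 features plus the x_0 = 1 column)
X_toy = np.array([[1.0, 2.0, 3.0],
                  [1.0, 4.0, 5.0],
                  [1.0, 6.0, 7.0]])
theta_toy = np.array([[0.5],
                      [1.0],
                      [-1.0]])
# vectorized hypothesis: all m predictions in one matrix-vector product
h_vectorized = X_toy.dot(theta_toy)
# per-example hypothesis: one inner product per training example
h_per_example = np.array([theta_toy.T.dot(X_toy[i]) for i in range(X_toy.shape[0])])
print(h_vectorized.ravel())     # [-0.5 -0.5 -0.5]
print(h_per_example.ravel())    # identical values, computed row by row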
In [3]:
# initial data vectors
x_1 = data[0].values
x_2 = data[1].values
y = data[2].values
# training sample size
m = data[0].count()
# number of features
n = 2
# proper vector shapes
x_1.shape = (m,1)
x_2.shape = (m, 1)
y.shape = (m,1)
# Feature scaling (unnecessary if using the normal equation method)
# option 1: rescale each feature by its maximum value
#x_1 = np.true_divide(x_1, np.amax(x_1))
#x_2 = np.true_divide(x_2, np.amax(x_2))
# option2: mean normalization
xbar_1 = np.mean(x_1)
xbar_2 = np.mean(x_2)
s_1 = np.std(x_1)
s_2 = np.std(x_2)
x_1 = np.divide((np.subtract(x_1,xbar_1)),s_1)
x_2 = np.divide((np.subtract(x_2,xbar_2)),s_2)
# design matrix
X = np.hstack((np.ones((m,1)),x_1,x_2))
# theta parameters
theta = np.zeros(((n+1),1))
# gradient descent: number of iterations
iterations = 50
# learning rates
alpha = [-0.3, -0.1, -0.03, -0.01, -0.003, -0.001, 0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3]
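Not part of the original exercise: a brief sanity check (assuming the cell above has just run) that the mean-normalized columns of the design matrix have mean approximately 0 and standard deviation approximately 1, and that the shapes line up.
In [ ]:
# sanity check on the scaled design matrix (assumes the cell above has run)
print(X.shape)                                                           # (m, n+1): one row per example
print(np.allclose(X[:, 1].mean(), 0), np.allclose(X[:, 2].mean(), 0))   # scaled means ~ 0
print(np.allclose(X[:, 1].std(), 1), np.allclose(X[:, 2].std(), 1))     # scaled standard deviations ~ 1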
The cost function (one half the mean squared error; the factor of $\frac{1}{2}$ makes the derivative cleaner):
$J(\theta_{0},\theta_{1},...,\theta_{n}) = J(\theta) = \frac{1}{2m}\sum\limits_{i=0}^{m-1}(h_{\theta}(x^{(i)}) - y^{(i)})^2 = \frac{1}{2m}\sum\limits_{i=0}^{m-1}(\theta^{T}x^{(i)} - y^{(i)})^2 = \frac{1}{2m}(X\theta - y)^{T}(X\theta - y)$
Collect the partial derivatives of $J$ into a vector $\delta$:
$\delta = \begin{bmatrix} \delta_{0} \\ \delta_{1} \\ \delta_{2} \\ ... \\ \delta_{n} \end{bmatrix}$
$\delta_{j} = \frac{\partial}{\partial\theta_{j}} J(\theta) = \frac{1}{m}\sum\limits_{i=0}^{m-1}(\theta^{T}x^{(i)} - y^{(i)}) \cdot x_{j}^{(i)}$
In [4]:
# cost function: iterative implementation
#def J():
#    summation = 0
#    for i in range(0, m):
#        summation += ((theta.T.dot(X[i])) - y[i])**2
#
#    return (1.0/(2*m)) * summation
#
# cost function: vectorized implementation
def J():
    return (1.0/(2*m)) * (((X.dot(theta)) - y).T).dot(X.dot(theta) - y)

# delta-vector function for the partial derivatives of J
def deltas():
    delta = np.zeros(((n+1),1))
    for j in range(0, (n+1)):
        # iterative version:
        #summation = 0
        #for i in range(0, m):
        #    summation += ((theta.T.dot(X[i]) - y[i]) * X[i][j])
        # vectorized version; keep the feature column as (m,1) so the
        # elementwise product does not broadcast to an (m,m) matrix
        summation = np.sum((X.dot(theta) - y) * X[:, [j]])
        delta[j] = (1.0/m) * summation
    #print(np.sum(delta))    # uncomment to monitor the gradient while debugging
    return delta
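As an optional check on the $\delta_{j}$ formula above (not in the original exercise), the analytic gradient returned by deltas() can be compared against a central finite-difference approximation of $J$. The helper below is a sketch and assumes the cells above have run, so X, y, m, n, and theta exist.
In [ ]:
# optional gradient check: analytic deltas() vs. a central finite-difference estimate
# (a sketch, not part of the original exercise)
def cost_at(t):
    # vectorized cost J evaluated at an arbitrary parameter vector t
    residual = X.dot(t) - y
    return ((1.0/(2*m)) * residual.T.dot(residual)).item()

def numerical_deltas(eps=1e-4):
    approx = np.zeros(((n+1), 1))
    for j in range(0, (n+1)):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[j] += eps
        theta_minus[j] -= eps
        approx[j] = (cost_at(theta_plus) - cost_at(theta_minus)) / (2*eps)
    return approx

print(np.allclose(deltas(), numerical_deltas()))    # should print True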
$\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial\theta_{j}} J(\theta) = \theta_{j} - \alpha \frac{1}{m}\sum\limits_{i=0}^{m-1}(h_{\theta}(x^{(i)}) - y^{(i)}) \cdot x_{j}^{(i)} = \theta_{j} - \alpha \frac{1}{m}\sum\limits_{i=0}^{m-1}(\theta^{T}x^{(i)} - y^{(i)}) \cdot x_{j}^{(i)} = \theta_{j} - \alpha\delta_{j}$
repeat until convergence {
$\theta := \theta - \alpha \delta$
}
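The cell below (In [5]) runs a fixed number of iterations for each learning rate. As a variant (not part of the original notebook), "repeat until convergence" can be made literal by stopping once the relative decrease in $J(\theta)$ falls below a tolerance; the function below is a sketch, and the tolerance and iteration cap are arbitrary choices.
In [ ]:
# variant: run gradient descent until J(theta) stops decreasing meaningfully
# (a sketch; tolerance and max_iterations are arbitrary choices, not from the exercise)
def gradient_descent_until_convergence(learning_rate, tolerance=1e-9, max_iterations=10000):
    t = np.zeros(((n+1), 1))
    previous_cost = np.inf
    for iteration in range(max_iterations):
        residual = X.dot(t) - y
        current_cost = ((1.0/(2*m)) * residual.T.dot(residual)).item()    # J at the current t
        if abs(previous_cost - current_cost) < tolerance * previous_cost:
            break
        gradient = (1.0/m) * X.T.dot(residual)    # the same vector that deltas() computes
        t = t - learning_rate * gradient
        previous_cost = current_cost
    return t, current_cost, iteration

t_conv, J_conv, n_iter = gradient_descent_until_convergence(0.3)
print(n_iter, J_conv, t_conv.ravel())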
In [5]:
# gradient descent, test multiple alphas
for a in range(0, len(alpha)):
    # reset theta
    theta = np.zeros(((n+1),1))
    # reset vector J_values, store cost function values for plotting
    J_values = np.zeros((iterations,1))
    for iteration in range(0, iterations):
        theta = theta - (alpha[a] * deltas())
        J_values[iteration] = J()
    # visualize the cost function (2-D) for this learning rate
    cost_x = np.arange(iterations)
    cost_x.shape = (iterations,1)
    plt.plot(cost_x, J_values)
    plt.title("Learning Rate: " + str(alpha[a]))
    plt.xlabel('iterations')
    plt.ylabel(r"$J(\theta)$")
    plt.show()
In [6]:
# predict price of house with 1650 feet^2 and 3 bedrooms
x = np.array([[1.0],
              [1650],
              [3]])
# mean normalize (unnecessary if using the normal equation method)
x[1] = np.true_divide((np.subtract(x[1], xbar_1)), s_1)
x[2] = np.true_divide((np.subtract(x[2], xbar_2)), s_2)
# compute and display prediction
y_hat = theta.T.dot(x)
print("\nPrediction for 1650 feet^2 and 3 bedrooms: " + str(y_hat) + "\n")
In [13]:
normal_eq_theta = (np.linalg.pinv(X.T.dot(X)).dot(X.T)).dot(y)
normal_eq_y_hat = normal_eq_theta.T.dot(x)
print("\n\nPrediction for 1650 feet^2 and 3 bedrooms: " + str(normal_eq_y_hat) + "\n")
The normal equation gives the closed-form solution (the last equality, writing the solution via the pseudoinverse $X^{+}$, is from Wikipedia rather than from Ng's notes):
$\theta = (X^{T}X)^{-1}X^{T}y = X^{+}y$
In [19]:
normal_eq_theta = (np.linalg.pinv(X).dot(y))
normal_eq_y_hat = normal_eq_theta.T.dot(x)
print("\n\nPrediction for 1650 feet^2 and 3 bedrooms: " + str(normal_eq_y_hat) + "\n")
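As a final cross-check (not part of the original exercise), NumPy's built-in least-squares solver should recover essentially the same parameters as the two normal-equation variants above.
In [ ]:
# cross-check: solve the same least-squares problem with np.linalg.lstsq
# (assumes X, y, x, and normal_eq_theta from the cells above)
lstsq_theta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
lstsq_y_hat = lstsq_theta.T.dot(x)
print(np.allclose(lstsq_theta, normal_eq_theta))    # should print True
print("Prediction for 1650 feet^2 and 3 bedrooms: " + str(lstsq_y_hat))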