$x$ denotes the input variables, also called the input features.
$y$ denotes the output or target variable, sometimes also known as the label.
$h$ denotes the hypothesis, or model.
A pair $(x_i, y_i)$ is called a sample or training example.
The dataset of all training examples is called the training set.
$m$ is the number of samples in a dataset.
$n$ is the number of features in a dataset, excluding the label.
<img style="float: left;" src="images/02_02.png" width="400">
The dimensions of $X$ are $m\times n$:
$X = \left[ \begin{array}{cccc} x_1^1 & x_1^2 & \cdots & x_1^{n} \\ x_2^1 & x_2^2 & \cdots & x_2^{n} \\ x_3^1 & x_3^2 & \cdots & x_3^{n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m}^1 & x_{m}^2 & \cdots & x_{m}^{n} \end{array} \right]$
$\theta$ has dimensions $(n+1)\times 1$:
$\theta = \left[ \begin{array}{c} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{n} \end{array} \right]$
After a column of ones is added (so that $x^0 = 1$ multiplies the intercept term $\theta_0$), $X$ has dimensions $m\times (n+1)$:
$X = \left[ \begin{array}{ccccc} 1 & x_1^1 & x_1^2 & \cdots & x_1^{n} \\ 1 & x_2^1 & x_2^2 & \cdots & x_2^{n} \\ 1 & x_3^1 & x_3^2 & \cdots & x_3^{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m}^1 & x_{m}^2 & \cdots & x_{m}^{n} \end{array} \right]$
where $x_i$ is the $i^{th}$ sample, e.g. $x_2 = [ \begin{array}{cccc} 4.9 & 3.0 & 1.4 & 0.2 \end{array}]$
and $x_i^{j}$ is the value of feature $j$ in the $i^{th}$ training example, e.g. $x_2^3=1.4$.
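As a quick sanity check on these shapes, here is a minimal NumPy sketch (the numbers and variable names are made up for illustration):
In [ ]:
import numpy as np

# hypothetical feature matrix: m=3 samples, n=2 features
X_demo = np.array([[4.9, 3.0],
                   [5.1, 3.5],
                   [4.7, 3.2]])
theta_demo = np.zeros((X_demo.shape[1] + 1, 1))   # (n+1) x 1

# prepend the column of ones that multiplies theta_0
X_demo = np.insert(X_demo, 0, 1, axis=1)
print(X_demo.shape, theta_demo.shape)             # (3, 3) (3, 1)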
The cost function is $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} \big(h(x_i) - y_i\big)^2$
where the hypothesis is $h(x) = \theta_0 + \theta_1 x^1 + \theta_2 x^2 + \cdots + \theta_n x^n$, with $x^j$ the $j^{th}$ feature.
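With the column of ones in place this is simply $h(x) = \theta^T x$ per sample, or $h = X\theta$ over the whole training set (an $m\times 1$ vector of predictions); this is what the `h(X, theta)` function below computes with `np.dot`.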
<img style="float: center;" src="images/03_02.png" width="300">
Cost function:
$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} \big(h(x_i) - y_i\big)^2$
Gradient descent equation:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
Substituting the partial derivative of $J(\theta)$ for each $j$:
$\begin{align*} & \text{repeat until convergence:} \; \lbrace \newline \; & \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x_{i}) - y_{i}) \cdot x^0_{i}\newline \; & \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x_{i}) - y_{i}) \cdot x^1_{i} \newline \; & \theta_2 := \theta_2 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x_{i}) - y_{i}) \cdot x^2_{i} \newline & \cdots \newline \rbrace \end{align*}$
or more generally
$\begin{align*}& \text{repeat until convergence:} \; \lbrace \newline \; & \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x_{i}) - y_{i}) \cdot x^j_{i} \; & \text{for j := 0...n}\newline \rbrace\end{align*}$
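The update for all $\theta_j$ at once can also be written in vectorized form as $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$. A minimal NumPy sketch of one such step (the function name `gradient_step` is illustrative, not part of the notebook's code below):
In [ ]:
import numpy as np

def gradient_step(X, y, theta, alpha):
    # one simultaneous update of all theta_j:
    # theta := theta - (alpha/m) * X^T (X theta - y)
    m = y.shape[0]
    return theta - (alpha / m) * X.T.dot(X.dot(theta) - y)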
The aim of feature scaling is to have every feature roughly in the range
$-1 \le x^j \le 1$
or
$-0.5 \le x^j \le 0.5$
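A common way to get features into this range, and what the cells below do with pandas, is mean normalization: subtract each feature's mean and divide by its standard deviation,
$x^j := \dfrac{x^j - \mu_j}{\sigma_j}$
where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$ over the training set.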
In [141]:
%matplotlib inline
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
import matplotlib as mpl
# read data in pandas frame
dataframe = pd.read_csv('datasets/house_dataset2.csv', encoding='utf-8')
In [142]:
# check data by printing first few rows
dataframe.head()
Out[142]:
In [143]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
fig.set_size_inches(12.5, 7.5)
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xs=dataframe['size'], ys=dataframe['bedrooms'], zs=dataframe['price'])
ax.set_ylabel('bedrooms'); ax.set_xlabel('size'); ax.set_zlabel('price')
# ax.view_init(10, -45)
plt.show()
In [144]:
dataframe.describe()
Out[144]:
In [145]:
#Quick visualize data
plt.grid(True)
plt.xlim([-1,5000])
dummy = plt.hist(dataframe["size"],label = 'Size')
dummy = plt.hist(dataframe["bedrooms"],label = 'Bedrooms')
plt.title('Clearly we need feature normalization.')
plt.xlabel('Column Value')
plt.ylabel('Counts')
dummy = plt.legend()
In [146]:
mean_size = dataframe["size"].mean()
std_size = dataframe["size"].std()
mean_bed = dataframe["bedrooms"].mean()
std_bed = dataframe["bedrooms"].std()
In [147]:
dataframe["size"] = (dataframe["size"] - mean_size)/std_size
In [148]:
dataframe["bedrooms"] = (dataframe["bedrooms"] - mean_bed)/std_bed
In [149]:
dataframe.describe()
Out[149]:
In [150]:
# reassign X from the normalized features
X = np.array(dataframe[['size','bedrooms']])
X = np.insert(X,0,1,axis=1)
#Quick visualize data
plt.grid(True)
plt.xlim([-5,5])
dummy = plt.hist(dataframe["size"],label = 'Size')
dummy = plt.hist(dataframe["bedrooms"],label = 'Bedrooms')
plt.title('Features scaled and normalized.')
plt.xlabel('Column Value')
plt.ylabel('Counts')
dummy = plt.legend()
In [151]:
# assign X and y
X = np.array(dataframe[['size','bedrooms']])
y = np.array(dataframe[['price']])
m = y.size # number of training examples
# insert all 1's column for theta_0
X = np.insert(X,0,1,axis=1)
# initialize theta (zeros and random both work here: J(theta) is convex,
# so gradient descent reaches the same minimum either way)
# initial_theta = np.zeros((X.shape[1],1))
initial_theta = np.random.rand(X.shape[1],1)
In [152]:
initial_theta
Out[152]:
In [153]:
X.shape
Out[153]:
In [154]:
initial_theta.shape
Out[154]:
In [155]:
iterations = 1500
alpha = 0.1
In [156]:
def h(X, theta): #Linear hypothesis function
    hx = np.dot(X,theta)
    return hx
In [157]:
def computeCost(theta,X,y): #Cost function
    """
    theta is an (n+1) x 1 vector of parameters
    X is a matrix with m rows and (n+1) columns (including the column of ones)
    y is a matrix with m rows and 1 column
    """
    #note to self: *.shape is (rows, columns)
    return float((1./(2*m)) * np.dot((h(X,theta)-y).T,(h(X,theta)-y)))

#Test that running computeCost with 0's as theta returns 65591548106.45744:
initial_theta = np.zeros((X.shape[1],1)) #(theta is a vector with n+1 rows and 1 column)
print (computeCost(initial_theta,X,y))
In [158]:
#Actual gradient descent minimizing routine
def gradientDescent(X, theta_start):
    """
    theta_start is an (n+1) x 1 vector with the initial theta guess
    X is a matrix with m rows and (n+1) columns (including the column of ones)
    """
    theta = theta_start
    j_history = [] #Used to plot cost as function of iteration
    theta_history = [] #Used to visualize the minimization path later on
    for meaninglessvariable in range(iterations):
        # work on a copy so that all theta values are updated simultaneously
        tmptheta = theta.copy()
        # append for plotting
        j_history.append(computeCost(theta,X,y))
        theta_history.append(list(theta[:,0]))
        #Simultaneously updating theta values
        for j in range(len(tmptheta)):
            tmptheta[j] = theta[j] - (alpha/m)*np.sum((h(X,theta) - y)*np.array(X[:,j]).reshape(m,1))
        theta = tmptheta
    return theta, theta_history, j_history
In [159]:
#Actually run gradient descent to get the best-fit theta values
theta, thetahistory, j_history = gradientDescent(X,initial_theta)
In [160]:
theta
Out[160]:
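As an optional sanity check, the gradient-descent result can be compared against scikit-learn's closed-form least-squares fit, using the `linear_model` module already imported at the top (a sketch; `fit_intercept=False` because `X` already carries the column of ones):
In [ ]:
# compare gradient-descent theta with scikit-learn's least-squares solution
reg = linear_model.LinearRegression(fit_intercept=False)
reg.fit(X, y)
print(reg.coef_)   # should be close to theta.T found above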
In [161]:
plt.plot(j_history)
plt.title("Convergence of Cost Function")
plt.xlabel("Iteration number")
plt.ylabel("Cost function")
plt.show()
In [162]:
dataframe.head()
Out[162]:
In [166]:
# x_test is [1, size, bedrooms] in normalized units (the leading 1 multiplies theta_0)
x_test = np.array([1,0.130010,-0.22367])
print("$%0.2f" % float(h(x_test,theta)))
In [168]:
hx = h(X, theta)
In [169]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
fig.set_size_inches(12.5, 7.5)
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xs=dataframe['size'], ys=dataframe['bedrooms'], zs=dataframe['price'])
ax.set_ylabel('bedrooms'); ax.set_xlabel('size'); ax.set_zlabel('price')
# ax.plot(xs=np.array(X[:,0],dtype=object).reshape(-1,1), ys=np.array(X[:,1],dtype=object).reshape(-1,1), zs=hx, color='green')
# plot the fitted values at the training points (X[:,1] is size, X[:,2] is bedrooms)
ax.plot(X[:,1], X[:,2], hx[:,0], label='fitted values', color='green')
# ax.view_init(20, -165)
plt.show()