$x$ is called the input variable or input feature.
$y$ is called the output or target variable, sometimes also known as the label.
$h$ is called the hypothesis or model.
A pair $(x^{(i)}, y^{(i)})$ is called a sample or training example.
The dataset of all training examples is called the training set.
$m$ is the number of samples in a dataset.
$n$ is the number of features in a dataset, excluding the label.
<img style="float: left;" src="images/02_02.png" width=400>
Let $\hat{y} = h(x) = \theta_0 + \theta_1 x$
Error on a single sample $(x, y)$: $\hat{y} - y = h(x) - y$
Cumulative squared error over all $m$ samples: $\sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})^2$
Finally, the mean squared error, or cost function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})^2$
<img style="float: left;" src="images/03_01.png" width=300> <img style="float: right;" src="images/03_02.png" width=300>
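As a quick sanity check, the cost function above takes only a few lines of NumPy. This is a minimal sketch on a hard-coded copy of the example data used later; the names here (h, J) are illustrative and separate from the notebook code that follows.

import numpy as np
# toy copy of the example dataset below (y = 0.8x)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 1.6, 2.4, 3.2])
theta0, theta1 = 0.0, 0.5                  # an arbitrary guess for the parameters
h = theta0 + theta1 * x                    # hypothesis h(x) for every sample
J = np.sum((h - y) ** 2) / (2 * x.size)    # cost J(theta)
print(J)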
Gradient descent equation:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
Linear regression cost function:
$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})^2$
Replacing $J(\theta)$ in the gradient descent equation:
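Taking the partial derivatives of $J(\theta)$ gives the update rules for simple linear regression:

$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})$

$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})\,x^{(i)}$

The worked example below first evaluates the cost for a few values of $\theta_1$ by hand, and then applies this update rule in code.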
x | y |
---|---|
1 | 0.8 |
2 | 1.6 |
3 | 2.4 |
4 | 3.2 |
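The code below reads this table from datasets/example1.csv. If that file is not already present, a snippet along these lines (an assumed layout mirroring the table above) would recreate it:

import os
import pandas as pd
# recreate the example dataset shown above (assumed layout of datasets/example1.csv)
os.makedirs('datasets', exist_ok=True)
pd.DataFrame({'x': [1, 2, 3, 4], 'y': [0.8, 1.6, 2.4, 3.2]}).to_csv('datasets/example1.csv', index=False)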
In [31]:
%matplotlib inline
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
# read data in pandas frame
dataframe = pd.read_csv('datasets/example1.csv', encoding='utf-8')
# assign x and y
X = np.array(dataframe[['x']])
y = np.array(dataframe[['y']])
m = y.size # number of training examples
In [32]:
# check data by printing first few rows
dataframe.head()
Out[32]:
In [33]:
# visualize the dataset
plt.scatter(X, y)
plt.title("Dataset")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
In [34]:
# try a few candidate lines with different slopes
tmpx = np.array([0, 1, 2, 3, 4])
y1 = 0.2*tmpx
y2 = 0.7*tmpx
y3 = 1.5*tmpx
plt.scatter(X, y)
plt.plot(tmpx,y1)
plt.plot(tmpx,y2)
plt.plot(tmpx,y3)
plt.title("Best fit line")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
In [35]:
# cost for theta0 = 0, theta1 = 0
theta0 = 0
theta1 = 0
cost = 0
for i in range(m):
    hx = theta1*X[i,0] + theta0
    cost += pow((hx - y[i,0]), 2)
cost = cost/(2*m)
print(cost)
In [36]:
# predict using model
y_pred = theta1*X + theta0
# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 0")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
In [37]:
# save theta1 and cost in a vector
cost_log = []
theta1_log = []
cost_log.append(cost)
theta1_log.append(theta1)
# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()
In [38]:
# cost for theta0 = 0, theta1 = 1
theta0 = 0
theta1 = 1
cost = 0
for i in range(m):
    hx = theta1*X[i,0] + theta0
    cost += pow((hx - y[i,0]), 2)
cost = cost/(2*m)
print(cost)
In [39]:
# predict using model
y_pred = theta1*X + theta0
# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 1")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
In [40]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)
# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()
In [41]:
# cost for theta0 = 0, theta1 = 2
theta0 = 0
theta1 = 2
cost = 0
for i in range(m):
    hx = theta1*X[i,0] + theta0
    cost += pow((hx - y[i,0]), 2)
cost = cost/(2*m)
print(cost)
# predict using model
y_pred = theta1*X + theta0
# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 2")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
In [42]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)
# plot
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()
In [43]:
# sweep theta1 from -3 to 3 in steps of 0.1 and record the cost at each value
theta0 = 0
theta1 = -3.1
cost_log = []
theta1_log = []
inc = 0.1
for j in range(61):
    theta1 = theta1 + inc
    cost = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]), 2)
    cost = cost/(2*m)
    cost_log.append(cost)
    theta1_log.append(theta1)
In [44]:
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()
In [45]:
# gradient descent on theta1 (theta0 held at 0)
theta0 = 0
theta1 = -3
alpha = 0.1
iterations = 100
cost_log = []
for j in range(iterations):
    cost = 0
    grad = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]), 2)
        grad += (hx - y[i,0])*X[i,0]
    cost = cost/(2*m)
    grad = grad/m          # derivative of J(theta) with respect to theta1
    theta1 = theta1 - alpha*grad
    cost_log.append(cost)
In [46]:
theta1
Out[46]:
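Here $\theta_0$ is held at 0, which works because the example data passes through the origin. A minimal sketch of the same loop updating both parameters (reusing X, y, m, and alpha from above) could look like this; the intercept direction converges more slowly, so more iterations are used:

theta0 = 0.0
theta1 = -3.0
for j in range(1000):                    # more iterations: the intercept converges slowly
    grad0 = 0
    grad1 = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        grad0 += (hx - y[i,0])           # partial derivative w.r.t. theta0
        grad1 += (hx - y[i,0])*X[i,0]    # partial derivative w.r.t. theta1
    # simultaneous update of both parameters
    theta0 = theta0 - alpha*grad0/m
    theta1 = theta1 - alpha*grad1/m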
In [47]:
plt.plot(cost_log)
plt.title("Convergence of Cost Function")
plt.xlabel("Iteration number")
plt.ylabel("Cost function")
plt.show()
In [48]:
# predict using model
y_pred = theta1*X + theta0
# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for Theta1 from Gradient Descent")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
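For comparison, the linear_model module imported at the top can fit the same line by ordinary least squares. A short sketch using scikit-learn's LinearRegression:

# fit the same data with scikit-learn for comparison
reg = linear_model.LinearRegression()
reg.fit(X, y)
print(reg.intercept_, reg.coef_)    # learned theta0 (intercept) and theta1 (slope)

The slope reported here should be close to the theta1 obtained from gradient descent above.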