CSAL4243: Introduction to Machine Learning

Muhammad Mudassir Khan (mudasssir.khan@ucp.edu.pk)

Lecture 4: Linear Regression and Gradient Descent Example

Overview



Machine Learning pipeline

  • x is called the input variable or input feature.

  • y is called the output or target variable, also sometimes known as the label.

  • h is called the hypothesis or model.

  • The pair ($x^{(i)}$, $y^{(i)}$) is called a sample or training example.

  • The dataset of all training examples is called the training set.

  • m is the number of samples in a dataset.

  • n is the number of features in a dataset, excluding the label.

<img style="float: left;" src="images/02_02.png" width=400>
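
For example, in the toy dataset used later in this lecture there are m = 4 training examples and a single input feature x, so n = 1.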



Linear Regression with one variable

Model Representation

  • The model is represented by $h_\theta(x)$ or simply $h(x)$.

  • For linear regression with one input variable, $h(x) = \theta_0 + \theta_1 x$.

  • $\theta_0$ and $\theta_1$ are called weights or parameters.

  • We need to find the $\theta_0$ and $\theta_1$ that maximize the performance of the model, i.e., minimize its error on the training set.




Cost Function

Let $\hat{y}$ = h(x) = $\theta_0 + \theta_1 x$

Error for a single sample (x, y) = $\hat{y}$ - y = h(x) - y

Cumulative squared error over all m samples = $\sum_{i=1}^{m} (h(x^i) - y^i)^2$

Finally, the mean squared error or cost function = J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$

<img style="float: left;" src="images/03_01.png" width=300> <img style="float: right;" src="images/03_02.png" width=300>
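
As a quick illustration, here is a minimal vectorized sketch of this cost function in NumPy (the function name compute_cost and the toy arrays are illustrative additions, not part of the lecture code):

import numpy as np

def compute_cost(theta0, theta1, x, y):
    # mean squared error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)
    m = y.size
    h = theta0 + theta1 * x              # hypothesis evaluated on all samples
    return np.sum((h - y) ** 2) / (2 * m)

# toy data y = 0.8*x, the same values used in the example below
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 1.6, 2.4, 3.2])
print(compute_cost(0.0, 0.0, x, y))      # 2.4, as computed step by step below
print(compute_cost(0.0, 0.8, x, y))      # 0.0 for the perfect fit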



Gradient Descent

Gradient descent equation:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$

Linear regression Cost function:

J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$


Replacing J($\theta$) in the gradient descent equation and evaluating the partial derivatives:

\begin{align*} \text{repeat until convergence: } \lbrace & \newline \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \newline \theta_1 := & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right) \newline \rbrace& \end{align*}
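
To make the update rule concrete, here is a minimal sketch in NumPy of one simultaneous update of $\theta_0$ and $\theta_1$ (the function name gradient_descent_step is an illustrative addition, not part of the lecture code):

import numpy as np

def gradient_descent_step(theta0, theta1, x, y, alpha):
    # one simultaneous update of theta0 and theta1 for linear regression
    m = y.size
    h = theta0 + theta1 * x              # predictions for all samples
    grad0 = np.sum(h - y) / m            # dJ/dtheta0
    grad1 = np.sum((h - y) * x) / m      # dJ/dtheta1
    # compute both gradients first, then update both parameters together
    return theta0 - alpha * grad0, theta1 - alpha * grad1

# example: one step on the toy data below, starting from theta0 = 0, theta1 = 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 1.6, 2.4, 3.2])
theta0, theta1 = gradient_descent_step(0.0, 2.0, x, y, alpha=0.1)

Note that the worked gradient descent example later in this lecture keeps $\theta_0$ fixed at 0 and only updates $\theta_1$.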




Linear Regression Example

x y
1 0.8
2 1.6
3 2.4
4 3.2

Read data


In [49]:
%matplotlib inline
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

# read data in pandas frame
dataframe = pd.read_csv('datasets/example1.csv', encoding='utf-8')

# assign x and y
X = np.array(dataframe[['x']])
y = np.array(dataframe[['y']])

m = y.size # number of training examples

In [32]:
# check data by printing first few rows
dataframe.head()


Out[32]:
x y
0 1 0.8
1 2 1.6
2 3 2.4
3 4 3.2

Plot data


In [33]:
#visualize results
plt.scatter(X, y)
plt.title("Dataset")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Find a line that best fits the data


In [34]:
#best fit line

tmpx = np.array([0, 1, 2, 3, 4])
y1 = 0.2*tmpx
y2 = 0.7*tmpx
y3 = 1.5*tmpx


plt.scatter(X, y)
plt.plot(tmpx,y1)
plt.plot(tmpx,y2)
plt.plot(tmpx,y3)
plt.title("Best fit line")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=0$

Model h(x) = $\theta_0$ + $\theta_1$x = 0

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (0 - y^i)^2$


In [35]:
theta0 = 0
theta1 = 0

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


2.4
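
Check: with $\theta_1 = 0$ the errors are just $-y^i$, so the squared errors are 0.64, 2.56, 5.76 and 10.24; their sum is 19.2, and $19.2 / (2 \cdot 4) = 2.4$.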

Plot it


In [36]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 0")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Plot $\theta_1$ vs Cost


In [37]:
# save theta1 and cost in a vector
cost_log = []
theta1_log = []

cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=1$

Model h(x) = $\theta_0$ + $\theta_1$x = x

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (x^i - y^i)^2$


In [38]:
theta0 = 0
theta1 = 1

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


0.15

Plot it


In [39]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 1")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Plot $\theta_1$ vs Cost again


In [40]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=2$

Model h(x) = $\theta_0$ + $\theta_1$x = 2x

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (2x^i - y^i)^2$


In [41]:
theta0 = 0
theta1 = 2

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 2")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


5.4

In [42]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Compute the cost for a range of $\theta_1$ values


In [43]:
theta0 = 0
theta1 = -3.1   # start just below -3 so the first value in the sweep is -3.0

cost_log = []
theta1_log = []

inc = 0.1
for j in range(61):      # sweep theta1 from -3.0 to 3.0 in steps of 0.1
    theta1 = theta1 + inc
    
    cost = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

    cost = cost/(2*m)             

    cost_log.append(cost)
    theta1_log.append(theta1)

Plot $\theta_1$ vs Cost


In [44]:
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()
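
The cost curve is a convex bowl with its minimum at $\theta_1 = 0.8$, which is exactly the slope of the data ($y = 0.8x$); at that point the cost is zero.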




Let's do it with Gradient Descent now


In [60]:
theta0 = 0        # intercept kept fixed at 0 in this example
theta1 = 2        # initial guess for the slope

alpha = 0.1       # learning rate
iterations = 100

cost_log = []
theta_log = []

for j in range(iterations):

    cost = 0
    grad = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]), 2)
        grad += (hx - y[i,0])*X[i,0]

    cost = cost/(2*m)
    # dividing the gradient by 2m (instead of m as in the update rule above)
    # simply halves the effective learning rate; convergence is unaffected
    grad = grad/(2*m)
    theta1 = theta1 - alpha*grad

    cost_log.append(cost)
    theta_log.append(theta1)

In [61]:
theta_log


Out[61]:
[1.55,
 1.26875,
 1.09296875,
 0.98310546875000004,
 0.91444091796875004,
 0.87152557373046879,
 0.84470348358154301,
 0.8279396772384644,
 0.81746229827404027,
 0.81091393642127518,
 0.80682121026329701,
 0.80426325641456065,
 0.80266453525910042,
 0.80166533453693778,
 0.80104083408558613,
 0.80065052130349135,
 0.80040657581468211,
 0.80025410988417633,
 0.80015881867761018,
 0.80009926167350642,
 0.80006203854594149,
 0.80003877409121349,
 0.80002423380700849,
 0.80001514612938029,
 0.80000946633086267,
 0.8000059164567892,
 0.8000036977854933,
 0.80000231111593334,
 0.80000144444745835,
 0.80000090277966152,
 0.80000056423728849,
 0.80000035264830527,
 0.80000022040519081,
 0.8000001377532443,
 0.80000008609577766,
 0.80000005380986106,
 0.80000003363116323,
 0.80000002101947709,
 0.80000001313717317,
 0.80000000821073325,
 0.80000000513170832,
 0.80000000320731768,
 0.80000000200457355,
 0.80000000125285853,
 0.80000000078303657,
 0.8000000004893979,
 0.8000000003058737,
 0.80000000019117112,
 0.80000000011948202,
 0.80000000007467631,
 0.80000000004667271,
 0.80000000002917049,
 0.80000000001823157,
 0.80000000001139471,
 0.80000000000712168,
 0.80000000000445104,
 0.80000000000278193,
 0.80000000000173876,
 0.80000000000108673,
 0.80000000000067928,
 0.80000000000042459,
 0.80000000000026539,
 0.80000000000016591,
 0.80000000000010374,
 0.80000000000006488,
 0.80000000000004057,
 0.80000000000002536,
 0.80000000000001581,
 0.80000000000000993,
 0.80000000000000626,
 0.80000000000000393,
 0.80000000000000249,
 0.8000000000000016,
 0.80000000000000104,
 0.80000000000000071,
 0.80000000000000049,
 0.80000000000000027,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016]

Plot Convergence


In [59]:
plt.plot(cost_log)
plt.title("Convergence of Cost Function")
plt.xlabel("Iteration number")
plt.ylabel("Cost function")
plt.show()


Predict output using trained model


In [48]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for Theta1 from Gradient Descent")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
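
For comparison: the first cell imports linear_model from scikit-learn but never uses it. Here is a minimal sketch of fitting the same data with scikit-learn's LinearRegression (fitting without an intercept so the model matches $h(x) = \theta_1 x$); this is an illustrative addition, not part of the original lecture code:

from sklearn.linear_model import LinearRegression

# fit y = theta1 * x with no intercept, mirroring theta0 = 0 above
reg = LinearRegression(fit_intercept=False)
reg.fit(X, y)        # X and y are the arrays read from example1.csv earlier
print(reg.coef_)     # fitted slope; close to 0.8 for this dataset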


Credits

Raschka, Sebastian. Python machine learning. Birmingham, UK: Packt Publishing, 2015. Print.

Andrew Ng, Machine Learning, Coursera

Lucas Shen

David Kaleko