CSAL4243: Introduction to Machine Learning

Muhammad Mudassir Khan (mudasssir.khan@ucp.edu.pk)

Lecture 4: Linear Regression and Gradient Descent Example

Overview



Machine Learning pipeline

  • x is called the input variable or input feature.

  • y is called the output or target variable, also sometimes known as the label.

  • h is called the hypothesis or model.

  • The pair ($x^{(i)}$, $y^{(i)}$) is called a sample or training example.

  • The dataset of all training examples is called the training set.

  • m is the number of samples in a dataset.

  • n is the number of features in a dataset, excluding the label.

<img style="float: left;" src="images/02_02.png" width=400>
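
For example, in the toy dataset used later in this lecture there are m = 4 training examples and a single input feature x, so n = 1.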



Linear Regression with one variable

Model Representation

  • The model is represented by $h_\theta(x)$ or simply $h(x)$.

  • For linear regression with one input variable, $h(x) = \theta_0 + \theta_1 x$.

  • $\theta_0$ and $\theta_1$ are called weights or parameters.

  • We need to find the $\theta_0$ and $\theta_1$ that maximize the performance of the model, i.e., minimize its error on the training set.




Cost Function

Let $\hat{y}$ = h(x) = $\theta_0 + \theta_1 x$

Error for a single sample (x, y) = $\hat{y}$ - y = h(x) - y

Cumulative squared error over all m samples = $\sum_{i=1}^{m} (h(x^i) - y^i)^2$

Finally, the mean squared error or cost function = J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$

<img style="float: left;" src="images/03_01.png" width=300> <img style="float: right;" src="images/03_02.png" width=300>
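
As a quick illustration, here is a minimal vectorized sketch of this cost function in NumPy (the function name compute_cost and the toy arrays are illustrative additions, not part of the lecture code):

import numpy as np

def compute_cost(theta0, theta1, x, y):
    # mean squared error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)
    m = y.size
    h = theta0 + theta1 * x              # hypothesis evaluated on all samples
    return np.sum((h - y) ** 2) / (2 * m)

# toy data y = 0.8*x, the same values used in the example below
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 1.6, 2.4, 3.2])
print(compute_cost(0.0, 0.0, x, y))      # 2.4, as computed step by step below
print(compute_cost(0.0, 0.8, x, y))      # 0.0 for the perfect fit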



Gradient Descent

Gradient descent equation:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$

Linear regression Cost function:

J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$


Replacing J($\theta$) in the gradient descent equation and evaluating the partial derivatives:

\begin{align*} \text{repeat until convergence: } \lbrace & \newline \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \newline \theta_1 := & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right) \newline \rbrace& \end{align*}
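
To make the update rule concrete, here is a minimal sketch in NumPy of one simultaneous update of $\theta_0$ and $\theta_1$ (the function name gradient_descent_step is an illustrative addition, not part of the lecture code):

import numpy as np

def gradient_descent_step(theta0, theta1, x, y, alpha):
    # one simultaneous update of theta0 and theta1 for linear regression
    m = y.size
    h = theta0 + theta1 * x              # predictions for all samples
    grad0 = np.sum(h - y) / m            # dJ/dtheta0
    grad1 = np.sum((h - y) * x) / m      # dJ/dtheta1
    # compute both gradients first, then update both parameters together
    return theta0 - alpha * grad0, theta1 - alpha * grad1

# example: one step on the toy data below, starting from theta0 = 0, theta1 = 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 1.6, 2.4, 3.2])
theta0, theta1 = gradient_descent_step(0.0, 2.0, x, y, alpha=0.1)

Note that the worked gradient descent example later in this lecture keeps $\theta_0$ fixed at 0 and only updates $\theta_1$.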




Linear Regression Example

x y
1 0.8
2 1.6
3 2.4
4 3.2

Read data


In [49]:
%matplotlib inline
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

# read data in pandas frame
dataframe = pd.read_csv('datasets/example1.csv', encoding='utf-8')

# assign x and y
X = np.array(dataframe[['x']])
y = np.array(dataframe[['y']])

m = y.size # number of training examples

In [32]:
# check data by printing first few rows
dataframe.head()


Out[32]:
x y
0 1 0.8
1 2 1.6
2 3 2.4
3 4 3.2

Plot data


In [33]:
#visualize results
plt.scatter(X, y)
plt.title("Dataset")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Find a line that best fits the data


In [34]:
#best fit line

tmpx = np.array([0, 1, 2, 3, 4])
y1 = 0.2*tmpx
y2 = 0.7*tmpx
y3 = 1.5*tmpx


plt.scatter(X, y)
plt.plot(tmpx,y1)
plt.plot(tmpx,y2)
plt.plot(tmpx,y3)
plt.title("Best fit line")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=0$

Model h(x) = $\theta_0$ + $\theta_1$x = 0

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (0 - y^i)^2$


In [35]:
theta0 = 0
theta1 = 0

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


2.4
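
Check: with $\theta_1 = 0$ the errors are just $-y^i$, so the squared errors are 0.64, 2.56, 5.76 and 10.24; their sum is 19.2, and $19.2 / (2 \cdot 4) = 2.4$.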

Plot it


In [36]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 0")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Plot $\theta_1$ vs Cost


In [37]:
# save theta1 and cost in a vector
cost_log = []
theta1_log = []

cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=1$

Model h(x) = $\theta_0$ + $\theta_1$x = x

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (x^i - y^i)^2$


In [38]:
theta0 = 0
theta1 = 1

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


0.15

Plot it


In [39]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 1")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Plot $\theta_1$ vs Cost again


In [40]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=2$

Model h(x) = $\theta_0$ + $\theta_1$x = 2x

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (2x^i - y^i)^2$


In [41]:
theta0 = 0
theta1 = 2

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 2")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


5.4

In [42]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Compute the cost for a range of $\theta_1$ values


In [43]:
theta0 = 0
theta1 = -3.1   # start just below -3 so the first value in the sweep is -3.0

cost_log = []
theta1_log = []

inc = 0.1
for j in range(61):      # sweep theta1 from -3.0 to 3.0 in steps of 0.1
    theta1 = theta1 + inc
    
    cost = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

    cost = cost/(2*m)             

    cost_log.append(cost)
    theta1_log.append(theta1)

Plot $\theta_1$ vs Cost


In [44]:
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()
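
The cost curve is a convex bowl with its minimum at $\theta_1 = 0.8$, which is exactly the slope of the data ($y = 0.8x$); at that point the cost is zero.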




Let's do it with Gradient Descent now


In [60]:
theta0 = 0        # intercept kept fixed at 0 in this example
theta1 = 2        # initial guess for the slope

alpha = 0.1       # learning rate
iterations = 100

cost_log = []
theta_log = []

for j in range(iterations):

    cost = 0
    grad = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]), 2)
        grad += (hx - y[i,0])*X[i,0]

    cost = cost/(2*m)
    # dividing the gradient by 2m (instead of m as in the update rule above)
    # simply halves the effective learning rate; convergence is unaffected
    grad = grad/(2*m)
    theta1 = theta1 - alpha*grad

    cost_log.append(cost)
    theta_log.append(theta1)

In [61]:
theta_log


Out[61]:
[1.55,
 1.26875,
 1.09296875,
 0.98310546875000004,
 0.91444091796875004,
 0.87152557373046879,
 0.84470348358154301,
 0.8279396772384644,
 0.81746229827404027,
 0.81091393642127518,
 0.80682121026329701,
 0.80426325641456065,
 0.80266453525910042,
 0.80166533453693778,
 0.80104083408558613,
 0.80065052130349135,
 0.80040657581468211,
 0.80025410988417633,
 0.80015881867761018,
 0.80009926167350642,
 0.80006203854594149,
 0.80003877409121349,
 0.80002423380700849,
 0.80001514612938029,
 0.80000946633086267,
 0.8000059164567892,
 0.8000036977854933,
 0.80000231111593334,
 0.80000144444745835,
 0.80000090277966152,
 0.80000056423728849,
 0.80000035264830527,
 0.80000022040519081,
 0.8000001377532443,
 0.80000008609577766,
 0.80000005380986106,
 0.80000003363116323,
 0.80000002101947709,
 0.80000001313717317,
 0.80000000821073325,
 0.80000000513170832,
 0.80000000320731768,
 0.80000000200457355,
 0.80000000125285853,
 0.80000000078303657,
 0.8000000004893979,
 0.8000000003058737,
 0.80000000019117112,
 0.80000000011948202,
 0.80000000007467631,
 0.80000000004667271,
 0.80000000002917049,
 0.80000000001823157,
 0.80000000001139471,
 0.80000000000712168,
 0.80000000000445104,
 0.80000000000278193,
 0.80000000000173876,
 0.80000000000108673,
 0.80000000000067928,
 0.80000000000042459,
 0.80000000000026539,
 0.80000000000016591,
 0.80000000000010374,
 0.80000000000006488,
 0.80000000000004057,
 0.80000000000002536,
 0.80000000000001581,
 0.80000000000000993,
 0.80000000000000626,
 0.80000000000000393,
 0.80000000000000249,
 0.8000000000000016,
 0.80000000000000104,
 0.80000000000000071,
 0.80000000000000049,
 0.80000000000000027,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016,
 0.80000000000000016]

Plot Convergence


In [59]:
plt.plot(cost_log)
plt.title("Convergence of Cost Function")
plt.xlabel("Iteration number")
plt.ylabel("Cost function")
plt.show()


Predict output using trained model


In [48]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for Theta1 from Gradient Descent")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
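
For comparison: the first cell imports linear_model from scikit-learn but never uses it. Here is a minimal sketch of fitting the same data with scikit-learn's LinearRegression (fitting without an intercept so the model matches $h(x) = \theta_1 x$); this is an illustrative addition, not part of the original lecture code:

from sklearn.linear_model import LinearRegression

# fit y = theta1 * x with no intercept, mirroring theta0 = 0 above
reg = LinearRegression(fit_intercept=False)
reg.fit(X, y)        # X and y are the arrays read from example1.csv earlier
print(reg.coef_)     # fitted slope; close to 0.8 for this dataset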


Credits

Raschka, Sebastian. Python machine learning. Birmingham, UK: Packt Publishing, 2015. Print.

Andrew Ng, Machine Learning, Coursera

Lucas Shen

David Kaleko