CSAL4243: Introduction to Machine Learning

Muhammad Mudassir Khan (mudasssir.khan@ucp.edu.pk)

Lecture 4: Linear Regression and Gradient Descent Example

Overview



Machine Learning pipeline

  • x is called the input variable or input feature.

  • y is called the output or target variable, also sometimes known as the label.

  • h is called the hypothesis or model.

  • A pair $(x^i, y^i)$ is called a sample or training example.

  • The dataset of all training examples is called the training set.

  • m is the number of samples in a dataset.

  • n is the number of features in a dataset, excluding the label (see the short sketch below the figure for these quantities on a toy dataset).

<img style="float: left;" src="images/02_02.png" width=400>
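As a concrete illustration of this notation, here is a minimal sketch using the same four points as the example later in this lecture; x is the single input feature, y the target, m the number of samples and n the number of features:

import numpy as np

# training set: each row of X holds one training example's features
X = np.array([[1.0], [2.0], [3.0], [4.0]])   # input feature x
y = np.array([0.8, 1.6, 2.4, 3.2])           # target variable y (the label)

m = X.shape[0]   # number of training examples -> 4
n = X.shape[1]   # number of features, excluding the label -> 1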



Linear Regression with one variable

Model Representation

  • The model is represented by $h_\theta(x)$ or simply $h(x)$.

  • For linear regression with one input variable, $h(x) = \theta_0 + \theta_1 x$.

  • $\theta_0$ and $\theta_1$ are called weights or parameters.

  • We need to find the $\theta_0$ and $\theta_1$ that maximize the performance of the model, i.e. minimize its prediction error (see the sketch below).
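A minimal sketch of this hypothesis as a small Python function (the parameter values passed in below are illustrative, not fitted):

def h(x, theta0, theta1):
    """Hypothesis for linear regression with one variable: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# illustrative parameters: theta0 = 0, theta1 = 0.8
print(h(3.0, 0.0, 0.8))   # 2.4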




Cost Function

Let $\hat{y} = h(x) = \theta_0 + \theta_1 x$

Error in a single sample (x, y) = $\hat{y}$ - y = h(x) - y

Cumulative squared error over all m samples = $\sum_{i=1}^{m} (h(x^i) - y^i)^2$

Finally, the mean squared error or cost function is J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$

<img style="float: left;" src="images/03_01.png" width=300> <img style="float: right;" src="images/03_02.png" width=300>
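As a preview of the step-by-step computation later in this lecture, here is a minimal vectorized sketch of $J(\theta)$ in numpy (array and function names are illustrative):

import numpy as np

def cost(X, y, theta0, theta1):
    """Mean squared error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = y.size
    predictions = theta0 + theta1 * X          # h(x) for every sample
    return np.sum((predictions - y) ** 2) / (2 * m)

# on the four example points used later, theta0 = theta1 = 0 gives J = 2.4
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 1.6, 2.4, 3.2])
print(cost(X, y, 0.0, 0.0))   # 2.4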



Gradient Descent

Gradient descent equation:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$

Linear regression Cost function:

J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$


Substituting this cost function into the gradient descent equation and differentiating with respect to each parameter gives:

\begin{align*} \text{repeat until convergence: } \lbrace & \newline \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x_{i}) - y_{i}) \newline \theta_1 := & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x_{i}) - y_{i}) x_{i}\right) \newline \rbrace& \end{align*}
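A minimal sketch of these simultaneous updates in numpy; the learning rate and iteration count are illustrative, and unlike the worked example later (which keeps $\theta_0$ fixed at 0) both parameters are updated:

import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = y.size
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * X) - y          # h(x^i) - y^i for every sample
        grad0 = np.sum(error) / m                  # dJ/dtheta0
        grad1 = np.sum(error * X) / m              # dJ/dtheta1
        # update both parameters simultaneously, as in the rule above
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# on the example data below, this should approach theta0 ≈ 0 and theta1 ≈ 0.8
print(gradient_descent(np.array([1.0, 2.0, 3.0, 4.0]), np.array([0.8, 1.6, 2.4, 3.2])))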




Linear Regression Example

x    y
1    0.8
2    1.6
3    2.4
4    3.2

Read data


In [31]:
%matplotlib inline
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

# read data in pandas frame
dataframe = pd.read_csv('datasets/example1.csv', encoding='utf-8')

# assign x and y
X = np.array(dataframe[['x']])
y = np.array(dataframe[['y']])

m = y.size # number of training examples
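
If datasets/example1.csv is not at hand, an equivalent frame can be built directly from the data table above (a sketch; the column names and values are the ones shown):

import pandas as pd
import numpy as np

# same four points as in the table above
dataframe = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [0.8, 1.6, 2.4, 3.2]})
X = np.array(dataframe[['x']])
y = np.array(dataframe[['y']])
m = y.size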

In [32]:
# check data by printing first few rows
dataframe.head()


Out[32]:
x y
0 1 0.8
1 2 1.6
2 3 2.4
3 4 3.2

Plot data


In [33]:
# visualize the dataset
plt.scatter(X, y)
plt.title("Dataset")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Find a line that best fits the data


In [34]:
# candidate lines for the best fit

tmpx = np.array([0, 1, 2, 3, 4])
y1 = 0.2*tmpx
y2 = 0.7*tmpx
y3 = 1.5*tmpx


plt.scatter(X, y)
plt.plot(tmpx,y1)
plt.plot(tmpx,y2)
plt.plot(tmpx,y3)
plt.title("Best fit line")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
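
To quantify which of these three candidate slopes fits best, we can evaluate the cost function for each (a sketch reusing X, y and m from above; the slope values are the ones plotted):

# cost J(theta1) with theta0 = 0 for each candidate slope
for slope in [0.2, 0.7, 1.5]:
    cost = np.sum((slope * X[:, 0] - y[:, 0]) ** 2) / (2 * m)
    print(slope, cost)   # the slope closest to the data should give the lowest cost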


Let's assume $\theta_0 = 0$ and $\theta_1=0$

Model h(x) = $\theta_0$ + $\theta_1$x = 0

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (0 - y^i)^2$


In [35]:
theta0 = 0
theta1 = 0

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


2.4

Plot it


In [36]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 0")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Plot $\theta_1$ vs Cost


In [37]:
# save theta1 and cost in a vector
cost_log = []
theta1_log = []

cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=1$

Model h(x) = $\theta_0$ + $\theta_1$x = x

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (x^i - y^i)^2$


In [38]:
theta0 = 0
theta1 = 1

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


0.15

Plot it


In [39]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 1")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


Plot $\theta_1$ vs Cost again


In [40]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("Theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Let's assume $\theta_0 = 0$ and $\theta_1=2$

Model h(x) = $\theta_0$ + $\theta_1$x = 2x

Cost function J($\theta$) = $\frac{1}{2m}\sum_{i=1}^{m} (h(x^i) - y^i)^2$ = $\frac{1}{2m}\sum_{i=1}^{m} (2x^i - y^i)^2$


In [41]:
theta0 = 0
theta1 = 2

cost = 0
for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

cost = cost/(2*m)             
print (cost)


# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for theta1 = 2")
plt.xlabel("x")
plt.ylabel("y")
plt.show()


5.4

In [42]:
# save theta1 and cost in a vector
cost_log.append(cost)
theta1_log.append(theta1)

# plot
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()


Sweep $\theta_1$ over a range of values


In [43]:
# sweep theta1 from -3.0 to 3.0 in steps of 0.1 and record the cost for each value
theta0 = 0
theta1 = -3.1

cost_log = []
theta1_log = []

inc = 0.1
for j in range(61):
    theta1 = theta1 + inc

    cost = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)

    cost = cost/(2*m)

    cost_log.append(cost)
    theta1_log.append(theta1)
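
The best value in this sweep can be read off directly (a sketch using the lists just built; the result should be close to 0.8, the slope of the data):

# index of the smallest recorded cost and the corresponding theta1
best = int(np.argmin(cost_log))
print(theta1_log[best], cost_log[best])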

Plot $\theta_1$ vs Cost


In [44]:
plt.scatter(theta1_log, cost_log)
plt.title("theta1 vs Cost")
plt.xlabel("Theta1")
plt.ylabel("Cost")
plt.show()




Let's do it with Gradient Descent now


In [45]:
# gradient descent on theta1 only (theta0 is kept fixed at 0)
theta0 = 0
theta1 = -3

alpha = 0.1
iterations = 100

cost_log = []
iter_log = []

for j in range(iterations):

    cost = 0
    grad = 0
    for i in range(m):
        hx = theta1*X[i,0] + theta0
        cost += pow((hx - y[i,0]),2)
        grad += ((hx - y[i,0]))*X[i,0]

    cost = cost/(2*m)
    # dividing the gradient by 2m instead of m (as in the update rule above)
    # just halves the effective learning rate; it still converges to the same theta1
    grad = grad/(2*m)
    theta1 = theta1 - alpha*grad

    cost_log.append(cost)

In [46]:
theta1


Out[46]:
0.79999999999999993

Plot Convergence


In [47]:
plt.plot(cost_log)
plt.title("Convergence of Cost Function")
plt.xlabel("Iteration number")
plt.ylabel("Cost function")
plt.show()


Predict output using the trained model ($\theta_0 = 0$, $\theta_1 \approx 0.8$)


In [48]:
# predict using model
y_pred = theta1*X + theta0

# plot
plt.scatter(X, y)
plt.plot(X, y_pred)
plt.title("Line for Theta1 from Gradient Descent")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
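
For comparison, the sklearn linear_model module imported at the start (but not used above) can fit the same data; this is a minimal sketch, and the fitted slope should come out close to the theta1 found by gradient descent:

from sklearn import linear_model

# fit ordinary least squares on the same X and y
reg = linear_model.LinearRegression()
reg.fit(X, y)

print(reg.coef_, reg.intercept_)   # slope should be close to 0.8, intercept close to 0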


Credits

Raschka, Sebastian. Python Machine Learning. Birmingham, UK: Packt Publishing, 2015. Print.

Andrew Ng, Machine Learning, Coursera

Lucas Shen

David Kaleko