In [1]:
%matplotlib inline
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import os

Linear Regression Overview

  • Linear Model: Estimated Target = w0 + w1x1 + w2x2 + w3x3 + … + wnxn,
    where the w's are the weights and the x's are the features (see the sketch below)
  • Predicted Value: Numeric
  • Algorithm Used: Linear Regression. The objective is to find the weights w
  • Optimization: Stochastic Gradient Descent. Seeks to minimize the loss/cost so that the predicted value is as close to the actual value as possible
  • Cost/Loss Calculation: Squared loss function
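As a minimal numeric sketch of the model equation (the weights and feature values below are made-up numbers for illustration, not learned from data):

import numpy as np

w0 = 8                                # intercept (bias) weight
w = np.array([5.0, 2.0, -1.5])        # weights w1, w2, w3
x = np.array([4.0, 10.0, 3.0])        # feature values x1, x2, x3

estimated_target = w0 + np.dot(w, x)  # w0 + w1*x1 + w2*x2 + w3*x3
print(estimated_target)               # 8 + 20 + 20 - 4.5 = 43.5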

In [2]:
def straight_line(x):
    # True relationship used to generate the training data: y = 5x + 8 (w1 = 5, w0 = 8)
    return 5 * x + 8

In [3]:
def straight_line_weight(weight1, x):
    # Predicted y for a candidate weight w1, with the intercept w0 fixed at 8
    return weight1 * x + 8

In [4]:
np.random.seed(5)
x_vals = pd.Series(np.random.rand(150) * 20)
y_vals = x_vals.map(straight_line)

In [5]:
df = pd.DataFrame({'x1': x_vals,
                   'y': y_vals})

In [6]:
# One-feature example
# Training set - contains several examples of feature 'x' and the corresponding correct answer 'y'
# Objective is to learn the weights w0 and w1 in y = w0 + w1*x1
df.head()


Out[6]:
x1 y
0 4.439863 30.199317
1 17.414646 95.073231
2 4.134383 28.671916
3 18.372218 99.861091
4 9.768224 56.841119

In [7]:
df.tail()


Out[7]:
x1 y
145 19.299933 104.499664
146 11.419504 65.097522
147 6.050362 38.251811
148 16.514117 90.570583
149 13.188346 73.941728

In [8]:
fig = plt.figure(figsize = (12, 8))
plt.scatter(x = x_vals,
            y = y_vals)
plt.xlabel('Feature x1')
plt.ylabel('Target y')
plt.grid(True)
plt.title('Training Set - One Feature')



Try different values for w1

Assume that w0 = 8. We need to find the optimal value for w1. Let's try different weights and compute the target attribute y.


In [9]:
weights = [3, 4, 5, 6, 7]
y_at_weight = {}

for w1 in weights:    
    y_calculated = []
    y_at_weight[w1] = y_calculated
    for x in x_vals:
        y_calculated.append(straight_line_weight(w1,x))

In [10]:
fig = plt.figure(figsize = (12, 8))
plt.scatter(x = x_vals,
            y = y_vals, 
            label = 'actual')
plt.scatter(x = x_vals, y = y_at_weight[3], color = 'r', marker = '+', label = 'weight 3')
plt.scatter(x = x_vals, y = y_at_weight[4], color = 'g', label = 'weight 4')
plt.scatter(x = x_vals, y = y_at_weight[5], label = 'weight 5')
plt.scatter(x = x_vals, y = y_at_weight[6], color = 'y', label = 'weight 6')
plt.scatter(x = x_vals, y = y_at_weight[7], color = 'k', marker='+', label = 'weight 7')
plt.xlabel('Feature x1')
plt.ylabel('Predicted y')
plt.title('Predicted Output for different weights')
plt.grid(True)
plt.legend()



Plot Loss at different Weights w1


In [11]:
# For a range of candidate weights, let's compute the loss/cost
weight = pd.Series(np.linspace(3, 7, 100))

In [12]:
weight.head()


Out[12]:
0    3.000000
1    3.040404
2    3.080808
3    3.121212
4    3.161616
dtype: float64

In [13]:
weight.tail()


Out[13]:
95    6.838384
96    6.878788
97    6.919192
98    6.959596
99    7.000000
dtype: float64

In [14]:
# Cost/Loss Calculation: squared loss function - a measure of how far the predicted values are from the actual values
# Steps:
#   For every candidate weight, compute the predicted y for each feature value x
#   Then compute the loss = average((predicted y - actual y)**2)
loss_at_wt = []
for w1 in weight:
    y_predicted = []
    for x in x_vals:
        y_predicted.append(straight_line_weight(w1, x))
    
    loss_at_wt.append(((y_vals - y_predicted) ** 2).mean())

In [15]:
fig = plt.figure(figsize = (12, 8))
plt.scatter(x = weight, 
            y = loss_at_wt)
plt.grid(True)
plt.xlabel('Weight for feature 1')
plt.ylabel('Loss')
plt.title('Loss Curve - Loss at different weight')



Loss Function

Squared loss function: the loss is the average of the squared differences between the predicted and actual values. The squared loss function not only gives us the loss at a given weight, it also tells us in which direction to move the weight to reduce the loss.
For a given (weight, loss) point, the algorithm finds the slope using calculus (first-order derivatives). If the slope is negative, increase the weight; if the slope is positive, decrease the weight.
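To make the direction rule concrete, here is a minimal sketch (not from the original notebook) that computes the slope of the squared loss with respect to w1 for the one-feature training set above, with w0 fixed at 8; loss_slope is an illustrative helper name:

def loss_slope(w1, x_vals, y_vals, w0=8):
    # d(average squared loss)/d(w1) = average(2 * (predicted y - actual y) * x)
    y_predicted = w0 + w1 * x_vals
    return (2 * (y_predicted - y_vals) * x_vals).mean()

print(loss_slope(3, x_vals, y_vals))  # negative slope: w1 = 3 is too small, increase it
print(loss_slope(7, x_vals, y_vals))  # positive slope: w1 = 7 is too large, decrease it
print(loss_slope(5, x_vals, y_vals))  # roughly 0: w1 = 5 is where the loss is minimal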

Learning Rate

The learning rate decides by how much the weight should be increased or decreased at each step.
If the change is too big, the algorithm can overshoot the point where the loss is minimal.
If the change is too small, it will take many iterations to find the point where the loss is minimal.
In AWS ML, the learning rate is selected automatically.
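A minimal sketch of the update rule w1 = w1 - learning_rate * slope, applied to the one-feature training set above with w0 fixed at 8; the learning rates are arbitrary values chosen only to illustrate the trade-off, not what AWS ML uses:

def run_gradient_descent(learning_rate, steps=20, w1=0.0, w0=8):
    # Repeatedly move w1 against the slope of the squared loss
    for _ in range(steps):
        slope = (2 * ((w0 + w1 * x_vals) - y_vals) * x_vals).mean()  # dLoss/dw1
        w1 = w1 - learning_rate * slope
    return w1

print(run_gradient_descent(0.0001))  # too small: after 20 steps w1 is still far from 5
print(run_gradient_descent(0.001))   # reasonable: w1 ends up close to 5
print(run_gradient_descent(0.01))    # too large: the updates overshoot and w1 blows up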

Quadratic Example with two features


In [16]:
# Let's look at a quadratic example: y = w2*x**2 + w1*x + w0
# Two features: x**2 and x

In [17]:
def quad_func(x):
    return 25 * x ** 2 - 80 * x + 64

In [18]:
def quad_func_weight(weight2, x):
    # Predicted y for different weights of the quadratic term.
    # Actual equation: 25x^2 - 80x + 64. We have fixed w1 = -80 and w0 = 64; we need to find w2.
    return weight2 * x ** 2 - 80 * x + 64

In [19]:
# Quadratic
np.random.seed(5)
x_vals = pd.Series(np.random.rand(150) * 20)
y_vals = x_vals.map(quad_func)

In [20]:
plt.scatter(x = x_vals,
            y = y_vals)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Training Set - Two Features')
plt.grid(True)



In [21]:
weights = [0, 20, 30, 50]
y_at_weight = {}

for w2 in weights:
    y_calculated = []
    y_at_weight[w2] = y_calculated

    for x in x_vals:
        y_calculated.append(quad_func_weight(w2, x))

In [22]:
fig = plt.figure(figsize = (12, 8))
plt.scatter(x = x_vals, y = y_vals, label = 'actual')
plt.scatter(x = x_vals, y = y_at_weight[0], label = 'weight 0', color = 'r')
plt.scatter(x = x_vals, y = y_at_weight[20], label = 'weight 20', color = 'g')
plt.scatter(x = x_vals, y = y_at_weight[30], label = 'weight 30', color = 'k')
plt.scatter(x = x_vals, y = y_at_weight[50], label = 'weight 50', color = 'y')
plt.xlabel('x')
plt.ylabel('Predicted y')
plt.title('Predicted Output for different weights')
plt.grid(True)
plt.legend()



In [23]:
# Candidate weights for feature 2 (the x**2 term)
weight = pd.Series(np.linspace(-20, 70, 200))
loss_at_wt = []
for w2 in weight:
    y_calculated = []
    for x in x_vals:
        y_calculated.append(quad_func_weight(w2, x))

    loss_at_wt.append(((y_vals - y_calculated) ** 2).mean())

In [24]:
fig = plt.figure(figsize = (12, 8))
plt.scatter(x = weight, 
            y = loss_at_wt)
plt.grid(True)
plt.xlabel('Weight for feature 2')
plt.ylabel('Loss')
plt.title('Loss Curve - Loss at different weight')



Summary

The squared loss function is parabolic in nature. It has the important property of not only telling us the loss at a given weight, but also which way to move the weight to minimize the loss.

The Gradient Descent optimization algorithm uses the loss function to move the weights of all the features, iteratively adjusting them until the optimal values are reached.

Batch Gradient Descent predicts the y value for all training examples and then adjusts the weights based on the loss. It can converge much more slowly when the training set is very large. The order of the training set does not matter, since every example is considered before the weights are adjusted.
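A rough sketch of the batch update on illustrative synthetic data (simple straight-line data rather than the notebook's quadratic example; the learning rate and number of passes are arbitrary choices):

import numpy as np

np.random.seed(0)
x = np.random.rand(100) * 20
y = 5 * x + 8                                     # true weights: w0 = 8, w1 = 5

w0, w1, learning_rate = 0.0, 0.0, 0.005
for _ in range(2000):                             # each pass uses ALL examples before adjusting
    error = (w0 + w1 * x) - y
    w0 -= learning_rate * (2 * error).mean()      # dLoss/dw0
    w1 -= learning_rate * (2 * error * x).mean()  # dLoss/dw1

print(w0, w1)                                     # should end up close to w0 = 8, w1 = 5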

Stochastic Gradient Descent predicts the y value for the next training example and immediately adjusts the weights. It can converge faster when the training set is very large. The training set should be in random order, otherwise the model will not learn correctly. AWS ML uses Stochastic Gradient Descent.
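A comparable sketch of the stochastic update on the same illustrative data, where the weights are adjusted after every single example and the examples are visited in shuffled order (settings again arbitrary):

import numpy as np

np.random.seed(0)
x = np.random.rand(100) * 20
y = 5 * x + 8                                     # true weights: w0 = 8, w1 = 5

w0, w1, learning_rate = 0.0, 0.0, 0.001
for _ in range(100):                              # 100 passes (epochs) over the data
    for i in np.random.permutation(len(x)):       # random order matters for SGD
        error = (w0 + w1 * x[i]) - y[i]
        w0 -= learning_rate * 2 * error           # adjust the weights immediately,
        w1 -= learning_rate * 2 * error * x[i]    # after each individual example

print(w0, w1)                                     # should end up close to w0 = 8, w1 = 5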