In [1]:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
%matplotlib inline

Preparing training data


In [2]:
# X = np.mat("[2 3;1 3;5 9; 12 21;15 27;20 35;22 40]")
# Y = np.mat("[4 3 8 17 27 35 40]")
# X, Y
df = pd.read_csv("data/cars.csv", sep=";")

def x_normalization(x):
    return x - 1960

def y_normalization(x):
    return x / 1000

X = x_normalization(np.matrix(df.Year[0:40]).T)
Y = y_normalization(np.matrix(df.Bus[0:40]).T)

df[45:]


Out[2]:
Year All Car Minibus Bus SmallTruck Truck Motorcycle SpecialVehicles Machinery Tractor
45 2011 16089528 8113111 389435 219906 2611104 728458 2527190 34116 NaN 1466208.0
46 2012 17033413 8648875 396119 235949 2794606 751650 2657722 33071 NaN 1515421.0
47 2013 17939447 9283923 421848 219885 2933050 755950 2722826 36148 NaN 1565817.0
48 2014 18828721 9857915 427264 211200 3062479 773728 2828466 40731 NaN 1626938.0
49 2015 19882069 10509258 446822 216566 3235304 802615 2938821 45138 NaN 1687545.0

Plotting the training data


In [3]:
plt.plot(X+1960, Y, "bo")
plt.grid(True)


Learning from the trainin data, creating a model

So far we have prepared necessary data and gain some insight about it. Now we can start to work on the essential problem: creating models and making predictions using it.

Lineare regresstion models can be made in two ways: one of them is to use normal equation form, the other is to use gradient descent as an iterative approach.


In [4]:
def learn_normal_equation(x, y):
    """
    creates a model and returns it.
    """
    x = np.hstack([np.matrix(np.ones(len(X))).T, x])
    w = (x.T*x).I * x.T * y
    return w

In [5]:
model = learn_normal_equation(X, Y)

Plotting the best fit curve/model using the training data


In [6]:
plt.plot(X, Y, "bo")
plt.plot(X, np.hstack([np.matrix(np.ones(len(X))).T, X]) * model, "r-")
plt.grid(True)
#plt.xlim(0, X[:,0][-1]+2)


Testing model with test data


In [7]:
TX = np.matrix(df.Year[40:50]).T - 1960
TY = np.matrix(df.Bus[40:50]).T / 1000. # normalizad in million

In [8]:
plt.plot(X, Y, "bo")
plt.plot(TX, TY, "go")
plt.plot(TX, np.hstack([np.matrix(np.ones(len(TX))).T, TX]) * model, "r-")
plt.plot(X, np.hstack([np.matrix(np.ones(len(X))).T, X]) * model, "r-")


Out[8]:
[<matplotlib.lines.Line2D at 0x10a866ba8>]

Linear Regression using gradient descent

Instead of using normal equation, here we will try to fit a model in an iterative way. Gradient descent is the method we are going to implement as an iterative approach.

We use the same data prepared above.

Plotting cost function

Cost funtion is the function that calculates the sum of squared error according a specific weight value. To plot the cost function, start with a random weight value and calculate the sum of error and repeat this operation by increasing or descreasing the weight value until some arbitrary weight value.


In [9]:
def plot_cost_function(x, y):
    N = len(x)
    history = []
    for i in range(-100, 100):
        w = i
        sum_error = np.sum(np.power(x * i - y, 2)) / N
        history.append([w, sum_error])

    history = np.array(history)
    plt.plot(history[:,0], history[:,1], "r")
    plt.xlabel("weight value")
    plt.ylabel("sum of error")
    plt.show()

plot_cost_function(X, Y)


As it can be seen, at a particular weight value the sum of error approaches to the global minimum, that is, the sum of error gets its lowest value as possible. Note that this function has only one minimum point which is also its global minimum. Our aim is to find the weight value that gives the lowest cost.

Finding the weight by gradient descent


In [29]:
def learn_gradient_descent(x, y):
    N = len(x)
    dimention = x.shape[1]
    
    w = np.ones((dimention, 1))
    eta = 0.0001

    i = 0
    max_iteration = 1000000
    prev_error = np.infty
    error = 0
    history = []
    
    while True:
        estimation = x * w
        error = np.sum(np.power(estimation - y, 2)) / N
        gradient = x.T * (estimation - y) / N
        w = w - eta * gradient
        
        history.append([i, error])
        
        if prev_error * 0.9999 <= error or i == max_iteration:
            break

        prev_error = error
        i += 1
        
    return w, np.matrix(history)

Debugging gradient descent

It is expected that the sum of error will decrease in each iterations. Decrement size will get smaller when it approaches to the minimum error. With the eta variable, we define how big step we will take in each iteration to the minimum error.

Debugging is important for gradient descent algorithms to check if it gets closer to the global minimum. For some reasons, for instance due to inappropriate eta value, gradient descent algorithms may be too slow; or end up with divergence instead of convergence.

Plotting error for each iterations may help you with debugging.


In [31]:
x = np.hstack([np.ones((len(X), 1)), X])
w, history = learn_gradient_descent(x, Y)

def plot_error(iterations, errors):
    plt.plot(iterations, errors)
    plt.xlabel("iterations")
    plt.ylabel("error")
    plt.show()

plot_error(history[:,0], history[:,1])


Making predictions

Let's first plot the model on the same figure of the actual data. On this figure, we will visually see how the model fitted.


In [14]:
m = np.matrix(np.linspace(-4, 50, 300)).T
m = np.hstack([np.matrix(np.ones(len(m))).T, m])
plt.plot(x[:,1], Y, "o")
plt.plot(m[:,1], m * w, "r-")
plt.grid(True)
plt.show()



In [15]:
def predict(x, w):
    x = np.hstack([np.matrix(np.ones((len(x), 1))), x])
    return x * w

print(predict(x_normalization(np.mat(1980)), w))
print(predict(x_normalization(np.mat(2020)), w))


[[ 51.03482326]]
[[ 151.06806996]]

In [16]:
X_test = x_normalization(np.matrix(df.Year[40:]).T)
Y_test = y_normalization(np.matrix(df.Bus[40:]).T)
Y_predicted = predict(X_test, w)

m = np.matrix(np.linspace(-4, 60, 300)).T
m = np.hstack([np.matrix(np.ones(len(m))).T, m])
plt.plot(x[:,1], Y, "o")
plt.plot(m[:,1], m * w, "r-")
plt.plot(X_test, Y_test, "go")
plt.grid(True)
plt.show()