Leren: Programming assignment 3

This assignment can be done in teams of 2

Student 1: Roan de Jong (10791930)
Student 2: Ghislaine van den Boogerd (student_id)

This notebook provides a template for your programming assignment 3. You may want to use parts of your code from the previous assignment(s) as a starting point for this assignment.

The code you hand-in should follow the structure from this document. Each part of the assignment has its own cell, you are free to add more cells. Note that the structure corresponds with the structure from the actual programming assignment. Make sure you read this for the full explanation of what is expected of you.

Submission:

Make sure your code can be run from top to bottom without errors.
Include your data files in the zip file.
Comment your code

One way be sure you code can be run without errors is by quiting iPython completely and then restart iPython and run all cells again (you can do this by going to the menu bar above: Cell > Run all). This way you make sure that no old definitions of functions or values of variables are left (that your program might still be using).

If you have any questions ask your teaching assistent. We are here for you.

Regularized Logistic Regression

a) Implementation



In [266]:

    
from __future__ import division
import numpy as np
import pandas as pd
import csv
import math

class logReg:

    df = None
    input_vars = None
    classifying_vars = None
    thetas = None
    alpha = 0.0
    reg_lambda = 0.0

    def __init__(self, fileName, alpha, reg_lambda):
        self.df = pd.read_csv(fileName, header=None)
        length_col = len(self.df[self.df.columns[-1]])
        self.classifying_vars = self.df[self.df.columns[-1]].as_matrix()\
                                                            .reshape(length_col, 1)
        x = self.df[self.df.columns[0:-1]].as_matrix()
        # this is the column for x_0
        temp_arr = np.ones((1, len(x.T[0])))
        for column in x.T:
            if column.max(0) > 0:
                column = column / column.max(0)
            temp_arr = np.vstack((temp_arr, column))
        self.input_vars = temp_arr.T
        self.thetas = np.full((len(self.input_vars[0]), 1), 0.5)
        self.alpha = alpha
        self.reg_lambda = reg_lambda

    @property
    def gradient(self):
        theta_x = np.dot(self.input_vars, self.thetas)
        # An ugly way to make a np.array
        h_x = np.array([0.0])
        for example in theta_x:
            h_x = np.vstack((h_x, 1 / (1 + math.e**(-example))))
        # We added this range to get rid of the useless 1st index: 0.0
        return h_x[1:]

    # Update the theta's as described in the lecture notes
    def update(self, classifier):
        output_vars = self.classifying_vars
        np.place(output_vars, output_vars != classifier, [0])
        np.place(output_vars, output_vars == classifier, [1])
        x = self.gradient - output_vars
        y = np.dot(self.input_vars.T, x)
        self.thetas = self.thetas - self.alpha * y + ((1/self.reg_lambda)*self.thetas)
        return self.thetas

    # calculate the cost
    def cost(self, classifier):
        h_x = self.gradient
        cost = 0.0
        for training_example in zip(h_x, self.classifying_vars):
            if training_example[1] == classifier:
                cost = cost + math.log(training_example[0])
            else:
                cost = cost + math.log(1 - training_example[0])
        cost = -(1/len(self.classifying_vars)) * cost + (self.reg_lambda/(2*len(self.classifying_vars))* self.thetas.T.dot(self.thetas))
        return cost

    # train the model on a certain number
    def train(self, classifier, iterations):
        for i in range(0, iterations):
            self.update(classifier)
            print(self.cost(classifier))

if __name__ == '__main__':
    trainer = logReg('digits123.csv', 0.00002, 200)
    trainer.train(3, 100)









    



[[ 13.30105576]]
[[ 13.20871212]]
[[ 13.11617917]]
[[ 13.02345879]]
[[ 12.9305529]]
[[ 12.83746348]]
[[ 12.74419253]]
[[ 12.65074213]]
[[ 12.55711439]]
[[ 12.46331149]]
[[ 12.36933567]]
[[ 12.2751892]]
[[ 12.18087444]]
[[ 12.08639382]]
[[ 11.9917498]]
[[ 11.89694494]]
[[ 11.80198187]]
[[ 11.70686329]]
[[ 11.61159198]]
[[ 11.51617082]]
[[ 11.42060277]]
[[ 11.32489088]]
[[ 11.22903832]]
[[ 11.13304837]]
[[ 11.0369244]]
[[ 10.94066993]]
[[ 10.84428862]]
[[ 10.74778425]]
[[ 10.65116078]]
[[ 10.55442232]]
[[ 10.45757318]]
[[ 10.36061785]]
[[ 10.26356105]]
[[ 10.16640773]]
[[ 10.06916307]]
[[ 9.97183257]]
[[ 9.87442201]]
[[ 9.77693751]]
[[ 9.67938554]]
[[ 9.581773]]
[[ 9.48410722]]
[[ 9.386396]]
[[ 9.28864771]]
[[ 9.19087129]]
[[ 9.09307635]]
[[ 8.9952732]]
[[ 8.89747299]]
[[ 8.79968772]]
[[ 8.7019304]]
[[ 8.60421512]]
[[ 8.50655716]]
[[ 8.40897316]]
[[ 8.31148122]]
[[ 8.21410108]]
[[ 8.11685431]]
[[ 8.01976445]]
[[ 7.92285726]]
[[ 7.82616096]]
[[ 7.72970642]]
[[ 7.63352747]]
[[ 7.53766118]]
[[ 7.44214815]]
[[ 7.34703283]]
[[ 7.25236386]]
[[ 7.15819442]]
[[ 7.06458256]]
[[ 6.97159158]]
[[ 6.87929032]]
[[ 6.78775352]]
[[ 6.69706207]]
[[ 6.60730329]]
[[ 6.51857104]]
[[ 6.43096585]]
[[ 6.34459487]]
[[ 6.25957174]]
[[ 6.17601628]]
[[ 6.09405399]]
[[ 6.01381544]]
[[ 5.93543537]]
[[ 5.85905164]]
[[ 5.78480393]]
[[ 5.71283229]]
[[ 5.64327546]]
[[ 5.57626909]]
[[ 5.51194386]]
[[ 5.45042349]]
[[ 5.39182291]]
[[ 5.33624635]]
[[ 5.2837857]]
[[ 5.23451908]]
[[ 5.18850963]]
[[ 5.14580467]]
[[ 5.10643525]]
[[ 5.07041599]]
[[ 5.03774535]]
[[ 5.00840623]]
[[ 4.98236685]]
[[ 4.95958193]]
[[ 4.93999407]]
[[ 4.92353522]]

b) Two small datasets



In [ ]:

Discussion:

[You discussion comes here]

2) Neural Network

a) Forward Propagation



In [267]:

    
from __future__ import division
import numpy as np
import pandas as pd
import csv
import math


class neuralNet:
    
    class logReg:

        input_vars = None
        output_vars = None
        thetas = None
        reg_lambda = 0.0

        def __init__(self, thetas):
            self.thetas = thetas

        @property
        def gradient(self):
            theta_x = np.dot(self.input_vars, self.thetas)
            # An ugly way to make a np.array
            h_x = np.array([0.0])
            for example in theta_x:
                h_x = np.vstack((h_x, 1 / (1 + math.e**(-example))))
            # We added this range to get rid of the useless 1st index: 0.0
            self.output_vars = h_x[1:]

        # Update the theta's as described in the lecture notes
        def update(self, classifier):
            y_vars = self.output_vars
            np.place(y_vars, y_vars != classifier, [0])
            np.place(y_vars, y_vars == classifier, [1])
            x = self.gradient - y
            y = np.dot(self.input_vars.T, x)
            self.thetas = self.thetas - self.alpha * y
            return self.thetas

    df = None
    input_vars = None
    classifying_vars = None
    alpha = 0.0
    architecture = None
    activations = None
    
    def __init__(self, fileName, alpha, architecture):
        self.read_data(fileName, alpha)
        self.create_architecture(architecture)
        
    def read_data(self, fileName, alpha):
        #self.df = pd.read_csv(fileName, header=None)
        #length_col = len(self.df[self.df.columns[-1]])
        #self.classifying_vars = self.df[self.df.columns[-1]].as_matrix()\
        #                                                    .reshape(length_col, 1)
        #x = self.df[self.df.columns[0:-1]].as_matrix()
        # this is the column for x_0
        x = np.array([[-5.0]])
        #temp_arr = np.ones((1, len(x.T[0])))
        #for column in x.T:
        #    if column.max(0) > 0:
        #        column = column / column.max(0)
        #    temp_arr = np.vstack((temp_arr, column))
        #self.input_vars = temp_arr.T
        self.input_vars = x
        self.alpha = alpha
        
    def create_architecture(self, nn_architecture):
        architecture = []
        input_layer_size = len(self.input_vars[0])
        initial_layer = []
        for node in range(0, nn_architecture[0]):
            thetas = np.array([[0.2]])
            #thetas = np.random.rand(input_layer_size, 1)
            agent = self.logReg(thetas)
            agent.input_vars = self.input_vars
            initial_layer.append(agent)
        architecture.append(initial_layer)
        for layer_size in nn_architecture[1:]:
            layer = []
            for node in range(0, layer_size):
                thetas = np.array([[0.1]])
                #thetas = np.random.rand(len(architecture[-1]), 1)
                agent = self.logReg(thetas)
                layer.append(agent)
            architecture.append(layer)
        self.architecture = architecture

    def forward_prop(self):
        activations = []
        for layer in self.architecture:
            if activations:
                activation_layer = np.zeros((len(activations[0]), 1))
            else:
                activation_layer = np.zeros((len(layer[0].input_vars), 1))
            for node in layer:
                if activations:
                    node.input_vars = activations[-1]
                node.gradient
                activation_layer = np.hstack((activation_layer, node.output_vars))
            activations.append(activation_layer[:, 1:])
        print(activations)
        self.activations = activations
        
if __name__ == "__main__":
    nn = neuralNet('ez_test.csv', 0.01, [1, 1])
    nn.forward_prop()









    



[array([[ 0.26894142]]), array([[ 0.50672313]])]

b) Backpropagation on two logistic units



In [268]:

    
from __future__ import division
import numpy as np
import pandas as pd
import csv
import math


class neuralNet:
    
    class logReg:

        input_vars = None
        output_vars = None
        thetas = None
        reg_lambda = 0.0

        def __init__(self, thetas):
            self.thetas = thetas

        @property
        def gradient(self):
            theta_x = np.dot(self.input_vars, self.thetas)
            # An ugly way to make a np.array
            h_x = np.array([0.0])
            for example in theta_x:
                h_x = np.vstack((h_x, 1 / (1 + math.e**(-example))))
            # We added this range to get rid of the useless 1st index: 0.0
            self.output_vars = h_x[1:]

        # Update the theta's as described in the lecture notes
        def update(self, classifier):
            y_vars = self.output_vars
            np.place(y_vars, y_vars != classifier, [0])
            np.place(y_vars, y_vars == classifier, [1])
            x = self.gradient - y
            y = np.dot(self.input_vars.T, x)
            self.thetas = self.thetas - self.alpha * y
            return self.thetas

    df = None
    input_vars = None
    classifying_vars = None
    alpha = 0.0
    architecture = None
    activations = None
    
    def __init__(self, alpha, architecture):
        self.read_data(alpha)
        self.create_architecture(architecture)
        
    def read_data(self, alpha):
        x = np.array([[-5.0]])
        self.input_vars = x
        self.classifying_vars = np.array([[1]])
        self.alpha = alpha
        
    def create_architecture(self, nn_architecture):
        architecture = []
        input_layer_size = len(self.input_vars[0])
        initial_layer = []
        for node in range(0, nn_architecture[0]):
            thetas = np.array([[0.5]])
            agent = self.logReg(thetas)
            agent.input_vars = self.input_vars
            initial_layer.append(agent)
        architecture.append(initial_layer)
        for layer_size in nn_architecture[1:]:
            layer = []
            for node in range(0, layer_size):
                thetas = np.array([[0.5]])
                agent = self.logReg(thetas)
                layer.append(agent)
            architecture.append(layer)
        self.architecture = architecture

    def forward_prop(self):
        activations = []
        for layer in self.architecture:
            if activations:
                activation_layer = np.zeros((len(activations[0]), 1))
            else:
                activation_layer = np.zeros((len(layer[0].input_vars), 1))
            for node in layer:
                if activations:
                    node.input_vars = activations[-1]
                node.gradient
                activation_layer = np.hstack((activation_layer, node.output_vars))
            activations.append(activation_layer[:, 1:])
        self.activations = activations
        
    def back_prop(self, classifier):
        errors = []
        for backprop in zip(reversed(self.architecture), reversed(self.activations)):
            if errors:
                thetas = np.zeros((len(backprop[0][0].thetas), 1))
                for node in backprop[0]:
                    thetas = np.hstack((thetas, node.thetas))
                thetas = thetas[:, 1:]
                delta = backprop[1] * (1- backprop[1])
                delta = delta.T * np.dot(errors[-1], thetas)
                errors.append(delta)
            else:
                y_vars = self.classifying_vars
                np.place(y_vars, y_vars != classifier, [0])
                np.place(y_vars, y_vars == classifier, [1])
                delta = (backprop[1] - y_vars).T
                errors.append(delta)
        self.update(reversed(errors))
        
    def update(self, errors):
        for layer in zip(reversed(self.architecture), reversed(self.activations), errors):
            for node in layer[0]:
                node.thetas = node.thetas - self.alpha * (layer[1] * layer[2])

if __name__ == "__main__":
    nn = neuralNet(0.001, [1, 1])
    for i in range(0, 100):
        nn.forward_prop()
        nn.back_prop(1)

c) Complete backpropagation on handwritten digit recognition



In [269]:

    
from __future__ import division
import numpy as np
import pandas as pd
import csv
import math


class neuralNet:
    
    class logReg:

        input_vars = None
        output_vars = None
        thetas = None
        reg_lambda = 0.0

        def __init__(self, thetas):
            self.thetas = thetas

        @property
        def gradient(self):
            theta_x = np.dot(self.input_vars, self.thetas)
            # An ugly way to make a np.array
            h_x = np.array([0.0])
            for example in theta_x:
                h_x = np.vstack((h_x, 1 / (1 + math.e**(-example))))
            # We added this range to get rid of the useless 1st index: 0.0
            self.output_vars = h_x[1:]

    df = None
    input_vars = None
    classifying_vars = None
    alpha = 0.0
    architecture = None
    activations = None
    
    def __init__(self, fileName, alpha, architecture):
        self.read_data(fileName, alpha)
        self.create_architecture(architecture)
        
    def read_data(self, fileName, alpha):
        self.df = pd.read_csv(fileName, header=None)
        length_col = len(self.df[self.df.columns[-1]])
        self.classifying_vars = self.df[self.df.columns[-1]].as_matrix()\
                                                            .reshape(length_col, 1)
        x = self.df[self.df.columns[0:-1]].as_matrix()
        # this is the column for x_0
        temp_arr = np.ones((1, len(x.T[0])))
        for column in x.T:
            if column.max(0) > 0:
                column = column / column.max(0)
            temp_arr = np.vstack((temp_arr, column))
        self.input_vars = temp_arr[1:].T
        self.alpha = alpha
        
    def create_architecture(self, nn_architecture):
        architecture = []
        input_layer_size = len(self.input_vars[0])
        initial_layer = []
        for node in range(0, nn_architecture[0]):
            thetas = np.random.rand(input_layer_size, 1)
            agent = self.logReg(thetas)
            agent.input_vars = self.input_vars
            initial_layer.append(agent)
        architecture.append(initial_layer)
        for layer_size in nn_architecture[1:]:
            layer = []
            for node in range(0, layer_size):
                thetas = np.random.rand(len(architecture[-1]), 1)
                agent = self.logReg(thetas)
                layer.append(agent)
            architecture.append(layer)
        self.architecture = architecture
        self.architecture

    def forward_prop(self):
        activations = []
        for layer in self.architecture:
            if activations:
                activation_layer = np.zeros((len(activations[0]), 1))
            else:
                activation_layer = np.zeros((len(layer[0].input_vars), 1))
            for node in layer:
                if activations:
                    node.input_vars = activations[-1]
                node.gradient
                activation_layer = np.hstack((activation_layer, node.output_vars))
            activations.append(activation_layer[:, 1:])
        self.activations = activations
        
    def back_prop(self, classifier):
        errors = []
        for backprop in zip(reversed(self.architecture), reversed(self.activations)):
            if errors:
                thetas = np.zeros((len(backprop[0][0].thetas), 1))
                for node in backprop[0]:
                    thetas = np.hstack((thetas, node.thetas))
                thetas = thetas[:, 1:]
                # I almost figured out how to perform the operations using Linear Algebra. 
                # However, my implementation does not work when the dimensions of the inputs of a layer differs from its output
                # According to documentation I found, this should work however 
                # (http://briandolhansky.com/blog/2014/10/30/artificial-neural-networks-matrix-form-part-5)
                print thetas.shape
                print errors[-1].shape
                print backprop[1].shape
                delta = np.multiply(backprop[1], (1- backprop[1]))         
                delta = np.multiply(delta, np.dot(thetas, errors[-1].T).T)
                errors.append(delta)
            else:
                y_vars = self.classifying_vars
                np.place(y_vars, y_vars != classifier, [0])
                np.place(y_vars, y_vars == classifier, [1])
                delta = (backprop[1] - y_vars)
                errors.append(delta)

if __name__ == "__main__":
    nn = neuralNet('digits123.csv', 0.01, [9, 9, 9])
    nn.forward_prop()
    nn.back_prop(1)









    



(9, 9)
(542, 9)
(542, 9)
(64, 9)
(542, 9)
(542, 9)






    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-269-8cf34cffae53> in <module>()
    116     nn = neuralNet('digits123.csv', 0.01, [9, 9, 9])
    117     nn.forward_prop()
--> 118     nn.back_prop(1)

<ipython-input-269-8cf34cffae53> in back_prop(self, classifier)
    104                 print backprop[1].shape
    105                 delta = np.multiply(backprop[1], (1- backprop[1]))
--> 106                 delta = np.multiply(delta, np.dot(thetas, errors[-1].T).T)
    107                 errors.append(delta)
    108             else:

ValueError: operands could not be broadcast together with shapes (542,9) (542,64)

Discussion:

[You discussion comes here]



In [ ]: