Applications of Machine Learning with Artificial Neural Networks and Supervised Regression


Lucas Barbosa

The Architecture

The structure of the ANN comprises three layers: an input layer, a hidden layer and an output layer.


The input and output layers will have 2 neurons and 1 neuron respectively; this is dependent on the dimensionality of the data. For the sake of simplicity the hidden layer will consist of only three hidden units (hidden neurons). When defining the structure of the ANN it all comes down to two kinds of quantities:

Hyper-parameters: stay static the entire time unless changed or coded manually.
Parameters: are fine-tuned while the ANN is being trained to minimise the error of the predictions.

Examples of hyper-parameters are the number of neurons in each layer and even the number of layers themselves. These should be decided before approaching the implementation in code. Parameters, however, such as weights and regularisation values, are changed throughout the life span of the ANN.
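
As a rough illustration (a sketch only, separate from the Neural_Network class built later), hyper-parameters behave like constants fixed up front, while parameters are the arrays that training keeps adjusting:

import numpy as np

# hyper-parameters: chosen before implementation and left alone afterwards
INPUT_LAYER_SIZE = 2
HIDDEN_LAYER_SIZE = 3
OUTPUT_LAYER_SIZE = 1

# parameters: randomly initialised here, then fine-tuned during training
W1 = np.random.randn(INPUT_LAYER_SIZE, HIDDEN_LAYER_SIZE)
W2 = np.random.randn(HIDDEN_LAYER_SIZE, OUTPUT_LAYER_SIZE)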

Forward Propagation

Forward propagation is the process of advancing our input data (hours studied and slept) through the network. It will be useful to visualise the ANN process both mathematically and programmatically until we converge on the final output result.

The only libraries being used are numpy for matrix maths and scipy for mathematical optimisation. Before anything goes anywhere, the data required to train the network comes in two parts: training data and testing data.


In [1]:
import numpy as np
from scipy import optimize

# training data
x_train = np.array(([3,5],[5,1],[10,2],[6,1.5]), dtype=float)
y_train = np.array(([75],[82],[93],[70]), dtype=float)

# testing data
x_test = np.array(([4, 5.5],[4.5, 1],[9,2.5],[6,2]), dtype=float)
y_test = np.array(([70],[89],[85],[75]), dtype=float)

The importance of separating testing data from training data is to make sure the model keeps reflecting the real world, by checking its predictions against values it was never trained on. The test data does not go through the same training pipeline as the training data. The input values are passed through the input neurons as matrices for computational speed-ups (a three-row matrix is written out below for illustration; the training set above has four examples).


$ X = \begin{bmatrix} X_{1,1} & X_{1,2}\\ X_{2,1} & X_{2,2} \\ X_{3,1} & X_{3,2} \end{bmatrix} $

Before processing any more data there is one more thing to account for: the input is in different units to the output. The network won't be smart enough to map a generalisation between different units of data, one being hours and the other a score out of 100. We can take advantage of the fact that all the data is positive and divide the individual values by their respective maximum to get a number between 0 and 1.
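
For example, the column-wise maxima of the training inputs are 10 hours studied and 5 hours slept, so the first training example and its test score scale as:

$ \begin{bmatrix} 3 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} \tfrac{3}{10} & \tfrac{5}{5} \end{bmatrix} = \begin{bmatrix} 0.3 & 1 \end{bmatrix}, \qquad 75 \rightarrow \tfrac{75}{100} = 0.75 $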


In [2]:
def scale_data(hours, test_score):
    MAX_SCORE = 100
    # divide each input column (hours studied, hours slept) by its maximum
    hours = hours / np.amax(hours, axis=0)
    # test scores are out of 100
    test_score = test_score / MAX_SCORE
    return hours, test_score

# normalize data
x_train, y_train = scale_data(x_train, y_train)
x_test, y_test   = scale_data(x_test, y_test)

In [3]:
x_train


Out[3]:
array([[ 0.3,  1. ],
       [ 0.5,  0.2],
       [ 1. ,  0.4],
       [ 0.6,  0.3]])

In [4]:
y_train


Out[4]:
array([[ 0.75],
       [ 0.82],
       [ 0.93],
       [ 0.7 ]])

The next phase is to multiply the input, via the dot product, by the first set of weights on the first layer of synapses. With a total of 6 weights, three connected to each input neuron, a 2x3 matrix is formed:


$ W^{(1)} = \begin{bmatrix} W^{(1)}_{1,1} & W^{(1)}_{1,2} & W^{(1)}_{1,3} \\ W^{(1)}_{2,1} & W^{(1)}_{2,2} & W^{(1)}_{2,3} \end{bmatrix} $

The activity of the second layer is produced through the matrix multiplication of the input and the first set of weights. This activity is a matrix z2 with one row per training example and one column per hidden unit (3x3 for the three-row X written out above).
$ z^{(2)} = X W^{(1)} $

$ z^{(2)} = \begin{bmatrix} X_{1,1} W^{(1)}_{1,1} + X_{1,2} W^{(1)}_{2,1} & X_{1,1} W^{(1)}_{1,2} + X_{1,2} W^{(1)}_{2,2} & X_{1,1} W^{(1)}_{1,3} + X_{1,2} W^{(1)}_{2,3} \\ X_{2,1} W^{(1)}_{1,1} + X_{2,2} W^{(1)}_{2,1} & X_{2,1} W^{(1)}_{1,2} + X_{2,2} W^{(1)}_{2,2} & X_{2,1} W^{(1)}_{1,3} + X_{2,2} W^{(1)}_{2,3} \\ X_{3,1} W^{(1)}_{1,1} + X_{3,2} W^{(1)}_{2,1} & X_{3,1} W^{(1)}_{1,2} + X_{3,2} W^{(1)}_{2,2} & X_{3,1} W^{(1)}_{1,3} + X_{3,2} W^{(1)}_{2,3} \end{bmatrix} $

The second-layer activity z2 now needs to be passed through an activation function. The non-linear activation function used here is the sigmoid function.


$ \varphi(x) = \frac{1}{1 + e^{-x}} $
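
As a quick standalone check (not one of the notebook cells), the sigmoid squashes any real-valued input into the open interval (0, 1):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# large negative inputs map close to 0, zero maps to exactly 0.5,
# and large positive inputs map close to 1
print(sigmoid(np.array([-10.0, 0.0, 10.0])))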

This function needs to be applied to every entry in the activity matrix z2 to produce the activation of the second layer a2.


$ \varphi(z^{(2)}) = \begin{bmatrix} \varphi(X_{1,1} W^{(1)}_{1,1} + X_{1,2} W^{(1)}_{2,1}) & \varphi(X_{1,1} W^{(1)}_{1,2} + X_{1,2} W^{(1)}_{2,2}) & \varphi(X_{1,1} W^{(1)}_{1,3} + X_{1,2} W^{(1)}_{2,3}) \\ \varphi(X_{2,1} W^{(1)}_{1,1} + X_{2,2} W^{(1)}_{2,1}) & \varphi(X_{2,1} W^{(1)}_{1,2} + X_{2,2} W^{(1)}_{2,2}) & \varphi(X_{2,1} W^{(1)}_{1,3} + X_{2,2} W^{(1)}_{2,3}) \\ \varphi(X_{3,1} W^{(1)}_{1,1} + X_{3,2} W^{(1)}_{2,1}) & \varphi(X_{3,1} W^{(1)}_{1,2} + X_{3,2} W^{(1)}_{2,2}) & \varphi(X_{3,1} W^{(1)}_{1,3} + X_{3,2} W^{(1)}_{2,3}) \end{bmatrix} $

$ a^{(2)} = \varphi(z^{(2)}) $
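
The same two steps can be sketched in numpy on their own (W1 here is just an illustrative random initialisation; the Neural_Network class defined later wraps exactly these operations):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W1 = np.random.randn(2, 3)    # first set of weights: 2 inputs x 3 hidden units
z2 = np.dot(x_train, W1)      # hidden-layer activity, one row per training example
a2 = sigmoid(z2)              # hidden-layer activation
print(z2.shape, a2.shape)     # (4, 3) (4, 3) for the four scaled training examples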

Once the activation has been computed it needs to be multiplied by the second set of weights on the next layer of synapses. This time there are 3 weights, one from each of the 3 hidden neurons going into the single output neuron, forming a 3x1 matrix of new weights:


$ W^{(2)} = \begin{bmatrix} W^{(2)}_{1,1} \\ W^{(2)}_{2,1} \\ W^{(2)}_{3,1} \end{bmatrix} $

The matrix multiplication of the second-layer activation and the second set of weights yields the activity of the third and final layer, z3, again with one row per training example (3x1 for the three-row illustration).


$ z^{(3)} = a^{(2)}W^{(2)} $

$ z^{(3)} = \begin{bmatrix} a^{(2)}_{1,1} W^{(2)}_{1,1} + a^{(2)}_{1,2} W^{(2)}_{2,1} + a^{(2)}_{1,3} W^{(2)}_{3,1} \\ a^{(2)}_{2,1} W^{(2)}_{1,1} + a^{(2)}_{2,2} W^{(2)}_{2,1} + a^{(2)}_{2,3} W^{(2)}_{3,1} \\ a^{(2)}_{3,1} W^{(2)}_{1,1} + a^{(2)}_{3,2} W^{(2)}_{2,1} + a^{(2)}_{3,3} W^{(2)}_{3,1} \end{bmatrix} $

The final computation applies the activation function to the third-layer activity z3. The result is the final output prediction from the output layer.


$ \varphi(z^{(3)}) = \begin{bmatrix} \varphi(a^{(2)}_{1,1} W^{(2)}_{1,1} + a^{(2)}_{1,2} W^{(2)}_{2,1} + a^{(2)}_{1,3} W^{(2)}_{3,1}) \\ \varphi(a^{(2)}_{2,1} W^{(2)}_{1,1} + a^{(2)}_{2,2} W^{(2)}_{2,1} + a^{(2)}_{2,3} W^{(2)}_{3,1}) \\ \varphi(a^{(2)}_{3,1} W^{(2)}_{1,1} + a^{(2)}_{3,2} W^{(2)}_{2,1} + a^{(2)}_{3,3} W^{(2)}_{3,1}) \end{bmatrix} $

$ \hat{y} = \varphi(z^{(3)}) $
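
Continuing the sketch above (it assumes sigmoid and a2 from the previous snippet, with W2 again just randomly initialised for illustration):

W2 = np.random.randn(3, 1)    # second set of weights: 3 hidden units x 1 output
z3 = np.dot(a2, W2)           # output-layer activity
y_hat = sigmoid(z3)           # predicted (scaled) test scores, one row per example
print(z3.shape, y_hat.shape)  # (4, 1) (4, 1)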

The forward propagation algorithm has been shown mathematically; now it's time to put all of this into code. Before the class is put to work, its weights need to be randomly initialised.


In [5]:
class Neural_Network(object):
    
    def __init__(self):
        # define hyperparameters
        self.input_layer_size = 2
        self.hidden_layer_size = 3
        self.output_layer_size = 1
        
# define parameters
        self.W1 = np.random.randn(self.input_layer_size, self.hidden_layer_size)
        self.W2 = np.random.randn(self.hidden_layer_size, self.output_layer_size)
        
    # forward propagation
    def forward(self, X):
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.sigmoid(self.z2)
        self.z3 = np.dot(self.a2, self.W2)
        prediction = self.sigmoid(self.z3)
        return prediction
    
    # activation functions
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

In [6]:
NN = Neural_Network()

In [7]:
NN.forward(x_train)


Out[7]:
array([[ 0.36756757],
       [ 0.35411977],
       [ 0.3499925 ],
       [ 0.35427414]])

In [8]:
y_train


Out[8]:
array([[ 0.75],
       [ 0.82],
       [ 0.93],
       [ 0.7 ]])

Now that the forward propagation is done, it's quite clear that the predicted outputs are far off the supervised values. This is because the network has not yet been trained.
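
To make "far off" concrete, one quick way (not part of the original walkthrough) is to print the raw residuals between the predictions and the supervised values:

# difference between predictions and targets, one row per training example;
# with untrained random weights the errors are large relative to the 0-1 scale
print(NN.forward(x_train) - y_train)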