Applications of Machine Learning with Artificial Neural Networks and Supervised Regression


Lucas Barbosa

Introduction

Artificial Neural Networks (ANNs) have been around for a very long time in the field of Machine Learning (ML), a discipline dedicated to teaching algorithms, from the very simple to the highly sophisticated, to recognise patterns in data. Because ML is such a broad area of expertise, there are many ML models, all designed as learners that generalise from experience across areas such as:

  • Image Classification
  • Natural Language Processing
  • Computer Vision

The different ML models range from Support Vector Machines to Artificial Neural Networks, Deep Belief Networks and more. The trick to any ML practice is that each problem requires a tailored approach in order to be solved with any one of the models listed above. The idea of training an algorithm is central to almost all ML practice. There are three different classifications of learning strategies:

  • Supervised Learning: the algorithm is given both a set of inputs and outputs, and its role is to map a general rule between the two.
  • Unsupervised Learning: no labels are given to the algorithm’s output, and it is left on its own to find structure in the input. This practice revolves heavily around feature learning.
  • Reinforcement Learning: the algorithm constantly interacts with a dynamic environment while performing a critical task, such as driving a car, and is given feedback in the form of rewards and punishments as it navigates through the space.

Defining the Problem

The task at hand is predicting a student’s score on a test based on the number of hours they slept and studied the night before. The main goal is to build an operational algorithm which can predict your score on a test from these two statistics.

Hours Slept & Studied (hrs)    Test Score (max out of 100)
3, 5                           75
5, 1                           82
10, 2                          93

The ML model used to solve this particular task will be a basic Feed-Forward Artificial Neural Network. The problem is a supervised regression problem: supervised because both inputs and outputs are provided for training, and regression because of the continuous nature of the output data.
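As a quick sketch of how this data might be set up in code (assuming Python with NumPy; the names X and y are illustrative, not prescribed by the problem), the training examples could be stored and normalised like so:

    import numpy as np

    # Inputs: (hours slept, hours studied); outputs: test scores out of 100
    X = np.array([[3, 5],
                  [5, 1],
                  [10, 2]], dtype=float)
    y = np.array([[75],
                  [82],
                  [93]], dtype=float)

    # Scale inputs and outputs to [0, 1] so the network
    # trains on comparable magnitudes
    X = X / np.amax(X, axis=0)
    y = y / 100.0  # maximum possible test score

Normalising both columns independently keeps the two input units (hours slept versus hours studied) on the same footing during training.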

Theory

ANNs date back to the early beginnings of AI and ML, thanks in part to the generous predictions of the mathematician Alan Turing. ANNs are inspired by our most interesting biological entity: the human brain. The brain is comprised of some 100 billion tiny units called neurons. Each neuron is connected to thousands of other neurons and communicates via electrochemical signals. Signals come into a neuron via junctions called synapses, which are located at the ends of branches called dendrites.

Neurons continuously receive signals from their synapses and do their magic, which involves summing up all of the inputs received from the other neurons; if the result is greater than a threshold value, the neuron fires. By firing, the neuron generates a voltage and outputs a signal along something called an axon.
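To make this sum-and-fire behaviour concrete, here is a toy sketch of such a threshold unit in Python (the function name and the threshold value are purely illustrative):

    import numpy as np

    def threshold_neuron(inputs, threshold=1.0):
        # Fires (outputs 1) only if the summed input signal
        # exceeds the threshold; otherwise stays silent (0)
        return 1 if np.sum(inputs) > threshold else 0

    threshold_neuron([0.4, 0.3, 0.5])  # -> 1, since 1.2 exceeds 1.0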

That is a great picture of a biological neuron; ANNs, however, aren’t comprised of biological neurons. They’re made up of artificial neurons, which are modelled as follows.

Each input into a neuron is associated with its own synaptic weight, a parameter that is fine-tuned during the training process. Weights can take both positive and negative values, thereby providing an excitatory or inhibitory influence on each input.

As each input enters through a synapse it is multiplied by its weight, and the results are summed together by the neuron. This sum is called the activity for that layer of neurons.

The inputs x being:


$ x_{1}, x_{2}, x_{3}, \ldots, x_{n} $

The synaptic weights w being:


$ w_{1}, w_{2}, w_{3}, \ldots, w_{n} $

The neuron being responsible for the following operation:


$ a = x_{1}w_{1} + x_{2}w_{2} + x_{3}w_{3} + \cdots + x_{n}w_{n} $

$ a = \sum_{i=1}^{n} x_{i}w_{i} $
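In code, this activity is simply a dot product. A minimal sketch, assuming NumPy and with the values chosen purely for illustration:

    import numpy as np

    x = np.array([3.0, 5.0])   # inputs, e.g. hours slept and studied
    w = np.array([0.2, -0.4])  # synaptic weights, one per input

    a = np.dot(x, w)  # a = x1*w1 + x2*w2 + ... + xn*wn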

An activation function is then applied to the activity of the neuron, and the result is passed on to the next layer. This process repeats for however many layers there are in the network (note that an ANN with several hidden layers is also known as a deep neural network). Once a final output is produced, it is passed into a cost function, which determines how much error there was in the prediction.
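Putting these pieces together, a single forward pass through a small network might look like the sketch below. The 2-3-1 layer sizes mirror the problem defined earlier, the sigmoid is one common choice of activation function, and the weights are randomly initialised purely for illustration:

    import numpy as np

    def sigmoid(z):
        # Squashes activity into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    # Randomly initialised weights: 2 inputs -> 3 hidden -> 1 output
    W1 = np.random.randn(2, 3)
    W2 = np.random.randn(3, 1)

    def forward(X):
        z2 = np.dot(X, W1)   # activity of the hidden layer
        a2 = sigmoid(z2)     # activation applied to the activity
        z3 = np.dot(a2, W2)  # activity of the output layer
        y_hat = sigmoid(z3)  # predicted (normalised) test score
        return y_hat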

The cost function measures the model’s accuracy and determines whether it’s shiny gold or no good at all. The cost function being used will be:


$ \eta = \frac{1}{2} \sum (y - \hat{y})^{2} $

The total error is computed over the entire data set at once; because the gradient of the function above will be used to train the ANN later, this approach is called Batch Gradient Descent.
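As a sketch, the cost over the whole (normalised) data set could be computed as follows, reusing the forward pass from the example above (the function name is illustrative):

    def cost(X, y):
        # Sum of squared errors over the entire batch,
        # halved so the gradient comes out cleaner later
        y_hat = forward(X)
        return 0.5 * np.sum((y - y_hat) ** 2)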

Next up... Implementation