Feed-forward neural networks are inspired by the information processing of one or more neural cells, called neurons.

A neuron accepts input signals via its dendrites, which pass the electrical signal down to the cell body.

The axon carries the signal out to synapses, which are the connections of a cell’s axon to other cells’ dendrites.

The principle of the backpropagation approach is to model a given function by modifying internal weightings of input signals to produce an expected output signal.

The system is trained using a supervised learning method, where the error between the system’s output and a known expected output is presented to the system and used to modify its internal state.

As such, it requires a network structure to be defined of one or more layers where one layer is fully connected to the next layer.

A standard network structure is one input layer, one hidden layer, and one output layer.

Backpropagation can be used for both classification and regression problems, but we will focus on classification in this tutorial.

In classification problems, best results are achieved when the network has one neuron in the output layer for each class value.

For example, consider a 2-class (binary) classification problem with the class values A and B.

The expected outputs would have to be transformed into binary vectors with one column for each class value, such as [1, 0] and [0, 1] for A and B respectively.

This is called a one hot encoding.
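The encoding described above can be sketched in a few lines. The helper name `one_hot_encode` is an assumption made for illustration and is not part of the tutorial's code; sorting the class values is one possible way to fix the column order.

```python
# Hypothetical helper (not from the tutorial): one hot encode class labels.
def one_hot_encode(labels):
    classes = sorted(set(labels))                 # fixed, repeatable column order
    index = {c: i for i, c in enumerate(classes)} # class value -> column index
    encoded = []
    for label in labels:
        vector = [0] * len(classes)
        vector[index[label]] = 1                  # switch on the column for this class
        encoded.append(vector)
    return classes, encoded

classes, encoded = one_hot_encode(['A', 'B', 'A'])
print(classes)   # ['A', 'B']
print(encoded)   # [[1, 0], [0, 1], [1, 0]]
```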

The seeds dataset used in this tutorial has 201 records and 7 numerical input variables.

It is a classification problem with 3 output classes.

The scale of each numeric input value varies, so some data normalization may be required for use with algorithms that weight inputs, like the backpropagation algorithm.

Using the Zero Rule algorithm that predicts the most common class value, the baseline accuracy for the problem is 28.095%.
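As a rough illustration of how a Zero Rule baseline is computed, the sketch below scores a classifier that always predicts the most common class. The function name `zero_rule_accuracy` and the toy labels are assumptions; it is not run on the seeds data and does not reproduce the figure quoted above.

```python
from collections import Counter

# Hypothetical helper: accuracy of always predicting the most common class.
def zero_rule_accuracy(labels):
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels) * 100.0

toy_labels = [0, 0, 0, 1, 1, 2]   # made-up labels, not the seeds data
baseline = zero_rule_accuracy(toy_labels)
print(baseline)   # 50.0
```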

You can learn more and download the seeds dataset from the **UCI Machine Learning Repository**.

Download the seeds dataset and place it into your current working directory with the filename seeds_dataset.csv.

The dataset is in tab-separated format, so you must convert it to CSV using a text editor or a spreadsheet program.

```python
import pandas as pd
seeds_dataset = pd.read_csv('seeds_dataset.csv', header=None)
```

Below is a sample of the first 5 rows of the seeds dataset.

```python
seeds_dataset[:5]
```


This tutorial is broken down into 6 parts:

- Initialize Network.
- Forward Propagate.
- Back Propagate Error.
- Train Network.
- Predict.
- Seeds Dataset Case Study.

These steps will provide the foundation that you need to implement the backpropagation algorithm from scratch and apply it to your own predictive modeling problems.

Each neuron has a set of weights that need to be maintained.

One weight for each input connection and an additional weight for the bias.

We will need to store additional properties for a neuron during training, so we will use a dictionary to represent each neuron and store properties by names such as **weights** for the weights.

A network is organized into layers.

The input layer is really just a row from our training dataset.

The first real layer is the hidden layer.

This is followed by the output layer that has one neuron for each class value.

It is good practice to initialize the network weights to small random numbers.

In this case, we will use random numbers in the range of 0 to 1.

Below is a function named **initialize_network()** that creates a new neural network ready for training.

It accepts three parameters: the number of inputs, the number of neurons to have in the hidden layer, and the number of outputs.

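You can see that for the hidden layer we create `n_hidden` neurons, and each neuron in the hidden layer has `n_inputs + 1` weights: one for each input column in the dataset and an additional one for the bias.

You can also see that the output layer that connects to the hidden layer has `n_outputs` neurons, each with `n_hidden + 1` weights.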

This means that each neuron in the output layer connects to (has a weight for) each neuron in the hidden layer.

```python
from random import random

# Initialize a network:
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    # One dict per hidden neuron: a weight per input plus one bias weight
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    # One dict per output neuron: a weight per hidden neuron plus one bias weight
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network
```

Let's test out this function.

Below is a complete example that creates a small network:

```python
from random import seed
from random import random
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network
seed(1)
network = initialize_network(2, 1, 2)
for layer in network:
    print(layer)
```


You can see that the hidden layer has one neuron with 2 input weights plus the bias.

The output layer has two neurons, each with one weight plus the bias.
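The weight counts can be checked with a little arithmetic. The helper `count_weights` below is a hypothetical name used only for this sketch; it applies the `n_inputs + 1` and `n_hidden + 1` rule to the same sizes passed to `initialize_network(2, 1, 2)`.

```python
# Hypothetical helper: number of weights (including biases) in each layer.
def count_weights(n_inputs, n_hidden, n_outputs):
    hidden = n_hidden * (n_inputs + 1)    # each hidden neuron: inputs + bias
    output = n_outputs * (n_hidden + 1)   # each output neuron: hidden outputs + bias
    return hidden, output

hidden_count, output_count = count_weights(2, 1, 2)
print(hidden_count, output_count)   # 3 4
```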

Now that we know how to create and initialize a network, let's see how we can use it to calculate an output.

We can calculate an output from a neural network by propagating an input signal through each layer until the output layer outputs its values.

We call this forward-propagation.

It is the technique we will need to generate predictions during training that will need to be corrected, and it is the method we will need after the network is trained to make predictions on new data.

We can break forward propagation down into three parts:

- Neuron Activation.
- Neuron Transfer.
- Forward Propagation.

The first step is to calculate the activation of one neuron given an input.

The input could be a row from our training dataset, as in the case of the hidden layer.

It may also be the outputs from each neuron in the hidden layer, in the case of the output layer.

Neuron activation is calculated as the weighted sum of the inputs.

Much like linear regression.

activation = sum(weight_i * input_i) + bias

Where **weight** is a network weight, **input** is an input, **i** is the index of a weight or an input, and **bias** is a special weight that has no input to multiply with (or you can think of the input as always being 1.0).

Below is an implementation of this in a function named activate().

You can see that the function assumes that the bias is the last weight in the list of weights.

This helps here and later to make the code easier to read.

```python
# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation
```

Now, let's see how to use the neuron activation.

Different transfer functions can be used.

It is traditional to use the **sigmoid activation function**, but you can also use the **tanh (hyperbolic tangent)** function to transfer outputs.

More recently, the **rectifier transfer function** has been popular with large deep learning networks.
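For comparison, the two alternatives mentioned above could be written as follows. This is only a sketch: the names `transfer_tanh` and `transfer_rectifier` are assumptions, and the rest of this tutorial does not use them. Swapping either in would also require a matching derivative during backpropagation.

```python
from math import tanh

# Hypothetical alternative transfer functions (not used in this tutorial).
def transfer_tanh(activation):
    # Hyperbolic tangent: output lies between -1 and 1
    return tanh(activation)

def transfer_rectifier(activation):
    # Rectifier: passes positive activations, clips negatives to 0
    return max(0.0, activation)

for a in (-2.0, 0.0, 2.0):
    print(a, transfer_tanh(a), transfer_rectifier(a))
```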

The sigmoid activation function looks like an S shape; it is also called the logistic function.

It can take any input value and produce a number between 0 and 1 on an S-curve.

It is also a function of which we can easily calculate the derivative (slope) that we will need later when backpropagating error.

We can transfer an activation using the sigmoid function as follows:

output = 1 / (1 + e^(-activation))

or:

$\sigma(x) = 1 / (1 + e^{-x})$

Where **e** (**Euler's Number**) is the base of the **natural logarithm**.

Below is a function named **transfer()** that implements the sigmoid equation.

```python
from math import exp

# Transfer neuron activation:
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
```

Now that we have the pieces, let's see how they are put together and used.

We work through each layer of our network calculating the outputs for each neuron.

All of the outputs from one layer become inputs to the neurons on the next layer.

Below is a function named **forward_propagate()** that implements the forward propagation for a row of data from our dataset with our neural network.

You can see that a neuron’s output value is stored in the neuron with the name `output`.

You can also see that we collect the outputs for a layer in an array named `new_inputs` that becomes the array `inputs` and is used as the input for the following layer.

The function returns the outputs from the last layer also called the output layer.

```python
# Forward propagate input to a network output:
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
```

We define our network inline with one hidden neuron that expects 2 input values and an output layer with two neurons.

```python
from math import exp
# Calculate neuron activation for an input:
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation
# Transfer neuron activation:
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
# Forward propagate input to a network output:
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
```

Time for some testing:

```python
# Test forward propagation:
network = [[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
           [{'weights': [0.2550690257394217, 0.49543508709194095]},
            {'weights': [0.4494910647887381, 0.651592972722763]}]]
row = [1, 0, None]
output = forward_propagate(network, row)
output
```


Because the output layer has two neurons, we get a list of two numbers as output.

The actual output values are just nonsense for now, but next, we will start to learn how to make the weights in the neurons more useful.

The backpropagation algorithm is named for the way in which the weights are trained (backwards propagation of errors).

Error is calculated between the expected outputs and the outputs forward propagated from the network.

These errors are then propagated backward through the network from the output layer to the hidden layer, assigning blame for the error and updating weights as they go.

The **math for backpropagating error** is **rooted in calculus**, but we will remain high level in this section and focus on what is calculated and how rather than why the calculations take this particular form.

This part is broken down into two sections.

- Transfer Derivative.
- Error Backpropagation.

We are using the sigmoid transfer function, the derivative of which can be calculated as follows:

derivative = output * (1.0 - output)

Below is a function named **transfer_derivative()** that implements this equation:

```python
# Calculate the derivative of a neuron output:
def transfer_derivative(output):
    return output * (1.0 - output)
```

Now let's see how this can be used.

The first step is to calculate the error for each output neuron; this will give us our error signal (input) to propagate backwards through the network.

The error for a given neuron can be calculated as follows:

error = (expected - output) * transfer_derivative(output)

Where **expected** is the expected output value for the neuron, `output` is the output value for the neuron, and `transfer_derivative()` calculates the slope of the neuron’s output value.

This error calculation is used for neurons in the output layer.

The expected value is the class value itself.
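As a small worked example of the output-layer formula, the sketch below evaluates it for a single neuron. The function name `output_error` and the output value of 0.8 are made up for illustration.

```python
# Slope of the sigmoid, expressed in terms of the neuron's output
def transfer_derivative(output):
    return output * (1.0 - output)

# Hypothetical helper: error signal for one output neuron
def output_error(expected, output):
    return (expected - output) * transfer_derivative(output)

# A neuron that output 0.8 when the expected value was 1 gets a small
# positive error; the same output with an expected value of 0 gets a
# larger negative error.
error_when_expected_1 = output_error(1, 0.8)   # (1 - 0.8) * 0.8 * 0.2
error_when_expected_0 = output_error(0, 0.8)   # (0 - 0.8) * 0.8 * 0.2
print(error_when_expected_1, error_when_expected_0)
```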

In the hidden layer, things are a little more complicated.

Think of the error traveling back along the weights of the output layer to the neurons in the hidden layer.

The back-propagated error signal is accumulated and then used to determine the error for the neuron in the hidden layer, as follows:

error = (weight_k * error_j) * transfer_derivative(output)

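Where **error_j** is the error signal from the `j`th neuron in the output layer, `weight_k` is the weight that connects the `k`th neuron to the current neuron, and `output` is the output for the current neuron.

Below is a function named **backward_propagate_error()** that implements this procedure.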

You can see that the error signal calculated for each neuron is stored with the name `delta`.

You can see that the layers of the network are iterated in reverse order, starting at the output and working backwards.

This ensures that the neurons in the output layer have 'delta' values calculated first so that neurons in the hidden layer can use the result in the subsequent iteration.

I chose the name 'delta' to reflect the change the error implies on the neuron (e.g. the weight delta).

You can see that the error signal for neurons in the hidden layer is accumulated from neurons in the output layer, where the hidden neuron number `j` is also the index of the neuron’s weight in the output layer, `neuron['weights'][j]`.

```python
# Backpropagate error and store in neurons:
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
```

We define a fixed neural network with output values and backpropagate an expected output pattern.

The complete example is listed below.

```python
# Calculate the derivative of a neuron output:
def transfer_derivative(output):
    return output * (1.0 - output)
# Backpropagate error and store in neurons:
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
```

Let's run some tests to make sure we get what we want:

```python
# Test backpropagation of error:
network = [[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
           [{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095]},
            {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763]}]]
expected = [0, 1]
backward_propagate_error(network, expected)
for layer in network:
    print(layer)
```


You can see that error values are calculated and stored in the neurons for the output layer and the hidden layer.
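To see where the hidden-layer value comes from, the same calculation can be repeated by hand for the fixed test network. This is a sketch that copies the outputs and weights from the test above; variable names such as `hidden_error` are assumptions made for illustration.

```python
def transfer_derivative(output):
    return output * (1.0 - output)

# Values copied from the fixed test network used in the test above
hidden_output = 0.7105668883115941
output_neurons = [
    {'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095]},
    {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763]},
]
expected = [0, 1]

# Output layer: (expected - output) * slope of the output
deltas = [(expected[k] - n['output']) * transfer_derivative(n['output'])
          for k, n in enumerate(output_neurons)]

# Hidden neuron j = 0: accumulate weight * delta over the output neurons,
# then scale by the slope of the hidden neuron's own output
j = 0
hidden_error = sum(n['weights'][j] * d for n, d in zip(output_neurons, deltas))
hidden_delta = hidden_error * transfer_derivative(hidden_output)

print(deltas)
print(hidden_delta)
```

The first output neuron overshoots its expected value of 0, so its delta is negative, while the second undershoots its expected value of 1 and gets a positive delta. The two contributions nearly cancel at the hidden neuron, which is why its delta is much smaller in magnitude.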

Now let's use the backpropagation of errors to train the network.

The network is trained using stochastic gradient descent.

This involves multiple iterations of exposing a training dataset to the network and for each row of data forward propagating the inputs, backpropagating the error and updating the network weights.

This part is broken down into two sections:

- Update Weights.
- Train Network.

Network weights are updated as follows:

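weight = weight + learning_rate * error * input

Where **weight** is a given weight, `learning_rate` is a parameter that you must specify, `error` is the error calculated by the backpropagation procedure for the neuron, and `input` is the input value that caused the error.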

The same procedure can be used for updating the bias weight, except there is no input term, or input is the fixed value of 1.0.

Learning rate controls how much to change the weight to correct for the error.

For example, a value of 0.1 will update the weight 10% of the amount that it possibly could be updated.
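A short worked example makes the effect of the learning rate concrete. The helper `updated_weight` and the numbers below are made up for illustration; they simply apply the update rule to one weight.

```python
# Hypothetical helper: apply the update rule to a single weight
def updated_weight(weight, learning_rate, error, input_value):
    return weight + learning_rate * error * input_value

# Same weight, error and input; only the learning rate differs
full_step = updated_weight(0.5, 1.0, 0.2, 1.5)    # changes the weight by 0.2 * 1.5
small_step = updated_weight(0.5, 0.1, 0.2, 1.5)   # changes it by one tenth of that
print(full_step - 0.5, small_step - 0.5)
```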

Small learning rates that cause slower learning over a large number of training iterations are preferred.

This increases the likelihood of the network finding a good set of weights across all layers rather than the fastest set of weights that minimize error (called premature convergence).

Below is a function named **update_weights()** that updates the weights for a network given an input row of data and a learning rate, assuming that a forward and backward propagation have already been performed.

Remember that the input for the output layer is a collection of outputs from the hidden layer.

```python
# Update network weights with error:
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
```

Now that we know how to update network weights, let's tell the machine how to do it repeatedly.

This involves first looping for a fixed number of epochs and within each epoch updating the network for each row in the training dataset.

Because updates are made for each training pattern, this type of learning is called online learning.

If errors were accumulated across an epoch before updating the weights, this is called batch learning or batch gradient descent.
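The difference between the two schemes can be sketched with a single sigmoid neuron rather than a full network. Everything in this block (the function names, the toy data and the starting weights) is an assumption made for illustration; the tutorial itself only implements the online variant.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# Error signal for one sigmoid neuron; row = [x1, ..., target], bias is the last weight
def neuron_delta(weights, row):
    inputs = row[:-1]
    activation = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(activation)
    delta = (row[-1] - output) * output * (1.0 - output)
    return delta, inputs

# Online learning: weights change after every row
def online_epoch(weights, data, l_rate):
    weights = list(weights)
    for row in data:
        delta, inputs = neuron_delta(weights, row)
        for j, x in enumerate(inputs):
            weights[j] += l_rate * delta * x
        weights[-1] += l_rate * delta
    return weights

# Batch learning: accumulate over the epoch, then update once
def batch_epoch(weights, data, l_rate):
    totals = [0.0] * len(weights)
    for row in data:
        delta, inputs = neuron_delta(weights, row)   # weights stay fixed here
        for j, x in enumerate(inputs):
            totals[j] += delta * x
        totals[-1] += delta
    return [w + l_rate * t for w, t in zip(weights, totals)]

data = [[0.0, 0], [1.0, 1], [2.0, 1]]   # toy rows: one input, one class value
start = [0.1, -0.2]                      # one input weight plus the bias
online_weights = online_epoch(start, data, 0.5)
batch_weights = batch_epoch(start, data, 0.5)
print(online_weights)
print(batch_weights)
```

After the first row the two schemes diverge, because the online version computes each later error with weights that have already moved.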

The expected number of output values is used to transform class values in the training data into a one hot encoding.

That is a binary vector with one column for each class value to match the output of the network.

This is required to calculate the error for the output layer.

You can also see that the sum squared error between the expected output and the network output is accumulated each epoch and printed.

This is helpful to create a trace of how much the network is learning and improving each epoch.

```python
# Train an ANN for a fixed number of epochs:
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2
                              for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print(">epoch: {}, l_rate: {:.3f}, sum_error: {:.3f}".format(epoch, l_rate, sum_error))
```

We can put together an example that includes everything we’ve seen so far including network initialization and train a network on a small dataset.

Below is a small contrived dataset that we can use to test out training our neural network:

| X1 | X2 | Y |
|---|---|---|
| 2.7810836 | 2.550537003 | 0 |
| 1.465489372 | 2.362125076 | 0 |
| 3.396561688 | 4.400293529 | 0 |
| 1.38807019 | 1.850220317 | 0 |
| 3.06407232 | 3.005305973 | 0 |
| 7.627531214 | 2.759262235 | 1 |
| 5.332441248 | 2.088626775 | 1 |
| 6.922596716 | 1.77106367 | 1 |
| 8.675418651 | -0.242068655 | 1 |
| 7.673756466 | 3.508563011 | 1 |

We will use 2 neurons in the hidden layer.

It is a binary classification problem (2 classes) so there will be two neurons in the output layer.

The network will be trained for 20 epochs with a learning rate of 0.5, which is high because we are training for so few iterations.

```python
from math import exp
from random import seed
from random import random
# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network
# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation
# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)
# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
# Test training backprop algorithm
seed(1)
dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)
```


We can see a trend of this error decreasing with each epoch.

Once trained, the network is printed, showing the learned weights.

Also still in the network are output and delta values that can be ignored.

We could update our training function to delete this data if we wanted.

Once a network is trained, we need to use it to make predictions.

We have already seen how to forward-propagate an input pattern to get an output.

This is all we need to do to make a prediction.

We can use the output values themselves directly as the probability of a pattern belonging to each output class.

It may be more useful to turn this output back into a crisp class prediction.

We can do this by selecting the class value with the largest probability.

This is also called the arg max function.

Below is a function named **predict()** that implements this procedure.

It returns the index in the network output that has the largest probability.

It assumes that class values have been converted to integers starting at 0.

```python
# Make a prediction with the network:
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))
```

The example hardcodes a network trained from the previous step above.

The complete example is listed below.

```python
from math import exp
# Calculate neuron activation for an input:
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation
# Transfer neuron activation:
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
# Forward propagate inputs to a network output:
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
# Make a prediction with a network:
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))
# Test making predictions with the network:
dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]
network = [[{'weights': [-1.482313569067226, 1.8308790073202204, 1.078381922048799]},
            {'weights': [0.23244990332399884, 0.3621998343835864, 0.40289821191094327]}],
           [{'weights': [2.5001872433501404, 0.7887233511355132, -1.1026649757805829]},
            {'weights': [-2.429350576245497, 0.8357651039198697, 1.0699217181280656]}]]
for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Actual=%d' % (row[-1], prediction))
```


It shows that the network achieves 100% accuracy on this small dataset.

Now we are ready to apply our backpropagation algorithm to a real world dataset.

The first step is to load the dataset and convert the loaded data to numbers that we can use in our neural network.

For this we will use the helper function **load_csv()** to load the file, `str_column_to_float()` to convert string numbers to floats, and `str_column_to_int()` to convert the class column to integer values.

Input values vary in scale and need to be normalized to the range of 0 and 1.

It is generally good practice to normalize input values to the range of the chosen transfer function, in this case, the sigmoid function that outputs values between 0 and 1.
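As a small illustration of rescaling to the range 0 to 1, the sketch below applies min-max scaling to one made-up column of values. The function name `minmax_scale` is an assumption, and the sketch assumes the column is not constant (otherwise the division would fail).

```python
# Hypothetical helper: rescale one column of values to the range 0-1.
# Assumes the column contains at least two distinct values.
def minmax_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = minmax_scale([10.0, 15.0, 20.0])
print(scaled)   # [0.0, 0.5, 1.0]
```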

The `dataset_minmax()` and `normalize_dataset()` helper functions are used to normalize the input values.

We will evaluate the algorithm using k-fold cross-validation with 5 folds. This means that 201/5=40.2 or 40 records will be in each fold.
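The fold arithmetic can be checked directly. The sketch below uses the record count stated in this tutorial and the same integer truncation as the `cross_validation_split()` function in the listing; the variable names are chosen for illustration.

```python
n_records, n_folds = 201, 5                  # record count as stated in this tutorial
fold_size = int(n_records / n_folds)         # 40.2 is truncated to 40
left_over = n_records - fold_size * n_folds  # records not assigned to any fold
print(fold_size, left_over)   # 40 1
```

With these numbers, one record is left out of every fold because the fold size is rounded down.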

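We will use the helper function **evaluate_algorithm()** to evaluate the algorithm with cross-validation and `accuracy_metric()` to calculate the accuracy of predictions.

A new function named `back_propagation()` manages the application of the backpropagation algorithm: it first initializes a network, trains it on the training dataset, and then uses the trained network to make predictions on a test dataset.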

The complete example is listed below.

```python
# Backpropagation on the Seeds Dataset:
from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp
# Load the CSV file with the seeds dataset:
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset
# Convert string column to float:
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())
# Convert string column to integer:
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup
# Find the min and max values for each column:
def dataset_minmax(dataset):
    minmax = list()
    stats = [[min(column), max(column)] for column in zip(*dataset)]
    return stats
# Rescale dataset columns to the range 0-1:
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)-1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
# Split a dataset into k folds:
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split
# Calculate accuracy percentage:
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0
# Evaluate the algorithm using a cross validation split:
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores
# Calculate neuron activation for an input:
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation
# Transfer neuron activation:
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
# Forward propagate input to a network output:
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
# Calculate the derivative of a neuron's output:
def transfer_derivative(output):
    return output * (1.0 - output)
# Backpropagate errors and store the results in the corresponding neurons:
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
# Update network weights with the errors:
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
# Train a network for a fixed number of epochs:
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
# Initialize a network:
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network
# Make a prediction with a network:
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))
# Backpropagation Algorithm with Stochastic Gradient Descent:
def back_propagation(train, test, l_rate, n_epoch, n_hidden):
    n_inputs = len(train[0])-1
    n_outputs = len(set([row[-1] for row in train]))
    network = initialize_network(n_inputs, n_hidden, n_outputs)
    train_network(network, train, l_rate, n_epoch, n_outputs)
    predictions = list()
    for row in test:
        prediction = predict(network, row)
        predictions.append(prediction)
    return predictions
# Test backpropagation on seeds dataset:
seed(1)
# Load and prepare data:
filename = 'seeds_dataset.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
# Convert class column to integers:
str_column_to_int(dataset, len(dataset[0])-1)
# Normalize the input variables:
minmax = dataset_minmax(dataset)
normalize_dataset(dataset, minmax)
# Evaluate the algorithm:
n_folds = 5
l_rate = 0.3
n_epoch = 500
n_hidden = 5
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
print("Scores: \n{}".format(scores))
print("Mean Accuracy: {:.3f}".format(sum(scores)/float(len(scores))))
```


The network was trained for 500 epochs with a learning rate of 0.3.

These parameters were found with a little trial and error, but you may be able to do much better.

Running the example prints the classification accuracy on each fold as well as the mean accuracy across all folds.

You can see that backpropagation and the chosen configuration achieved a mean classification accuracy of 93.333%.
