Lunch Bytes (LB22) • October 26, 2018 • Matt Grossi (matthew.grossi at rsmas.miami.edu)
This document provides a practical follow-up to my talk, Peeking under the hood of an artificial neural network.
First, we define functions for the routines we'll be using frequently. The workflow for [most of] these comes from the presentation slides.
Disclaimer: There are undoubtedly far more efficient ways to carry out these operations. I've chosen deliberately verbose variable names to be as descriptive (and hopefully as helpful) as possible. I have not thoroughly checked this for correct performance, typos, etc.
In [18]:
import numpy as np
import pandas as pd
from sklearn import model_selection
In [19]:
def xfer(wsum):
    # Sigmoid (logistic) transfer function applied to the weighted sum
    out = 1.0 / (1.0 + np.exp(-wsum))
    return out
In [20]:
def ErrHid(output, weights, outerr):
    # Error responsibility for hidden neurons: sigmoid derivative,
    # output*(1-output), times the error propagated back from the output layer
    dt = np.dot(weights, outerr)
    ErrorHid = output * (1.0 - output) * dt
    return ErrorHid
In [21]:
def ErrOut(output, targets):
    # Error responsibility for output neurons: sigmoid derivative times
    # the difference between target and actual output
    ErrorOut = output * (1.0 - output) * (targets - output)
    return ErrorOut
In [22]:
def WgtAdj(weights, responsibilities, values, learnrate):
    weights += (learnrate * np.outer(values, responsibilities))
    return weights
In [23]:
def BiasAdj(bias, responsibilities, learnrate):
    bias += (learnrate * responsibilities)
    return bias
In [24]:
def run(examples, targets, weightsI2H, weightsH2O, biasHID, biasOUT):
    HiddenLayerInput = np.dot(examples, weightsI2H) + biasHID
    HiddenLayerOutput = xfer(wsum=HiddenLayerInput)
    OutputLayerInput = np.dot(HiddenLayerOutput, weightsH2O) + biasOUT
    Output = xfer(wsum=OutputLayerInput)
    return Output
In [25]:
def nnet(examples, targets, hidden, learn):
    # Set number of attributes passed: number of columns in 'examples'
    NumAtt = examples.shape[1]
    # Set number of output neurons: number of columns in 'targets'
    NumOut = targets.shape[1]
    # Randomly initialize weight matrices (replace any zeros with 0.1)
    weightsI2H = np.random.uniform(-1, 1, size=(NumAtt, hidden))
    weightsI2H[weightsI2H == 0] = 0.1
    weightsH2O = np.random.uniform(-1, 1, size=(hidden, NumOut))
    weightsH2O[weightsH2O == 0] = 0.1
    # Randomly initialize biases for hidden and output layer neurons
    biasHID = np.random.uniform(-1, 1, size=hidden)
    biasOUT = np.random.uniform(-1, 1, size=NumOut)
    # Loop to pass each training example
    for ex in range(len(examples)):
        # Forward-propagate examples
        HiddenLayerInput = np.dot(examples[ex, :], weightsI2H) + biasHID
        HiddenLayerOutput = xfer(wsum=HiddenLayerInput)
        OutputLayerInput = np.dot(HiddenLayerOutput, weightsH2O) + biasOUT
        Output = xfer(wsum=OutputLayerInput)
        # Back-propagate error: calculate error responsibilities for
        # output and hidden layer neurons
        OutputErr = ErrOut(output=Output, targets=targets[ex, :])
        HiddenErr = ErrHid(output=HiddenLayerOutput, weights=weightsH2O, outerr=OutputErr)
        # Adjust weights and biases
        weightsI2H += learn * np.outer(examples[ex, :], HiddenErr)
        weightsH2O += learn * np.outer(HiddenLayerOutput, OutputErr)
        biasHID += learn * HiddenErr
        biasOUT += learn * OutputErr
        # Correctly classified?
        if (targets[ex, 0] < targets[ex, 1]) == (Output[0] < Output[1]):
            print('Correct')
        else:
            print('Incorrect')
    return weightsI2H, weightsH2O, biasHID, biasOUT
Let's test this on a data set from the UC Irvine Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). This handy repository contains hundreds of pre-classified data sets that are ideal for testing machine learning algorithms.
Consider the banknote authentication data set. The documentation states that it contains 1372 examples of banknote-like specimens that are classified as either authentic (class 1) or not authentic (class 0) based on four attributes: variance, skewness, curtosis, and entropy of the image.
In [26]:
data = pd.read_csv("~/Documents/Classes/RSMAS/MachLearn/Project1/banknote.csv",
                   names=['var', 'skew', 'curt', 'ent', 'class-yes'])
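(As an aside, not part of the original workflow: if you don't have a local copy of the file, pandas can usually read it straight from the UCI archive. The URL below is the repository's download link as of this writing and may have moved since.)

# Hypothetical alternative: read the raw file directly from the UCI archive
url = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
       '00267/data_banknote_authentication.txt')
data = pd.read_csv(url, names=['var', 'skew', 'curt', 'ent', 'class-yes'])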
In [27]:
data.head(3)
Out[27]:
In [28]:
data.tail(3)
Out[28]:
Before feeding a neural net, we first need to set up our data. Note that these data are sorted by class (the last column of 0s and 1s), as is often the case with pre-classified data. It is good practice to shuffle the data to help ensure that the distribution of classes is roughly the same in both the training and testing subsets.
In [29]:
data = data.sample(frac=1).reset_index(drop=True)
We can think of these data as having two classes: authentic and not authentic. Our data set already has a column containing 1s whenever the example is in the authentic class. Now let's add a column that has a 1 when the example is in the not authentic class. Then print the first few lines to make sure we have what we think we have.
In [30]:
data['class-no'] = np.where(data['class-yes']==0, 1, 0)
In [31]:
data.head()
Out[31]:
Next we need to normalize our data such that all columns are between 0 and 1:
In [32]:
data_norm = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
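As a quick check (not in the original notebook), the column-wise minima and maxima of data_norm should now be exactly 0 and 1:

# Verify the min-max normalization: every column should span [0, 1]
print(data_norm.min(axis=0))
print(data_norm.max(axis=0))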
Now, split the data into training and testing subsets. Here we choose to use 80% of the examples for training and 20% for testing. Finally, separate each subset into examples, which contain the attributes, and targets, which contain the class flags.
In [33]:
train, test = model_selection.train_test_split(data_norm, test_size=0.2)
In [34]:
train_examples = np.array(train.iloc[:,0:4], dtype=np.float64)
train_targets = np.array(train[['class-yes', 'class-no']], dtype=np.float64)
test_examples = np.array(test.iloc[:,0:4], dtype=np.float64)
test_targets = np.array(test[['class-yes', 'class-no']], dtype=np.float64)
Finally, replace the target values 1 and 0 with 0.8 and 0.2. (Why? The sigmoid transfer function only approaches 0 and 1 asymptotically, so targets of exactly 0 and 1 can never be reached and would push the weights ever larger.)
In [35]:
train_targets[np.where(train_targets==1)] = 0.8
train_targets[np.where(train_targets==0)] = 0.2
test_targets[np.where(test_targets==1)] = 0.8
test_targets[np.where(test_targets==0)] = 0.2
Now train the neural net!
In [36]:
I2H, H2O, bHID, bOUT = nnet(examples=train_examples, targets=train_targets,
                            hidden=5, learn=0.05)
For instructive purposes, the function prints whether each example is correctly classified or not. One can see the variation in performance from example to example. Keep in mind that the internal weights are adjusted slightly after every example, not only the misclassified ones.
The function 'nnet' loops through every training example once, representing one training epoch. In practice, the network would be trained over many epochs (sometimes hundreds), depending on the number of examples and the complexity of the data. At the end of each epoch, the model is run on the testing data and the mean squared error (MSE) is calculated over all of the testing examples to assess model performance. As the model learns, the MSE should decrease.
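As a rough sketch (not part of the original notebook), a multi-epoch loop could reuse the helper functions above; note that 'nnet' as written reinitializes its weights on every call, so the epoch loop has to live outside it, with the weights initialized once up front:

# Minimal multi-epoch sketch: same inner loop as 'nnet', but with the weights
# initialized once and a test-set MSE computed at the end of every epoch
hidden, learn, epochs = 5, 0.05, 200
wI2H = np.random.uniform(-1, 1, size=(train_examples.shape[1], hidden))
wH2O = np.random.uniform(-1, 1, size=(hidden, train_targets.shape[1]))
bHID = np.random.uniform(-1, 1, size=hidden)
bOUT = np.random.uniform(-1, 1, size=train_targets.shape[1])
for epoch in range(epochs):
    for ex in range(len(train_examples)):
        # Forward pass
        HidOut = xfer(wsum=np.dot(train_examples[ex, :], wI2H) + bHID)
        Out = xfer(wsum=np.dot(HidOut, wH2O) + bOUT)
        # Backward pass and weight/bias adjustments
        OutErr = ErrOut(output=Out, targets=train_targets[ex, :])
        HidErr = ErrHid(output=HidOut, weights=wH2O, outerr=OutErr)
        wI2H += learn * np.outer(train_examples[ex, :], HidErr)
        wH2O += learn * np.outer(HidOut, OutErr)
        bHID += learn * HidErr
        bOUT += learn * OutErr
    # Evaluate on the testing data at the end of each epoch
    preds = run(examples=test_examples, targets=test_targets,
                weightsI2H=wI2H, weightsH2O=wH2O, biasHID=bHID, biasOUT=bOUT)
    print('Epoch %3d: test MSE = %.4f' % (epoch + 1, np.mean((test_targets - preds)**2)))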
Just for fun, let's run our neural net on the test examples. Notice that 'nnet' outputs a weight matrix and bias vector for each layer. All we need to do to run the model is forward-propagate the testing examples using these final weights and biases. No weight adjustments will be made, because we are running a trained model. See the 'run' function above.
In [37]:
out = run(examples=test_examples, targets=test_targets,
          weightsI2H=I2H, weightsH2O=H2O, biasHID=bHID, biasOUT=bOUT)
In [38]:
out[0:5,]
Out[38]:
What do these numbers mean? Remember that our target matrix consists of 0.8s and 0.2s (originally 1s and 0s), corresponding to whether the example was in the authentic or not authentic class. Our neural network has 2 output neurons, one for each class. The columns of 'out' therefore represent these two classes, in the same order as in 'data'.
Because this is a simple classification problem, we can choose for each example the class (i.e., column) with the higher number:
In [39]:
# Predicted classes
predicted_classes = np.array(pd.DataFrame(out).idxmax(axis=1))
print(predicted_classes)
In [40]:
# Target classes
target_classes = np.array(pd.DataFrame(test_targets).idxmax(axis=1))
print(target_classes)
Remember that Python uses zero-based indexing. Thus, a 0 here means the first column contains the highest number, meaning the example is predicted to be authentic. This is admittedly a little confusing. How does this compare to the actual testing data? The neural net is correct whenever the predicted class agrees with the actual class:
In [41]:
sum(target_classes == predicted_classes)/len(test_targets) * 100
Out[41]:
Not well! In one epoch, the network is still pretty random. If the two classes are equally represented, then the network will achieve 50% accuracy by predicting the same class every time, which is hardly impressive. Here it correctly classified 54% of the examples. But remember, this is only one epoch, and often many are needed to properly train a neural net.
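For reference, here is a quick (illustrative) computation of that same-class-every-time baseline on our test set, using the 'target_classes' array from above:

# Accuracy of always predicting the more common class in the test set
baseline = max(np.mean(target_classes == 0), np.mean(target_classes == 1)) * 100
print('Majority-class baseline: %.1f%%' % baseline)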
This data set is particularly clean and linearly separable, so it should be predicted with relative ease. Indeed, after 200 or so epochs, this simple neural net is able to achieve >99% accuracy on testing data.
That's all there is to it!