Perceptron

We will now implement the perceptron classifier. We put ourselves in a setting where we have access to training examples $\boldsymbol{x}_i$ and each of them is associated with a target $y\in{-1,1}$.

The perceptron classifier is a simple model that consists of a single neuron with a step activation function (also known as a heavyside step function).

One can visualize it as follows:

If you have questions or comments : charlotte[dot]laclau[at]univ-grenoble-alpes[dot]fr

Forward propagation

Given an n-dimensional input $x\in\mathbb{R}^n$ and taking into account the bias unit, one moves from the input to the output with two steps:

  • $s= \sum\limits_{i=0}^n w_ix_i$, where for consistency assume that the bias unit is $x_0$
  • Application of the step function on s, that is $a = \begin{cases}
    1 & s\geq 0 \\
    -1 & s < 0 
    
    \end{cases} $

Error propagation

Repeat until max_epochs (maximum number of iterations) or convergence is reached:

For each example in the training set:

  1. Forward pass to calculate $a$
  2. If the example is misclassified: update each of the weights $i$ with: $w_i^{(t+1)} = w_i^{(t)} + \eta*x_i*y_i$

Why these update equations?

We are in the standard supervised classification setting, where one has $k$ examples $\vec{X}={\vec{x_1}, \ldots, \vec{x_k}}$, where $\vec{x_k} \in \mathbb{R}^n$. Each $\vec{x}_k \in \vec{X}$ is associated with a category $y_k \in \mathbb{Y}$, from a pre-defined set of categories. In the binary case $\mathbb{Y}=\{-1, +1\}$.

We then, want to learn a vector $\vec{w} \in \mathbb{R}^{n+1}$, to perform the above described classification step. For the weight vector $\vec{w}$ we moved from $n$ to $n+1$ dimensions to account for the bias unit. Using the perceptron algorithm we want to minimize the number of examples we misclassify, and essentially if the examples are linearly separable, misclassify nothing. Hence, one can define a simple loss function:

$\mathbb{L} = -\sum\limits_{k} y_k(\vec{w}\cdot\vec{x_k})$

In the online case, where one updates the weights for a single instance $k$, this becomes: $\mathbb{L} = - y_k(\vec{w}\cdot\vec{x_k})$, where L

To change the direction of $\vec{w}$ when we misclassify, we can:

$\nabla L = \frac{\partial L}{\partial w}= - y_k x_k$

We scale this update using the learning rate $\eta$ and we update by taking a step towards the negative of the gradient:

$w_k^{(t+1)} = w_k^{(t)} + \eta*x_k*y_k$

The next steps

TBD:

  1. create a simple training set like the OR function
  2. Learn the OR function

Pseudo-code : input: X, Y, eta, w, n for i in 1:n pick an example randomly result <- w*x if (result<0) result=0 else result =1 error <- Y - result w <- w + eta*error*x

(indication: the if statement could be defined as a function beforehand in a step function)

  1. We are now going to work on the sonar.txt dataset.

    1. Download the dataset using the read_table function of the pandas librairy. This will create a dataframe (same than in R). Before starting, you can check the size of the data or other basic statistics (head, shape, etc.). The last column contains the class of each object.
    2. Separate the features from the target (see loc); Transform the target into numeric values.
    3. Split the dataset into train and test files (see train_test_split)
    4. Use the perceptron classifier available in sckit-learn. The function presents a lot of options that you should explore.

    5. Repeat the same operations but using K-folds instead of one train and one test file.

  2. If you feel comfortable with Python, code the perceptron classifier

You may want to ckeck: np.array, numpy.random.rand, numpy.dot, random.choice


In [1]:
import random, numpy as np, matplotlib.pyplot as plt, time
%matplotlib inline

In [2]:
# Training data for the first question
training_data = [
    (np.array([0,0,1]), 0),
    (np.array([0,1,1]), 1),
    (np.array([1,0,1]), 1),
    (np.array([1,1,1]), 1),
]


def unit_step(value):
    if value < 0:
        return 0 
    else: 
        return 1
   
    #  or unit_step = lambda x: 0 if x <0 else 1
n = 20 
eta = 0.2 
errors = []
w = np.random.rand(3)

for i in range(n):
    x, expected = random.choice(training_data)
    result = np.dot(w,x)
    error = expected - unit_step(result)
    w += eta*error*x
    errors.append(error)
    
for x, _ in training_data:
    result = np.dot(x, w)
    print("{}: {} -> {}".format(x[:2], result, unit_step(result)))


[0 0]: -0.04574258185339358 -> 0
[0 1]: 0.47352494545658447 -> 1
[1 0]: 0.6756688557541826 -> 1
[1 1]: 1.1949363830641606 -> 1

In [3]:
# Part 1
import sklearn as sk
from sklearn.linear_model import Perceptron
import pandas as pd 

# Load dataset 
sonar = pd.read_table('sonar.txt', header = None, delimiter=',')
sonar.head()


# In case of missing values, you can remove NaN elements using dropna function
# sonar = sonar.dropna(how="any", axis=0)


Out[3]:
0 1 2 3 4 5 6 7 8 9 ... 51 52 53 54 55 56 57 58 59 60
0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 0.2111 ... 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 0.2872 ... 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 0.6194 ... 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 0.1264 ... 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 0.4459 ... 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R

5 rows × 61 columns

Part 2: pre-processing

Before splitting the data into the test and the train set, you should check if you need to normalise the data. Usually, it is necessary if the scales of the variables are two different from one another. For the sonar data, all variables have values between 0 and 1, so it's fine!


In [5]:
# Import the function for splitting the data from sklearn
from sklearn.model_selection import train_test_split

#Separate data from label. To access elements of a dataframe using the position, use .loc 
x = sonar.loc[:,range(60)]
target = sonar.loc[:,60]

# Use unique function to describe the possible labels 
print("Unique labels: {0}".format(np.unique(target)))


Unique labels: ['M' 'R']

Split the data: random_state allow you to control the randomness of your training and test set. Here I set it to 0 (it could be any integer of your choice) By doing so, everytime that I run de following lines, if random_state is at 0, I will create the same train and test.


In [6]:
random_state = 0
# test_size indicates the proportion of the instances which are used of the test set
x_train, x_test, y_train, y_test = train_test_split(x, target, test_size=0.20, random_state=random_state)

Part 3: Finally the perceptron!

Create the perceptron instance. This simply creates the model structure with the given hyper-parameters (options) I set some of the options of the perceptron: maximum number of iterations and learning step You also have options about regularisation to avoid overfitting (check it in the documentation of the perceptron)


In [7]:
# Options - hyperparameters
max_iter = 10
eta0= 0.1

# Create the perceptron instance. Again the random state (controls the random initialisation of weights) 
clf = Perceptron(max_iter = max_iter, eta0=eta0, random_state=random_state)

Train the perceptron on the training instances. In this case, the model uses both the example and the label to learn the weights. This is crucial but will not give any information about the generalization capacity of you model.


In [8]:
# Training
clf.fit(x_train, y_train)


Out[8]:
Perceptron(alpha=0.0001, class_weight=None, eta0=0.1, fit_intercept=True,
      max_iter=10, n_iter=None, n_jobs=1, penalty=None, random_state=0,
      shuffle=True, tol=None, verbose=0, warm_start=False)

Step 2 : Train the perceptron on the training instances. In this case, the model uses both the example and the label to learn the weights. This is crucial but will not give any information about the generalization capacity of you model. To evaluate the true efficiency of the classifier, I will use it to predict the labels of the test set, which was not use when the model was trained.


In [9]:
# Make prediction
y_pred = clf.predict(x_test)

Step 3: evaluate the performance in terms of accuracy. The accuracy is simply the proportion of labels that are correctly predicted by your model.


In [10]:
# Measure the performance using the accuracy score
from sklearn.metrics import accuracy_score
print("accuracy: {0:.2f}%".format(accuracy_score(y_test, y_pred)*100))


accuracy: 80.95%