We will now implement the perceptron classifier. We place ourselves in a setting where we have access to training examples $\boldsymbol{x}_i$, each of which is associated with a target $y_i\in\{-1,1\}$.
The perceptron classifier is a simple model that consists of a single neuron with a step activation function (also known as the Heaviside step function).
One can visualize it as follows:
If you have questions or comments : charlotte[dot]laclau[at]univ-grenoble-alpes[dot]fr
Given an n-dimensional input $x\in\mathbb{R}^n$ and taking into account the bias unit, one moves from the input to the output with two steps:
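first, the weighted sum $s = \vec{w}\cdot\vec{x}$ is computed (with $\vec{x}$ extended by a constant 1 for the bias unit); then the step activation gives the output:

$\hat{y} = \begin{cases}
1 & s\geq 0 \\
-1 & s < 0
\end{cases}$

We are in the standard supervised classification setting, where one has $k$ examples $\vec{X}=\{\vec{x_1}, \ldots, \vec{x_k}\}$, with each example in $\mathbb{R}^n$. Each $\vec{x}_k \in \vec{X}$ is associated with a category $y_k \in \mathbb{Y}$, taken from a pre-defined set of categories. In the binary case $\mathbb{Y}=\{-1, +1\}$.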
We then want to learn a vector $\vec{w} \in \mathbb{R}^{n+1}$ to perform the classification step described above. The weight vector $\vec{w}$ has $n+1$ rather than $n$ dimensions to account for the bias unit. With the perceptron algorithm we want to minimize the number of examples we misclassify and, if the examples are linearly separable, misclassify none at all. Hence, one can define a simple loss function, summed over the misclassified examples $k$:
$\mathbb{L} = -\sum\limits_{k} y_k(\vec{w}\cdot\vec{x_k})$
In the online case, where one updates the weights for a single instance $k$, this becomes: $\mathbb{L} = - y_k(\vec{w}\cdot\vec{x_k})$, where L
To change the direction of $\vec{w}$ when we misclassify, we can compute the gradient of the loss with respect to $\vec{w}$:
$\nabla L = \frac{\partial L}{\partial w}= - y_k x_k$
We scale this update using the learning rate $\eta$ and we update by taking a step towards the negative of the gradient:
$\vec{w}^{(t+1)} = \vec{w}^{(t)} + \eta\, y_k\, \vec{x_k}$
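The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `perceptron_update` is hypothetical, labels are taken in $\{-1, +1\}$ as in the text, and the input vector is assumed to already carry a trailing 1 for the bias unit.

```python
import numpy as np

def perceptron_update(w, x, y, eta=0.1):
    """One online perceptron step for a single example (x, y), with y in {-1, +1}.

    Assumption: x already includes a trailing 1 for the bias unit.
    """
    s = np.dot(w, x)                 # weighted sum
    y_hat = 1 if s >= 0 else -1      # step activation
    if y_hat != y:                   # update only on a misclassification
        w = w + eta * y * x          # step against the gradient -y * x
    return w

w = np.zeros(3)
x = np.array([1.0, 2.0, 1.0])        # last entry is the bias input
w_new = perceptron_update(w, x, y=-1, eta=0.5)
print(w_new)
```

With zero initial weights the example is predicted as $+1$, so it counts as misclassified and the weights move in the direction of $y_k \vec{x_k}$. A second call on the same example leaves the weights unchanged, since it is then classified correctly.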
TBD:
Pseudo-code :
input: X, Y, eta, w, n
for i in 1:n
    pick an example (x, y) randomly from (X, Y)
    result <- w*x
    if (result < 0)
        result <- 0
    else
        result <- 1
    error <- y - result
    w <- w + eta*error*x
(hint: the if statement could be defined beforehand as a step function; note that this pseudo-code assumes labels in {0, 1})
We are now going to work on the sonar.txt dataset.
Use the perceptron classifier available in scikit-learn. It offers many options that you should explore.
Repeat the same operations but using K-folds instead of one train and one test file.
If you feel comfortable with Python, code the perceptron classifier yourself.
You may want to check: np.array, numpy.random.rand, numpy.dot, random.choice
In [1]:
import random, numpy as np, matplotlib.pyplot as plt, time
%matplotlib inline
In [2]:
# Training data for the first question
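# Each example is (input with a trailing 1 for the bias, expected label in {0, 1})
training_data = [
    (np.array([0,0,1]), 0),
    (np.array([0,1,1]), 1),
    (np.array([1,0,1]), 1),
    (np.array([1,1,1]), 1),
]

def unit_step(value):
    if value < 0:
        return 0
    else:
        return 1
# or unit_step = lambda x: 0 if x < 0 else 1

n = 20       # number of training iterations
eta = 0.2    # learning rate
errors = []
w = np.random.rand(3)

for i in range(n):
    # Pick one example at random and update the weights from its error
    x, expected = random.choice(training_data)
    result = np.dot(w, x)
    error = expected - unit_step(result)
    w += eta*error*x
    errors.append(error)

# Show the weighted sum and the predicted label for each training example
for x, _ in training_data:
    result = np.dot(x, w)
    print("{}: {} -> {}".format(x[:2], result, unit_step(result)))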
In [3]:
# Part 1
import sklearn as sk
from sklearn.linear_model import Perceptron
import pandas as pd
# Load dataset
sonar = pd.read_table('sonar.txt', header = None, delimiter=',')
sonar.head()
# In case of missing values, you can remove NaN elements using dropna function
# sonar = sonar.dropna(how="any", axis=0)
Before splitting the data into the train and test sets, you should check whether you need to normalise the data. This is usually necessary when the scales of the variables are too different from one another. For the sonar data, all variables have values between 0 and 1, so it's fine!
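As a sketch of how such a check might look, the snippet below inspects the range of each column and standardises the features when the scales differ. The toy DataFrame `df` and its column names are hypothetical and only serve the illustration; `StandardScaler` is one possible choice of normalisation among several.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
df = pd.DataFrame({
    "a": [0.1, 0.5, 0.9, 0.3],
    "b": [100.0, 2500.0, 900.0, 4000.0],
})

# Inspect the minimum and maximum of every column
ranges = df.agg(["min", "max"])
print(ranges)

# Standardise each column to zero mean and unit variance
scaler = StandardScaler()
scaled = scaler.fit_transform(df)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

When a train/test split is used, the scaler should be fitted on the training set only and then applied to the test set with `scaler.transform`, so that no information from the test set leaks into training.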
In [5]:
# Import the function for splitting the data from sklearn
from sklearn.model_selection import train_test_split
# Separate data from label. With header=None the columns are labelled 0..60, so the label-based .loc works here (use .iloc for purely positional access)
x = sonar.loc[:,range(60)]
target = sonar.loc[:,60]
# Use unique function to describe the possible labels
print("Unique labels: {0}".format(np.unique(target)))
Split the data: random_state allows you to control the randomness of your training and test sets. Here I set it to 0 (it could be any integer of your choice). By doing so, every time I run the following lines with random_state set to 0, I will create the same train and test sets.
In [6]:
random_state = 0
# test_size indicates the proportion of the instances that are used for the test set
x_train, x_test, y_train, y_test = train_test_split(x, target, test_size=0.20, random_state=random_state)
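The effect of random_state can be checked with a small experiment. The arrays `X_demo` and `y_demo` below are made-up placeholders, not the sonar data; the sketch only illustrates that two calls with the same seed return the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 examples with 2 features each
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# Two independent calls with the same random_state
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.20, random_state=0)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.20, random_state=0)

same_split = np.array_equal(a_test, b_test)
print(same_split, a_train.shape, a_test.shape)
```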
Create the perceptron instance. This simply creates the model structure with the given hyper-parameters (options). I set some of the options of the perceptron: the maximum number of iterations and the learning step. You also have options for regularisation to avoid overfitting (check the documentation of the perceptron).
In [7]:
# Options - hyperparameters
max_iter = 10
eta0= 0.1
# Create the perceptron instance. Here random_state controls how the training data is shuffled between epochs
clf = Perceptron(max_iter = max_iter, eta0=eta0, random_state=random_state)
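As a sketch of the regularisation options mentioned above, the scikit-learn Perceptron accepts a `penalty` argument together with a strength `alpha`. The data below is synthetic (generated with `make_classification`) and the values of `penalty`, `alpha` and `max_iter` are illustrative assumptions, not tuned settings for the sonar data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Synthetic stand-in data, not the sonar dataset
X_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=0)

# L2-regularised perceptron; alpha is the regularisation strength
clf_reg = Perceptron(penalty="l2", alpha=1e-3, max_iter=1000, eta0=0.1, random_state=0)
clf_reg.fit(X_demo, y_demo)

# One weight per feature, plus a separate intercept
print(clf_reg.coef_.shape, clf_reg.intercept_.shape)
```

Comparing the test accuracy with and without a penalty is one way to judge whether regularisation helps on a given dataset.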
Step 1: Train the perceptron on the training instances. In this case, the model uses both the examples and the labels to learn the weights. This is crucial but will not give any information about the generalization capacity of your model.
In [8]:
# Training
clf.fit(x_train, y_train)
Step 2: Predict on the test set. To evaluate the true efficiency of the classifier, I will use it to predict the labels of the test set, which was not used when the model was trained.
In [9]:
# Make prediction
y_pred = clf.predict(x_test)
Step 3: evaluate the performance in terms of accuracy. The accuracy is simply the proportion of labels that are correctly predicted by your model.
In [10]:
# Measure the performance using the accuracy score
from sklearn.metrics import accuracy_score
print("accuracy: {0:.2f}%".format(accuracy_score(y_test, y_pred)*100))
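For the K-folds variant of the exercise, one possible approach is to let scikit-learn handle the splitting with `KFold` and `cross_val_score`. The sketch below runs on synthetic data so that it is self-contained; with the sonar data one would pass `x` and `target` instead. The number of folds and the hyper-parameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the sonar features and labels
X_demo, y_demo = make_classification(n_samples=200, n_features=60, random_state=0)

# 5 folds, shuffled once with a fixed seed for reproducibility
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
clf_cv = Perceptron(max_iter=1000, eta0=0.1, random_state=0)

# Each fold is used once as the test set; the model is refitted every time
scores = cross_val_score(clf_cv, X_demo, y_demo, cv=kfold, scoring="accuracy")
print("accuracy per fold:", scores)
print("mean accuracy: {0:.2f}%".format(scores.mean() * 100))
```

Reporting the mean (and possibly the spread) over the folds gives a more stable estimate than a single train/test split.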