We build a dataset to illustrate our purpose.
The dataset we build is supposed to help us converting a paper scan into a ASCII string. Lets imagine that, when a paper is scaned, we can detect, with high confidence that we are over a colun or a semicolun. Our objective here is to detect wether it's one or the other.
Therefore, our algorithm is fed with a vector $x_i \in [0,1]^5$ which represent the intensity of the pen stroke writting on the paper.
Here below, you have an example of 'perfect' strokes for $x_1$ an example of colun, and $x_2$ an example of semicolun.
In [39]:
import matplotlib.pyplot as plt
import numpy as np
x_1 = np.array([0,1,0,1,0])
x_2 = np.array([0,1,0,1,1])
def to_img(vec):
matrix = np.ones((5, 3))
matrix[:, 1] = 1-vec
return matrix
fig, axs = plt.subplots(1,2)
axs[0].imshow(to_img(x_1), cmap='gray')
axs[1].imshow(to_img(x_2), cmap='gray')
plt.show()
Whenever a sample $x_i$ belongs to the class colun, we'll label it with $y_i=0$.
Likewise, whenever a sample $x_i$ belongs to the class semicolun, we'll label it with $y_i=1$.
In [40]:
y_1 = 0
y_2 = 1
In [41]:
X = np.array([
[.0, 1., 0., 1., 0.],
[.0, .9, 0., .9, 0.],
[.2, .8, 0., .8, .2],
[.0, 1., 0., 1., 1.],
[.0, 1., 0., .5, .5],
[.2, .8, 0., .7, .7]])
y = np.array([0,0,0,1,1,1])
fig, axs = plt.subplots(1,6)
for i in range(len(X)):
axs[i].imshow(to_img(X[i]), cmap='gray')
plt.show()
(You may want to read this : http://cs229.stanford.edu/notes/cs229-notes1.pdf).
You're going to build a model which outputs a prediction value $p_i$ given an input $x_i$. This prediction $p_i$ will reflect the propability that your input $x_i$ belongs to class 1. $$ \begin{align} p_i &= P(Y=1 | W, x_i) \\ p(x_i,W) &= P(Y=1 | W, x_i) \end{align} $$
As $p_i$ is a probability, it must be in [0,1].
The model we'll consider perform a weighted sum of its input:
And then squizes the values between 0 and 1 (which is our prediction value):
In [42]:
# Write the code ^^
Overall, we would like to maximize the likelihood that we are right at predicting a label.
$$ \begin{align} \max \text{likelihood} &= \text{argmax}_w \Pi_i P(Y | W, x_i) \\ &= \text{argmax}_w \Pi_i \big( P(Y=y_i | W, x_i) \big) \\ &= \text{argmax}_w \Pi_i \big( P(Y=1 | W, x_i)^{y_i} \cdot P(Y=0 | W, x_i)^{1-y_i} \big) \\ &= \text{argmax}_w \Pi_i \big( P(Y=1 | W, x_i)^{y_i} \cdot 1-P(Y=1 | W, x_i)^{1-y_i}\big) \\ &= \text{argmax}_w \Pi_i \big( p_i^{y_i} \cdot 1-p_i^{1-y_i}\big) \\ &= \text{argmax}_w \sum_i \big( y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \big) \\ &= \text{argmin}_w - \sum_i \big( y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \big) \\ \end{align} $$And this term is going to be our loss that we want to reduce: $$ L(x_i, W, y_i) = - \sum_i \big( y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \big) $$ This is how you compare the prediction you made ($p_i$) to the true output you expected ($y_i$).
In means of colun and semicolun : remember $x_0$, it's a colun, therefore it's label is $y_0=0$.
If your classifier is good you'ld expect it to predict it's a semicolun, hense have $p_i = $"Something small like 0.1".
The error for this one sample is going to be: $$ \begin{align} L(X, W, y) &= - \sum_i \big( y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \big) \\ &= y_0 \ln(p_0) + (1-y_0) \ln(1-p_0) \\ &= - 0 \ln(.9) + (1-0) \ln(1-.9) \\ &= - \ln(.1) \end{align} $$
In [43]:
# Pen, paper, and your code ^^
In [ ]:
# Code