This is a simple example that uses logistic regression to separate two groups of points in a 2D plane. In full disclosure, I followed the example of logistic regression here.
In [1]:
using Plots
using Statistics: mean
group1 = [[0.067, 0.21], [0.092, 0.21],
          [0.294, 0.445], [0.227, 0.521], [0.185, 0.597],
          [0.185, 0.689], [0.235, 0.748], [0.319, 0.773],
          [0.387, 0.739], [0.437, 0.672], [0.496, 0.739],
          [0.571, 0.773], [0.639, 0.765], [0.765, 0.924],
          [0.807, 0.933], [0.849, 0.941]]
group2 = [[0.118, 0.143], [0.118, 0.176],
          [0.345, 0.378], [0.395, 0.319], [0.437, 0.261],
          [0.496, 0.328], [0.546, 0.395], [0.605, 0.462],
          [0.655, 0.529], [0.697, 0.597], [0.706, 0.664],
          [0.681, 0.723], [0.849, 0.798], [0.857, 0.849],
          [0.866, 0.899]]
group1, group2 = hcat(group1...), hcat(group2...);
In [2]:
plot(group1[1, :], group1[2, :], seriestype=:scatter, color = :red, label="group1", legend=:bottomright)
plot!(group2[1, :], group2[2, :], seriestype=:scatter, color = :green, label="group2")
Out[2]:
In [3]:
"""
logistic(X, W)
this performs a dot product between W (1 x m) and X (m x n) then the
elementwise logistic function. Note Julia is a column major language.
"""
function logistic(X, W)
reshape(1. ./ (1. .+ exp.(-(W * X))), :)
end
"""
crossentropy(yhat, y)
calculating the loss to make sure that our loss is going down during
gradient descent. This is the log of the likelihood. We use the log
for numerical stability reasons.
"""
function crossentropy(yhat, y)
-mean(y .* log.(yhat) .+ (1. .- y) .* log.(1. .- yhat))
end
"""
calcgrad(X, yhat, y, lr=0.01)
calculate the gradient of the loss. This is the first derivative of the
crossentropy loss with respect to the weight matrix.
"""
function calcgrad(X, yhat, y, lr=0.01)
N = size(yhat)[1]
gradients = X * (yhat .- y)
gradients /= N
gradients *= lr
reshape(gradients, 1, :)
end
"""
This is the likelihood function, but I didn't use it because we are
maximizing the log likelihood, which is equivalent to minimizing the
likelihood function.
"""
function liklihood(W, X, y)
h = logistic(X, W)
(h).^y .* (1. .- h).^(1. .- y)
end
Out[3]:
We are optimizing the weight matrix using gradient descent. This is a process that calculates each weight's contribution to the difference between the predicted probability and the actual label. On each iteration we do the following steps:
1. compute the predicted probabilities with the logistic function,
2. compute the (learning-rate-scaled) gradient of the cross-entropy loss with calcgrad,
3. update the weights by subtracting that gradient.
There are also a few checks for reporting the loss and some early stopping criteria.
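For reference, the gradient of the cross-entropy loss with respect to each weight has the standard closed form (this is exactly what calcgrad computes before scaling by the learning rate):

∂L/∂w_j = (1/N) * Σ_i x_{j,i} * (ŷ_i - y_i)

where N is the number of points, ŷ_i is the predicted probability for point i, and y_i is its label.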
In [4]:
n1, n2 = size(group1)[end], size(group2)[end]
data = hcat(vcat(ones(1, n1), group1), vcat(ones(1, n2), group2))  # prepend a row of 1s for the bias
labels = vcat(ones(n1), zeros(n2))
W = reshape(rand(3), 1, :) # bias, feature 1, feature 2
@show size(W), size(data), size(labels)
i = 1
while true
    i += 1
    probs = logistic(data, W)
    yhat = ifelse.(probs .> .5, 1., 0.)  # hard 0/1 predictions (not used in the update)
    grads = calcgrad(data, probs, labels)
    if i % 1000 == 0
        @show W
        @show crossentropy(probs, labels), grads
    end
    if sum(abs.(grads)) < 5e-6  # early stopping once the gradient is tiny
        @show W
        @show crossentropy(probs, labels), grads
        break
    end
    W -= grads
    if i == 10000  # hard cap on the number of iterations
        break
    end
end
In [5]:
"""
linear(X, W)
given X = [1., x_1, x_2], we want to draw the separator line across the
span of our graph (roughly the interval [0, 1]). We can transform the formula
y = mx + b to:
y = w_2 * x_1 + w_3 * x_2 + w_1 * 1. to:
x_2 = (w_2 * x_1 + w_1) / -w_3
"""
function linear(X, W)
# y = mx + b --- (W[1] + border_input * W[2]) / W[3] = y
# y = w_1 + w_2 * x_1 + w_3 * x_2 -> x_2 = (w_1 + w_2 * x_1 - y) / -w_3
(W[1] .+ W[2] .* X) ./ -W[3]
end
border_x = 0.01 .* (0:100)
border_y = linear(border_x, W)
println("Weights: ", W)
plot(group1[1, :], group1[2, :], seriestype=:scatter, color = :red, label="group1", legend=:bottomright)
plot!(group2[1, :], group2[2, :], seriestype=:scatter, color = :green, label="group2")
plot!(border_x, border_y, label="border")
Out[5]:
In [6]:
probs = logistic(data, W)
plot(1:n1, probs[1:n1], seriestype=:scatter, color = :red, label="group1")
plot!(n1+1:length(probs), probs[n1+1:end], seriestype=:scatter, color = :green, label="group2")
hline!([.5], color = :green, label="border")
Out[6]:
A heart divided? The thin line between love and hate? All you need is love? I'll leave you to choose which cheesy saying you want to go with. As I've shown above, the data is easily separable with a logistic regression classifier. One improvement I could have made is to add weight regularization. The current model suffers from the weights exploding, because the final equation only depends on ratios of the weights between the two features, so the weights can keep growing without changing the separator. To combat this, I could use weight regularization in my model. The most commonly used penalties are the L1 (lasso) and L2 (ridge) regularizers, and a common use of an L1 regularizer is as a feature selector (a sketch of how a penalty could be folded into the gradient follows the separator equation below). Ah, but of course, I have not answered the question
"determine an equation for a linear separator for these two groups of points"
Given weights of [-0.388722 -3.13332 3.38237] and a vector of 1s concatenated to the coordinates, the equation for the linear separator is:
-0.388722 - 3.13332 * x_1 + 3.38237 * x_2 = 0, or equivalently x_2 = (0.388722 + 3.13332 * x_1) / 3.38237 ≈ 0.926 * x_1 + 0.115
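As for the regularization mentioned above, here is a minimal sketch of how an L2 (ridge) penalty could be folded into the gradient step. This is illustrative only; calcgrad_l2 and lambda are hypothetical names I've introduced here, and the penalty strength would need tuning:

function calcgrad_l2(X, yhat, y, W; lr=0.01, lambda=0.1)
    N = size(yhat)[1]
    gradients = X * (yhat .- y) ./ N        # same cross-entropy gradient as before
    penalty = lambda .* reshape(W, :) ./ N  # gradient of (lambda / 2N) * ||W||^2
    penalty[1] = 0.                         # leave the bias term unpenalized
    reshape(lr .* (gradients .+ penalty), 1, :)
end

The training loop would be unchanged apart from calling this instead, e.g. W -= calcgrad_l2(data, probs, labels, W).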
I chose to use Julia because I like learning new things. I felt like this was a simple enough project that I could use it to experiment with something new. If I were going for a more performant model I could have used sklearn in a few lines of code like so:
import numpy as np
from sklearn.linear_model import LogisticRegression
group1 = np.array(...)
group2 = np.array(...)
data = np.r_[group1, group2]
labels = np.r_[np.ones(group1.shape[0]), np.zeros(group2.shape[0])]
model = LogisticRegression().fit(data, labels)
model.coef_, model.intercept_
Lastly, I chose logistic regression because I think people like to see gradient descent in a non-neural-network setting. I also thought about using a k-nearest neighbors (KNN) classifier or a support vector machine (SVM). But, surprisingly, I read that a KNN is not a linear classifier, and writing out the equation for an SVM is quite difficult.