Logistic regression with $\ell_1$ regularization

In this example, we use CVXPY to train a logistic regression classifier with $\ell_1$ regularization. We are given data $(x_i,y_i)$, $i=1,\ldots, m$. The $x_i \in {\bf R}^n$ are feature vectors, while the $y_i \in \{0, 1\}$ are associated boolean classes.

Our goal is to construct a linear classifier $\hat y = \mathbb{1}[\beta^T x > 0]$, which is $1$ when $\beta^T x$ is positive and $0$ otherwise. We model the log-odds of the positive class as a linear function of the features:

$$ \log \frac{\mathrm{Pr} (Y=1 \mid X = x)}{\mathrm{Pr} (Y=0 \mid X = x)} = \beta^T x. $$

Exponentiating both sides and using the fact that the two probabilities sum to one, we obtain

$$ \mathrm{Pr} (Y=1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)}, \quad \mathrm{Pr} (Y=0 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}. $$

We fit $\beta$ by maximizing the log-likelihood of the data minus a regularization term $\lambda \|\beta\|_1$ with $\lambda > 0$:

$$ \ell(\beta) = \sum_{i=1}^{m} \left( y_i \beta^T x_i - \log(1 + \exp (\beta^T x_i)) \right) - \lambda \|\beta\|_1. $$

Because $\ell$ is a concave function of $\beta$, this is a convex optimization problem.
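
As a concrete reference, here is a minimal NumPy sketch of this objective (an illustration only, not part of the CVXPY formulation below; the arguments X, Y, beta, and lam are assumed to hold the feature matrix, labels, coefficient vector, and regularization weight):

import numpy as np

def penalized_log_likelihood(beta, X, Y, lam):
    # l(beta) = sum_i ( y_i beta^T x_i - log(1 + exp(beta^T x_i)) ) - lam * ||beta||_1
    z = X @ beta
    return np.sum(Y * z - np.logaddexp(0.0, z)) - lam * np.sum(np.abs(beta))

Here np.logaddexp(0.0, z) evaluates $\log(1 + \exp(z))$ in a numerically stable way; CVXPY's logistic atom plays the same role in the convex formulation below.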


In [1]:
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt

In the following code we generate data with $n=50$ features by randomly choosing $x_i$ and a sparse $\beta_{\mathrm{true}} \in {\bf R}^n$. We then set $y_i = \mathbb{1}[\beta_{\mathrm{true}}^T x_i + z_i > 0]$, where the $z_i$ are i.i.d. zero-mean normal random variables. We generate a training set with $m=50$ examples and a test set with $2m=100$ examples.


In [2]:
np.random.seed(1)
n = 50
m = 50

def sigmoid(z):
    return 1/(1 + np.exp(-z))

# Sparse ground-truth coefficients: only the first three entries are non-zero.
beta_true = np.array([1, 0.5, -0.5] + [0]*(n - 3))

# Training set: features drawn uniformly from [-5, 5]; the rounded sigmoid is
# equivalent to the indicator of beta_true^T x_i + z_i > 0.
X = (np.random.random((m, n)) - 0.5)*10
Y = np.round(sigmoid(X @ beta_true + np.random.randn(m)*0.5))

# Test set: 2*m examples generated in the same way.
X_test = (np.random.random((2*m, n)) - 0.5)*10
Y_test = np.round(sigmoid(X_test @ beta_true + np.random.randn(2*m)*0.5))

We next formulate the optimization problem using CVXPY. The objective is the average log-likelihood over the $m$ training examples minus the regularization term.


In [3]:
beta = cp.Variable(n)
lambd = cp.Parameter(nonneg=True)
# cp.logistic(z) is log(1 + exp(z)), so this is the log-likelihood of the training data.
log_likelihood = cp.sum(
    cp.multiply(Y, X @ beta) - cp.logistic(X @ beta)
)
# Maximize the average log-likelihood per training example minus the l1 penalty.
problem = cp.Problem(cp.Maximize(log_likelihood/m - lambd * cp.norm(beta, 1)))
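
As a quick, optional sanity check (problem.is_dcp() is part of CVXPY's public API), we can confirm that CVXPY recognizes this as a disciplined convex problem, since the objective being maximized is concave in beta:

print(problem.is_dcp())  # expected to print True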

We solve the problem for a range of $\lambda$ values to compute a trade-off curve, and then plot the train and test classification error as a function of $\lambda$. Because $\lambda$ is a CVXPY parameter, we construct the problem once and simply re-solve it for each value. A reasonable choice of $\lambda$ is the value that minimizes the test error.


In [4]:
def error(scores, labels):
    # Classify as 1 when the score beta^T x is positive and 0 otherwise,
    # then return the fraction of misclassified examples.
    predictions = (scores > 0).astype(float)
    return np.mean(np.abs(predictions - labels))
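
As a hypothetical sanity check using the arrays defined above, error(X @ beta_true, Y) is the fraction of training labels that the added noise flips relative to the noiseless classifier $\mathbb{1}[\beta_{\mathrm{true}}^T x > 0]$.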

In [5]:
trials = 100
train_error = np.zeros(trials)
test_error = np.zeros(trials)
lambda_vals = np.logspace(-2, 0, trials)
beta_vals = []
# Sweep lambda over [0.01, 1] on a logarithmic grid, re-solving the problem for
# each value and recording the errors and the fitted coefficients.
for i in range(trials):
    lambd.value = lambda_vals[i]
    problem.solve()
    train_error[i] = error((X @ beta).value, Y)
    test_error[i] = error((X_test @ beta).value, Y_test)
    beta_vals.append(beta.value)

In [6]:
%matplotlib inline
%config InlineBackend.figure_format = "svg"

plt.plot(lambda_vals, train_error, label="Train error")
plt.plot(lambda_vals, test_error, label="Test error")
plt.xscale("log")
plt.legend(loc="upper left")
plt.xlabel(r"$\lambda$", fontsize=16)
plt.show()


We also plot the regularization path, i.e., the coefficients $\beta_i$ as a function of $\lambda$. Notice that a few features remain non-zero for larger values of $\lambda$ than the rest, which suggests that these features are the most important.


In [7]:
# Plot each coefficient beta_i as a function of lambda.
for i in range(n):
    plt.plot(lambda_vals, [wi[i] for wi in beta_vals])
plt.xlabel(r"$\lambda$", fontsize=16)
plt.xscale("log")


We plot the true $\beta$ against the reconstructed $\beta$ chosen to minimize the test error. The non-zero coefficients are recovered with good accuracy, although a few coefficients that should be zero are non-zero in the reconstruction.


In [8]:
idx = np.argmin(test_error)
plt.plot(beta_true, label=r"True $\beta$")
plt.plot(beta_vals[idx], label=r"Reconstructed $\beta$")
plt.xlabel(r"$i$", fontsize=16)
plt.ylabel(r"$\beta_i$", fontsize=16)
plt.legend(loc="upper right")

