Logistic Regression, despite its name, is a linear model for classification rather than regression. It is also known as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier.

In scikit-learn, Logistic Regression is implemented in the class LogisticRegression.
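A minimal sketch of fitting LogisticRegression and checking that it behaves as a classifier. The bundled breast-cancer dataset, the StandardScaler step and C=1.0 are illustrative assumptions. For a binary problem the model computes a linear score, and predict_proba maps that score to a probability with the logistic sigmoid.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative binary dataset bundled with scikit-learn (no download needed)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling the features is an assumption here; it helps the default solver converge
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# The linear score X_i^T w + c, passed through the sigmoid, gives P(y = 1 | X_i)
scores = model.decision_function(X_test)
proba = model.predict_proba(X_test)[:, 1]
sigmoid = 1.0 / (1.0 + np.exp(-scores))

print(f"test accuracy: {accuracy:.3f}")
print("predict_proba matches sigmoid(score):", np.allclose(proba, sigmoid))
```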

Formulation

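Binary-class L2-penalized logistic regression minimizes the following cost function, where $X_i$ is the $i$-th sample, $y_i \in \{-1, 1\}$ its label, $w$ the coefficient vector, $c$ the intercept and $C$ the inverse of the regularization strength: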

$\min_{w,c} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$
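The cost function can be checked numerically. This is a sketch under stated assumptions: the breast-cancer dataset, standardized features and C=1.0 are illustrative, and the "lbfgs" solver is used because it does not penalize the intercept $c$, which matches the formula. The function l2_logistic_cost is a hypothetical helper written for this example; it computes $\log(\exp(m) + 1)$ as np.logaddexp(0, m) for numerical stability. If the fitted $(w, c)$ is the minimizer, small random perturbations should never lower the cost.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def l2_logistic_cost(w, c, X, y, C):
    """Hypothetical helper: the L2 cost above, with labels y in {-1, +1}."""
    margins = -y * (X @ w + c)
    # log(exp(m) + 1) evaluated stably as logaddexp(0, m)
    return 0.5 * w @ w + C * np.sum(np.logaddexp(0.0, margins))


X, y01 = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
y = 2 * y01 - 1  # map {0, 1} labels to {-1, +1} as the formula expects
C = 1.0

# Tight tolerance so the fitted parameters sit very close to the minimum
clf = LogisticRegression(C=C, solver="lbfgs", tol=1e-10, max_iter=10_000)
clf.fit(X, y01)
w_opt, c_opt = clf.coef_.ravel(), clf.intercept_[0]
cost_opt = l2_logistic_cost(w_opt, c_opt, X, y, C)

# Perturb (w, c) slightly in random directions and re-evaluate the cost
rng = np.random.default_rng(0)
perturbed_costs = []
for _ in range(20):
    dw = 1e-2 * rng.standard_normal(w_opt.shape)
    dc = 1e-2 * rng.standard_normal()
    perturbed_costs.append(l2_logistic_cost(w_opt + dw, c_opt + dc, X, y, C))

print(f"cost at fitted (w, c): {cost_opt:.6f}")
print(f"lowest perturbed cost: {min(perturbed_costs):.6f}")
```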

L1-regularized logistic regression solves the following optimization problem:

$\min_{w,c} \|w\|_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$
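The practical difference between the two penalties is sparsity: the $\|w\|_1$ term tends to drive some coefficients exactly to zero, while $\frac{1}{2} w^T w$ only shrinks them. A small sketch, assuming a synthetic dataset from make_classification with 20 features of which only 3 are informative, and a strong penalty of C=0.05 (both illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 3 of which carry signal
X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

C = 0.05  # small C means strong regularization
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
l2_model = LogisticRegression(penalty="l2", solver="lbfgs", C=C).fit(X, y)

# Count coefficients that are exactly zero under each penalty
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print(f"zero coefficients, L1: {n_zero_l1}/20, L2: {n_zero_l2}/20")
```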

The solvers implemented in the class LogisticRegression are "liblinear", "newton-cg", "lbfgs", and "sag".
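As a rough comparison, the sketch below fits the same L2-penalized model with each of the four solvers. The breast-cancer dataset, the StandardScaler step and max_iter=5000 are assumptions; scaling matters especially for "sag", which converges slowly on unscaled features. Because all four minimize essentially the same objective ("liblinear" additionally penalizes the intercept), their training accuracies should be nearly identical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

accuracies = {}
for solver in ["liblinear", "newton-cg", "lbfgs", "sag"]:
    # Same model and data each time; only the optimization algorithm changes
    model = make_pipeline(StandardScaler(),
                          LogisticRegression(solver=solver, max_iter=5000))
    model.fit(X, y)
    accuracies[solver] = model.score(X, y)  # training accuracy

for solver, acc in accuracies.items():
    print(f"{solver:>10}: {acc:.4f}")
```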

One way to choose a solver:

    Case                               Solver
    Small dataset or L1 penalty        "liblinear"
    Multinomial loss                   "lbfgs" or "newton-cg"
    Large dataset                      "sag"
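The table can be encoded as a small decision function. This is a hypothetical helper, not part of scikit-learn: the priority order of the checks and the large_threshold cut-off for a "large" dataset are assumptions, since the table gives no number. The L1 check comes first because "sag" only supports the L2 penalty (or none).

```python
def choose_solver(n_samples, penalty="l2", multinomial=False,
                  large_threshold=100_000):
    """Hypothetical helper encoding the solver table above.

    large_threshold is an assumed cut-off for a "large" dataset.
    """
    if penalty == "l1":
        return "liblinear"   # L1 penalty
    if multinomial:
        return "lbfgs"       # multinomial loss ("newton-cg" also fits the table)
    if n_samples >= large_threshold:
        return "sag"         # large dataset
    return "liblinear"       # small dataset


for args, kwargs in [((500,), {}),
                     ((500,), {"penalty": "l1"}),
                     ((500,), {"multinomial": True}),
                     ((1_000_000,), {})]:
    print(args, kwargs, "->", choose_solver(*args, **kwargs))
```

The returned string can be passed directly as the solver argument of LogisticRegression.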

Cross-validation

LogisticRegressionCV implements Logistic Regression with built-in cross-validation to find the optimal value of the C parameter.
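A minimal sketch of LogisticRegressionCV, assuming the breast-cancer dataset, a StandardScaler step, 5-fold cross-validation and the "lbfgs" solver. With Cs=10, scikit-learn tries 10 values of C on a logarithmic grid between 1e-4 and 1e4; the grid is stored in Cs_ and the selected value in C_.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Cs=10: grid of 10 C values; cv=5: 5-fold cross-validation for each value
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, solver="lbfgs", max_iter=5000),
)
model.fit(X, y)

cv_clf = model[-1]  # the fitted LogisticRegressionCV step of the pipeline
print("candidate C values:", cv_clf.Cs_)
print("selected C:", cv_clf.C_)
```

After fitting, the model is refit on the full data with the selected C, so it can be used for prediction like a plain LogisticRegression.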

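In LogisticRegressionCV, the "newton-cg", "sag" and "lbfgs" solvers are found to be faster for high-dimensional dense data because they support warm-starting: the solution found for one value of C is reused as the starting point for the next.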
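The effect of warm-starting can be observed with a plain LogisticRegression, which exposes a warm_start parameter. The sketch below only mimics the idea of fitting a sequence of C values; it is not how LogisticRegressionCV is implemented internally. The dataset, the grid of 12 C values and the use of total solver iterations (from n_iter_) as the cost measure are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
Cs = np.logspace(-3, 2, 12)  # increasing sequence of C values


def total_iterations(warm_start):
    """Fit the model for each C in turn and sum the solver iterations."""
    clf = LogisticRegression(solver="lbfgs", warm_start=warm_start,
                             max_iter=10_000)
    total = 0
    for C in Cs:
        clf.set_params(C=C)
        # With warm_start=True, fit starts from the coefficients of the previous C
        clf.fit(X, y)
        total += int(clf.n_iter_.max())
    return total


cold = total_iterations(warm_start=False)
warm = total_iterations(warm_start=True)
print(f"total iterations, cold start: {cold}, warm start: {warm}")
```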

Questions
