Regularization

Regularization is a technique used to address the overfitting problem. An overfitted model predicts very well on the training data but performs poorly on independent validation data.

When we add more predictors to our model, we almost necessarily decrease the Residual Sum of Squares (RSS; a smaller RSS indicates a better fit to the training data). This increases the complexity of our model and makes it perform well only on the training data (overfitting).

To balance the RSS against overfitting, we introduce a penalty for adding new predictors (coefficients $\beta_j \neq 0$) to the model.

LASSO regularization and Ridge regularization

  • LASSO: $\min \{RSS + \lambda\sum_{j=1}^{p}|\beta_j|\}$ (L1 penalty)
  • Ridge: $\min \{RSS + \lambda\sum_{j=1}^{p}\beta_j^2\}$ (L2 penalty)
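
To make the two penalties concrete, here is a minimal numpy sketch (plain Python, not Spark code; the toy data, candidate coefficients, and $\lambda$ value are invented for illustration) that evaluates both objectives:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # 50 observations, p = 3 predictors
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=50)

beta = np.array([1.4, 0.1, -1.9])             # candidate coefficients
lam = 0.5                                     # penalty strength (lambda)

rss = np.sum((y - X @ beta) ** 2)             # Residual Sum of Squares
lasso_obj = rss + lam * np.sum(np.abs(beta))  # RSS + lambda * L1 penalty
ridge_obj = rss + lam * np.sum(beta ** 2)     # RSS + lambda * L2 penalty
print(lasso_obj, ridge_obj)
```

The only difference between the two objectives is the form of the penalty term.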

Elastic Net regularization

Elastic net is a regularization method that linearly combines the penalties of the lasso and ridge methods.

  • elastic net: $\min \{RSS + \lambda[\frac{1}{2}(1-\alpha)\sum_{j=1}^{p}\beta_j^2 + \alpha\sum_{j=1}^{p}|\beta_j|]\}$
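
As a quick check of the formula, the sketch below (plain numpy again; the $\lambda$, $\alpha$, and coefficient values are arbitrary) computes the elastic net penalty and shows that it reduces to the ridge penalty at $\alpha = 0$ and the lasso penalty at $\alpha = 1$:

```python
import numpy as np

beta = np.array([1.4, 0.1, -1.9])
lam = 0.5

def elastic_net_penalty(beta, lam, alpha):
    # lambda * [ (1 - alpha)/2 * sum(beta_j^2) + alpha * sum(|beta_j|) ]
    return lam * ((1 - alpha) * 0.5 * np.sum(beta ** 2)
                  + alpha * np.sum(np.abs(beta)))

print(elastic_net_penalty(beta, lam, alpha=0.0))  # (lam/2) * L2 penalty (ridge)
print(elastic_net_penalty(beta, lam, alpha=1.0))  # lam * L1 penalty (lasso)
print(elastic_net_penalty(beta, lam, alpha=0.3))  # mixture of the two
```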

Reference: https://spark.apache.org/docs/2.1.1/ml-classification-regression.html

regParam and elasticNetParam parameters in regression models

  • regParam: corresponds to $\lambda$, the overall penalty strength.
  • elasticNetParam: corresponds to $\alpha$. When $\alpha = 0$, the penalty is pure ridge (L2); when $\alpha = 1$, it is pure lasso (L1); values in between mix the two (see the sketch below).
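
A minimal PySpark sketch of setting these two parameters (it assumes an existing SparkSession and a training DataFrame named train_df with "features" and "label" columns, which are hypothetical names here):

```python
from pyspark.ml.regression import LinearRegression

# regParam = lambda, elasticNetParam = alpha (see the formulas above).
# train_df: hypothetical DataFrame with "features" (vector) and "label" columns.
lasso = LinearRegression(featuresCol="features", labelCol="label",
                         regParam=0.1, elasticNetParam=1.0)  # pure L1 (lasso)
ridge = LinearRegression(featuresCol="features", labelCol="label",
                         regParam=0.1, elasticNetParam=0.0)  # pure L2 (ridge)
enet = LinearRegression(featuresCol="features", labelCol="label",
                        regParam=0.1, elasticNetParam=0.5)   # elastic net mixture

model = lasso.fit(train_df)
print(model.coefficients)  # lasso typically drives some coefficients exactly to 0
```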
