Regression Analyses in R

Vahid Mirjalili, Data Mining Researcher

  1. Linear Regression
  2. Ridge Regression
  3. Logistic Regression
  4. Kernel Regression
There are dozens of regression models, and each may fit a particular dataset best. In this article, a handful of regression models are described with examples in **R**.

A 2D dataset is provided and plotted below.


In [2]:
## sessionInfo()

sessionInfo()


Out[2]:
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RCurl_1.95-4.1 bitops_1.0-6  

loaded via a namespace (and not attached):
[1] base64_1.1     digest_0.6.4   evaluate_0.5.5 IRdisplay_0.1  IRkernel_0.1  
[6] jsonlite_0.9.8 rzmq_0.7.0     stringr_0.6.2  uuid_0.1-1    

In [1]:
require(RCurl)
url <- getURL("https://raw.githubusercontent.com/mirjalil/DataScience/master/data/input_4regression-2.csv")
df <- read.csv(text=url, header=F)

par(mar=c(4.7, 4.7, 0.7, 0.7))
plot(df, col=rgb(0.2, 0.4, 0.8, 0.7), pch=19, cex=2, cex.axis=1.7, cex.lab=1.7)   # scatter plot of the raw 2D data


Loading required package: RCurl
Loading required package: bitops

1. Linear Regression


Linear regression assumes that the relationship between the dependent variable (y) and the predictors (features) is linear. The parameters of such a linear model can be estimated via the closed-form ordinary least squares (OLS) solution:

$ \hat{\beta} = (X^T X)^{-1} X^T y $
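
As a minimal sketch, this closed form can be applied directly to the dataset loaded above (assuming the first column of df is the predictor and the second the response); lm() in the next cell should return the same two numbers:

In [ ]:
X <- cbind(1, df[[1]])                           # design matrix: intercept column + predictor
beta.hat <- solve(t(X) %*% X, t(X) %*% df[[2]])  # (X^T X)^{-1} X^T y
beta.hat                                         # intercept and slope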

Linear Regression in R:

  • lm() to generate a linear model, given a formula such as formula=y ~ x
  • coef() returns the coefficients of the linear model built by lm()
  • resid() returns the residual of each data point (demonstrated after the fit below)
  • predict() to get the predicted values

Here, we demonstrate the application of linear regression on a sample dataset with one feature; then we extend the dataset by adding higher-order polynomial terms and build a second regression model.


In [33]:
colnames(df) = c("x", "y")

linear.model <- lm(formula=y ~ x, data=df)
coef(linear.model)

y.pred <- df$x*coef(linear.model)[2] + coef(linear.model)[1]   # manual prediction from the fitted coefficients
head(y.pred)

head(predict(linear.model))

library(lattice)

xyplot(y ~ x, df, pch=19, col=rgb(0.2, 0.4, 0.8, 0.7), cex=2,
       scales=list(cex=1.7),
       xlab=list("", cex=1.7), ylab=list("", cex=1.7),
       main=list("Linear Regression w. 1 Attribute", cex=1.6),
       panel = function(x,y, ...) {
            panel.xyplot(x, y, ...)
            llines(x, predict(linear.model), col="black", lwd=6)
            #llines(x, y.pred, col="black", lwd=6)
       })

#with(df, {
#     plot(x, y, pch=19, col=rgb(0.2, 0.4, 0.8, 0.7), cex=2, xlab="x", ylab="y", cex.axis=1.7, cex.lab=1.7);
#     #lines(x, predict(linear.model), col="black", lwd=6)
#     abline(linear.model, lwd=6)
#    })


Out[33]:
(Intercept)           x 
  1.7008612   0.5616422 
Out[33]:
[1] -0.5457078 -0.4895436 -0.4333794 -0.3772151 -0.3210509 -0.2648867
Out[33]:
         1          2          3          4          5          6 
-0.5457078 -0.4895436 -0.4333794 -0.3772151 -0.3210509 -0.2648867 
Out[33]:
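
The residuals of this fit can be inspected with resid(), as noted in the function list above; a quick sketch:

In [ ]:
res <- resid(linear.model)   # observed y minus fitted y, one value per data point
head(res)
sum(res^2)                   # residual sum of squares of the simple linear fit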

Adding Higher Order Terms

The linear model built above is too simplistic; in fact, the model is under-fitted. To build a more accurate model, we add higher-order polynomial terms such as $ x^2, x^3, x^4, \dots, x^{10} $.


In [3]:
##=============================#
## Adding higher order terms: ##

df$x2 <- df$x^2
df$x3 <- df$x^3
df$x4 <- df$x2^2
df$x5 <- df$x2 * df$x3
df$x6 <- df$x3^2
df$x7 <- df$x3 * df$x4
df$x8 <- df$x4^2
df$x9 <- df$x4 * df$x5
df$x10 <- df$x5^2

## Alternative way ==> poly(df$x, 10)

lm.xtend <- lm(formula=y ~ ., df)

coef(lm.xtend)


Out[3]:
  (Intercept)             x            x2            x3            x4 
 2.496688e-01  8.976677e-01  1.162846e+00 -2.673306e-01 -3.172852e-01 
           x5            x6            x7            x8            x9 
 6.663939e-02  3.919988e-02 -6.092869e-03 -2.128504e-03  1.810536e-04 
          x10 
 4.236224e-05 
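
As the comment in the cell above suggests, poly() can build the same polynomial design more compactly; a minimal sketch using raw (non-orthogonal) polynomials so that the fit matches lm.xtend:

In [ ]:
lm.poly <- lm(y ~ poly(x, 10, raw=TRUE), data=df)
coef(lm.poly)   # same fitted model as lm.xtend; only the coefficient names differ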

In [4]:
##================================##
## Plotting the regression models ##

xyplot(y ~ x, df, pch=19, col=rgb(0.2, 0.4, 0.8, 0.7), cex=2,
       scales=list(cex=1.7),
       xlab=list("x", cex=1.7), ylab=list("y", cex=1.7),
       main=list("Linear Regression w. Polynomial Attributes", cex=1.6),
       panel = function(x,y, ...) {
            panel.xyplot(x, y, ...)
            llines(x, predict(linear.model), col="black", lwd=6, lty=2)
            llines(x, predict(lm.xtend), col="purple", lwd=6, lty=2)
       })


Out[4]:

2. Ridge Regression


Ridge regression is a regularized regression model that penalizes model complexity via the sum of squared model coefficients.

$ \hat{\beta} = (X^T X + \lambda I)^{-1} X^T y $

where $ \lambda $ is the ridge constant and controls how strongly the model coefficients are penalized; $ \lambda=0 $ yields the same result as ordinary linear regression.
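
As a rough sketch, the closed form above can be applied to the extended data frame (illustration only: linearRidge(), used below, does additional preprocessing such as scaling, so its coefficients will not match this naive computation exactly):

In [ ]:
X <- as.matrix(cbind(1, df[, setdiff(names(df), "y")]))   # intercept column + x, x2, ..., x10
lambda <- 0.1
beta.ridge <- solve(t(X) %*% X + lambda*diag(ncol(X)), t(X) %*% df$y)
beta.ridge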

Ridge Regression Practice in R:

  • Package ridge
  • Input data must contain at least two predictor columns (ncol(X) >= 2)
  • linearRidge() to build a linear ridge regression model
  • predict() to get the predicted values, given a ridge model and data

In [5]:
library(ridge)

ridge.lin <- linearRidge(formula=y ~ ., df, lambda=0.1)
coef(ridge.lin)

with(df, {
    plot(df$x, df$y, pch=19, col=rgb(0.2, 0.4, 0.8, 0.7), cex=2, xlab="", ylab="", cex.axis=1.7)
    lines(x, predict(lm.xtend), col="purple", lwd=6, lty=3)
    lines(x, predict(ridge.lin), col="darkgreen", lwd=6, lty=2)
    legend(x=-4, y=5.5, lwd=6, col=c("purple", "darkgreen"), lty=c(3, 2), 
           legend=c("Polynomial Linear Regression", "Ridge Regression"),
           cex=1.7, border="white", box.lwd=0)
})

## Alternative plotting w. lattice package:
#xyplot(y ~ x, df, pch=19, col=rgb(0.2, 0.4, 0.8, 0.7), cex=2,
#       scales=list(cex=1.7),
#       xlab=list("x", cex=1.7), ylab=list("y", cex=1.7),
#       main=list("Linear Regression w. Polynomial Attributes", cex=1.6),
#       auto.key=T,
#       panel = function(x, y, ...) {
#            panel.xyplot(x, y, ...)
#            llines(x, predict(lm.xtend), col="purple", lwd=6, lty=3)
#            llines(x, predict(ridge.lin), col="darkgreen", lwd=6, lty=2)
#       })


Out[5]:
  (Intercept)             x            x2            x3            x4 
 9.658285e-01  4.241134e-01  1.207052e-01  1.339750e-02  1.730759e-03 
           x5            x6            x7            x8            x9 
 1.522917e-04  1.062585e-05 -1.540744e-05 -1.271680e-06 -1.605201e-06 
          x10 
-1.336680e-07 

3. Logistic Regression


Logistic regression is used for binary classification (binomial logistic regression; the multinomial case handles multiple classes). It models the class probability with the logistic function (see below), which varies continuously between 0 and 1; for classification, a threshold on this probability (e.g. 0.5) determines the predicted class label, "0" or "1".

Logistic Function:

$ f(x) = \frac{1}{1+\exp(-x)} $


In [6]:
t <- seq(-5, 5, by=0.2)
par(mar=c(4, 4, 1.5, 0.5))
plot(t, 1/(1+exp(-t)), type="l", lwd=4, col="blue",
     xlab="", ylab="", main="Logistic Function",
     cex.axis=1.7)


Logistic Regression in R:

  • glm() trains the logistic regression model
  • coef() returns the coefficients of the model, which define the decision boundary

For logistic regression, a new 2D dataset with class labels is provided here.


In [6]:
url <- getURL("https://raw.githubusercontent.com/mirjalil/DataScience/master/data/input_4regression-4.csv")
df.labeled <- read.csv(text=url, header=F, colClasses=c('numeric', 'numeric', 'factor'))
colnames(df.labeled) = c("x1", "x2", "y")

summary(df.labeled)

par(mar=c(4, 4, 0.4, 0.4))
plot(df.labeled$x1, df.labeled$x2, 
     main="",
     pch=19, col=as.numeric(df.labeled$y)+10, cex=2,
     xlab="", ylab="", cex.axis=1.7)


Out[6]:
       x1                x2           y      
 Min.   :-4.9698   Min.   :-4.99219   0:194  
 1st Qu.:-2.8755   1st Qu.:-1.69488   1:206  
 Median :-0.6838   Median : 0.09237          
 Mean   :-0.3113   Mean   : 0.27510          
 3rd Qu.: 2.4070   3rd Qu.: 2.68705          
 Max.   : 4.8855   Max.   : 4.99041          

In [7]:
logit.reg <- glm(formula=y ~ ., data=df.labeled, family="binomial")


cf.logit <- coef(logit.reg)
cf.logit


Out[7]:
(Intercept)          x1          x2 
0.007221144 0.251025660 0.597812737 
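
As described above, class predictions come from thresholding the fitted probabilities; a minimal sketch with a 0.5 cutoff:

In [ ]:
prob <- predict(logit.reg, type="response")         # fitted P(y = "1" | x1, x2)
pred.label <- ifelse(prob > 0.5, "1", "0")
table(predicted=pred.label, actual=df.labeled$y)    # confusion matrix at the 0.5 threshold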

In [14]:
library(lattice)
t <- seq(-6, 6, by=0.2)
xyplot(x2~x1,type='p', data=df.labeled,
       pch=19, cex=2, xlab="", ylab="", col=as.numeric(df.labeled$y)+10,
       scales=list(cex=1.7),
       panel=function(x,y,...){
         panel.xyplot(x,y,...)
         # decision boundary: b0 + b1*x1 + b2*x2 = 0  =>  x2 = -(b0 + b1*x1)/b2
         y2 <- (-cf.logit[1] - cf.logit[2]*t)/cf.logit[3]
         x3 <- c(-6, t, 6)
         y3 <- c(-6, y2, -6)
         panel.polygon(x3, y3,col=rgb(0.2, 0, 0,0.1), border=NA)
       })


Out[14]:

Building a more complex decision boundary


In [17]:
df.labeled$x12 <- df.labeled$x1^2
df.labeled$x13 <- df.labeled$x1^3
df.labeled$x14 <- df.labeled$x1^4

logit.reg.poly <- glm(formula=y ~ ., data=df.labeled, family="binomial")


cf.logit.poly <- coef(logit.reg.poly)
cf.logit.poly

xyplot(x2~x1,type='p', data=df.labeled,
       pch=19, cex=2, xlab="", ylab="", col=as.numeric(df.labeled$y)+10,
       scales=list(cex=1.7),
       panel=function(x,y,...){
         panel.xyplot(x,y,...)
         # boundary: set the linear predictor (quartic in x1, linear in x2) to zero and solve for x2
         y2 <- (-cf.logit.poly[1] - cf.logit.poly[2]*t - cf.logit.poly[4]*t^2 
                -cf.logit.poly[5]*t^3 -cf.logit.poly[6]*t^4)/cf.logit.poly[3]
         x3 <- c(-6, t)
         y3 <- c(-6, y2)
         panel.polygon(x3, y3,col=rgb(0.2, 0, 0,0.1), border=NA)
       })


Out[17]:
  (Intercept)            x1            x2           x12           x13 
-0.2164771089 -1.7360739382  1.3708726927 -0.0027069186  0.1457319484 
          x14 
 0.0002446804 
Out[17]:
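
To check whether the extra polynomial terms actually improve the fit, the same 0.5-cutoff confusion matrix can be computed for the extended model and compared with the one for logit.reg above; a quick sketch:

In [ ]:
prob.poly <- predict(logit.reg.poly, type="response")
table(predicted=ifelse(prob.poly > 0.5, "1", "0"), actual=df.labeled$y)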

