Load the Libraries
In [1]:
library(ggplot2)
library(dplyr)
library("ggrepel")
Read the data
In [2]:
nhldata <- read.csv("dataSets/NHLTop100.csv", header = TRUE)
Show the data
In [3]:
head(nhldata)
Print the column Names
In [4]:
colnames(nhldata)
Rename the columns as needed
Begin
In [5]:
names(nhldata)[names(nhldata)=="X..."] <- "PlusMinus"
In [6]:
names(nhldata)[names(nhldata)=="X1st.NHL.Season"] <- "1st.NHL.Season"
In [7]:
names(nhldata)[names(nhldata)=="G"] <- "Goals"
In [8]:
names(nhldata)[names(nhldata)=="A"] <- "Assists"
End
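Names such as X... and X1st.NHL.Season appear because read.csv, by default, passes the header text through make.names to produce syntactically valid R names. The sketch below reproduces that step in isolation. The raw header strings are an assumption, since the CSV file itself is not shown in this notebook.

```r
# Assumed raw CSV headers; the actual file is not shown in this notebook
rawHeaders <- c("+/-", "1st NHL Season", "G", "A")

# read.csv(check.names = TRUE), the default, applies make.names to the headers
cleaned <- make.names(rawHeaders)
print(cleaned)
```

Passing check.names = FALSE to read.csv would keep the original headers instead, at the cost of needing backticks to refer to names that are not syntactically valid.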
Show the names of the columns
In [9]:
names(nhldata)
Summarize the data
In [10]:
summary(nhldata)
In [11]:
colnames(nhldata)
In [12]:
nhldata = nhldata %>% select(Player,GP,Goals,Assists)
Show the data
In [13]:
head(nhldata)
Plot Assists against Goals to inspect the data
In [14]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="NHL Data Goals vs Assists")
Fit a linear regression of Assists on Goals
In [15]:
modelDefault = lm(Assists~Goals,data=nhldata)
Summarize the model
In [16]:
summary(modelDefault)
Draw the linear regression line obtained from the above model
In [17]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="Default Model")
abline(modelDefault, col="black")
Fit a linear regression of Assists on Goals with no intercept
Adding +0 to the formula in lm removes the intercept, which forces the regression line to pass through the origin
In [18]:
modelDefaultThruOrigin = lm(Assists~Goals+0,data=nhldata)
Show the summary of the above model
In [19]:
summary(modelDefaultThruOrigin)
Draw the linear regression line obtained from the above model
In [20]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="Default model which passes thru origin")
abline(modelDefaultThruOrigin, col="black")
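The +0 term used for modelDefaultThruOrigin can equivalently be written as -1, and the resulting slope has a simple closed form, sum(x*y)/sum(x^2). The following is a minimal sketch on made-up toy data (the x and y values are not from the NHL data set) that checks both statements.

```r
# Made-up toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.2, 3.9, 6.1, 8.0, 9.9)

fitPlusZero <- lm(y ~ x + 0)   # no intercept, written with + 0
fitMinusOne <- lm(y ~ x - 1)   # the same model, written with - 1

# Least-squares slope of a line forced through the origin
slopeClosedForm <- sum(x * y) / sum(x^2)

print(coef(fitPlusZero))
print(coef(fitMinusOne))
print(slopeClosedForm)
```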
Create the model which passes through Wayne Gretzky
If (x0,y0) is the point through which the regression line must pass, fit the model y−y0=β(x−x0)+ε, i.e., a linear regression with "no intercept" on a translated data set. For Wayne Gretzky, (x0,y0) = (894 goals, 1963 assists).
In [21]:
modelWayne = lm(I(Assists-1963)~I(Goals-894)+0,data=nhldata)
Summarize the above model
In [22]:
summary(modelWayne)
Draw the linear regression line obtained from the above model
In [23]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="Model Passing Thru Wayne Gretzky")
abline(predict(modelWayne, newdata = list(Goals=0))+1963, coef(modelWayne), col='red')
abline(modelDefault, col="black")
text(x=894, y=1963, labels="Wayne Gretzky",cex= 1,pos=2)
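The translated, no-intercept fit used for modelWayne can be checked against its closed-form slope, sum((x-x0)*(y-y0))/sum((x-x0)^2), and the intercept on the original scale is then y0 - beta*x0. The following is a minimal sketch on made-up toy data. The names x0, y0, beta and intercept are chosen here for illustration and the numbers are not from the NHL data set.

```r
# Made-up toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.2, 3.9, 6.1, 8.0, 9.9)

# Hypothetical point the line must pass through (here the last observation)
x0 <- 5
y0 <- 9.9

# No-intercept fit on the translated data
fit <- lm(I(y - y0) ~ I(x - x0) + 0)
beta <- unname(coef(fit))

# Closed-form least-squares slope for the same constraint
betaClosedForm <- sum((x - x0) * (y - y0)) / sum((x - x0)^2)

# Intercept of the constrained line on the original scale
intercept <- y0 - beta * x0

print(c(slope = beta, intercept = intercept))
```

The intercept computed this way is the same quantity the notebook obtains with predict(model, newdata = list(Goals=0)) plus the y0 value, so either form can be passed to abline.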
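Create the linear regression model
By trial and error we found that adding 365 to Assists and +0 to the formula makes the fitted line pass through both the origin and Wayne Gretzky. Note that this model is fit to the shifted response Assists+365, so the line drawn below has the slope of that shifted fit (approximately 1963/894) rather than the least-squares slope for Assists itself.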
In [24]:
modelWayneThruOrigin = lm(I(Assists+365)~I(Goals)+0,data=nhldata)
Summarize the above model
In [25]:
summary(modelWayneThruOrigin)
Draw the linear regression line obtained from the above model
In [26]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="Model Passing Thru Wayne Gretzky and Origin")
abline(modelWayneThruOrigin, col="red")
text(x=894, y=1963, labels="Wayne Gretzky",cex= 1,pos=2)
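A line through the origin and a point (x0,y0) must have slope y0/x0, so the shift found by trial and error can also be solved for directly. For a no-intercept fit of (y + shift) on x, the slope is (sum(x*y) + shift*sum(x))/sum(x^2). Setting that equal to y0/x0 gives shift = ((y0/x0)*sum(x^2) - sum(x*y))/sum(x). The following is a minimal sketch on made-up toy data. The names shift and targetSlope are introduced here for illustration and are not part of the notebook.

```r
# Made-up toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(1.0, 2.5, 2.9, 4.2, 9.9)

# Hypothetical target point (here the last observation)
x0 <- 5
y0 <- 9.9

# A line through the origin and (x0, y0) must have this slope
targetSlope <- y0 / x0

# Shift that makes the no-intercept slope of (y + shift) on x equal targetSlope
shift <- (targetSlope * sum(x^2) - sum(x * y)) / sum(x)

fit <- lm(I(y + shift) ~ x + 0)
print(shift)
print(coef(fit))
```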
Create a row for Patrick Kane, using figures taken from the NHL website, to add to our data
In [27]:
patrickKane = data.frame(Player = "Patrick Kane", GP = 735, Goals = 285, Assists = 462)
Append the row for Patrick Kane to the data
In [28]:
nhldata = rbind(nhldata,patrickKane)
Check that the row has been added
In [29]:
nrow(nhldata)
Create the model which passes through Patrick Kane
If (x0,y0) is the point through which the regression line must pass, fit the model y−y0=β(x−x0)+ε, i.e., a linear regression with "no intercept" on a translated data set. Here (x0,y0) is Patrick Kane's (Goals, Assists).
In [30]:
# Translate by the Goals and Assists stored in the patrickKane row,
# so the fitted line passes through the point that was added to the data
modelPatrick <- lm(I(Assists - patrickKane$Assists) ~ I(Goals - patrickKane$Goals) + 0, data = nhldata)
Summarize the above model
In [31]:
summary(modelPatrick)
Draw the linear regression line obtained from the above model
In [32]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="Model Passing Thru Patrick Kane")
abline(predict(modelPatrick, newdata = list(Goals=0))+patrickKane$Assists, coef(modelPatrick), col='green')
abline(predict(modelWayne, newdata = list(Goals=0))+1963, coef(modelWayne), col='red')
abline(modelDefault, col="black")
text(x=894, y=1963, labels="Wayne Gretzky",cex= 1,pos=2)
text(x=patrickKane$Goals, y=patrickKane$Assists, labels="Patrick Kane",cex= 1,pos=2)
Create the linear regression model
By trial and error we found that adding 75 to Assists and +0 to the formula makes the fitted line pass through both the origin and Patrick Kane. As before, the model is fit to the shifted response Assists+75.
In [33]:
modelPatrickThruOrigin <- lm(I(Assists+75)~I(Goals)+0, data = nhldata)
Draw the linear regression line obtained from the above model
In [34]:
plot(nhldata$Goals,nhldata$Assists, pch=19, xlim=c(0,1000), ylim=c(0,2100),xlab="Goals", ylab="Assists",main="Model Passing Thru Patrick Kane and Origin")
abline(modelPatrickThruOrigin, col="green")
text(x=patrickKane$Goals, y=patrickKane$Assists, labels="Patrick Kane",cex= 1,pos=2)
Summarize the above model
In [35]:
summary(modelPatrickThruOrigin)
Create a function to get the P-Value from a given model
In [36]:
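getPValue <- function(modelobject) {
  # identical() yields a single TRUE/FALSE even when class() returns several values
  if (!identical(class(modelobject), "lm")) stop("Not an object of class 'lm'")
  # fstatistic holds the F value, the numerator df and the denominator df
  f <- summary(modelobject)$fstatistic
  # The upper tail of the F distribution gives the overall model p-value
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  return(p)
}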
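For a model with a single predictor, the overall F-test p-value that getPValue extracts should match the p-value of the t-test on the slope, because F equals t squared in that case. The following is a minimal sketch of that check, using the cars data set that ships with R rather than the NHL data. The variable names are chosen here for illustration.

```r
# cars is a data set that ships with R (datasets package)
fit <- lm(dist ~ speed, data = cars)

# Overall F-test p-value, computed the same way as in getPValue
f <- summary(fit)$fstatistic
pOverall <- unname(pf(f[1], f[2], f[3], lower.tail = FALSE))

# p-value of the t-test on the single slope
pSlope <- coef(summary(fit))["speed", "Pr(>|t|)"]

print(c(overall = pOverall, slope = pSlope))
```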
Create a function to get a data frame containing the R-squared value and p-value for a given model
In [37]:
getValues = function(model, modelNameStr) {
  rSquaredVal = summary(model)$r.squared
  pVal = getPValue(model)
  tempSum = data.frame(modelName = modelNameStr, R.Squared.Value = rSquaredVal, pValue = pVal)
  return(tempSum)
}
Create a data frame to hold the data for different models
In [38]:
modelSummary = data.frame(modelName = character(), R.Squared.Value = double(), pValue = double())
Populate the data frame with all the models
In [39]:
modelSummary = rbind(modelSummary,getValues(modelDefault,"Default Model"))
modelSummary = rbind(modelSummary,getValues(modelDefaultThruOrigin,"Default Model via Origin"))
modelSummary = rbind(modelSummary,getValues(modelWayne,"Model Via Wayne Gretzky"))
modelSummary = rbind(modelSummary,getValues(modelWayneThruOrigin,"Model Via Wayne Gretzky and Origin"))
modelSummary = rbind(modelSummary,getValues(modelPatrick,"Model Via Patrick Kane"))
modelSummary = rbind(modelSummary,getValues(modelPatrickThruOrigin,"Model Via Patrick Kane and Origin"))
Show the summary for the various models
In [40]:
modelSummary
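One caveat when reading this table, stated as a general property of summary on lm objects rather than something the notebook checks: for a model fit without an intercept, the reported R-squared measures the total sum of squares from zero instead of from the mean of the response, so it is not directly comparable with the R-squared of a model that has an intercept. The following is a minimal sketch using the cars data set that ships with R. The variable names are chosen here for illustration.

```r
# cars is a data set that ships with R (datasets package)
fitNoIntercept <- lm(dist ~ speed + 0, data = cars)
rss <- sum(residuals(fitNoIntercept)^2)

# What summary() reports for a no-intercept model
r2Reported <- summary(fitNoIntercept)$r.squared

# Total sum of squares measured from zero versus from the mean
r2FromZero <- 1 - rss / sum(cars$dist^2)
r2FromMean <- 1 - rss / sum((cars$dist - mean(cars$dist))^2)

print(c(reported = r2Reported, fromZero = r2FromZero, fromMean = r2FromMean))
```

The same applies to the translated models above, where the response is Assists minus or plus a constant: their R-squared is computed on the shifted response.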