This is the R version of The LFC Goal Machine. For a full description of the project, see the lfcgm GitHub repository. The R version files are in the Rversion folder.
This notebook describes how the app was converted to run as an R Shiny app.
The project uses Jupyter Notebook, R, RStudio, and the R packages ggplot2, shiny and dplyr.
Last updated: 14th October 2017
Version: 2.2.0
Change Description:
The initial aim was simply to convert the web app, which was developed using Python, spyre and ggplot, to an equivalent version using R, Shiny and ggplot2.
This was straightforward, with Shiny offering a richer UI than Python's spyre. The R files for the equivalent version are in the lfcgmRv1 folder. Note that R's ggplot2 handles a player with only 2 data points differently from Python's ggplot. Therefore a tweak was required to the R plotting function to cope with players with only 2 data points.
The Rversion folder contains the latest R version with an enhanced (simplified) user interface. This takes advantage of Shiny's selectizeInput, which allows multiple selections in a single input.
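As a rough illustration of that kind of interface, here is a minimal sketch of a single `selectizeInput` with `multiple = TRUE`. The input id `players`, the output id `chosen`, the three placeholder choices and the `maxItems` cap of 8 are assumptions made for this example; it is not the code in the Rversion folder.

```r
library(shiny)

# Placeholder choices; the real app would build these from the dropdown data.
example_choices <- c("Ian Rush", "Kenny Dalglish", "Roger Hunt")

ui <- fluidPage(
  selectizeInput(
    inputId  = "players",            # hypothetical input id
    label    = "Select players",
    choices  = example_choices,
    selected = NULL,
    multiple = TRUE,                 # one widget, many selections
    options  = list(maxItems = 8)    # assumed cap, passed through to selectize.js
  ),
  textOutput("chosen")               # hypothetical output id
)

server <- function(input, output) {
  output$chosen <- renderText({
    # input$players is empty until something is selected
    if (length(input$players) == 0) {
      "No players selected"
    } else {
      paste(input$players, collapse = ", ")
    }
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch in an interactive session
```

An empty selection has length 0, which lines up with the `length(players) == 0` check the plotting function uses to fall back to its default players.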
Import the R libraries needed for the analysis.
In [35]:
library(ggplot2)
library(dplyr)
In [36]:
dflfcgm <- read.csv('data/lfc_scorers_tl_pos_age.csv', header=TRUE)
In [38]:
# check dimensions of dataframe, expect 1210 rows
dim(dflfcgm)
Out[38]:
In [39]:
# check head includes latest season: 2016-17, top scorer is Phil Coutinho
head(dflfcgm)
Out[39]:
In [40]:
# check column names
colnames(dflfcgm)
Out[40]:
In [41]:
# drop column X (the original row index from the file)
drops <- c('X')
dflfcgm <- dflfcgm[ , !(names(dflfcgm) %in% drops)]
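The column drop can be checked on a small made-up data frame. The toy values below are placeholders, not rows from the CSV; `toy_null$X <- NULL` is a shorter base R alternative when only one column is involved.

```r
# Toy data frame standing in for the CSV; values are made up for illustration.
toy <- data.frame(
  X      = 0:2,
  player = c("Player A", "Player B", "Player C"),
  league = c(10L, 5L, 1L)
)

# Approach used above: keep every column whose name is not in `drops`.
drops <- c("X")
toy_kept <- toy[, !(names(toy) %in% drops), drop = FALSE]

# Alternative for a single column: assign NULL to remove it.
toy_null <- toy
toy_null$X <- NULL

names(toy_kept)
names(toy_null)
```

The `%in%` form scales to several column names at once, which is presumably why the notebook keeps a `drops` vector.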
In [42]:
# check dimensions, should be 5 columns
dim(dflfcgm)
Out[42]:
In [43]:
head(dflfcgm)
Out[43]:
In [44]:
# check tail, should show 1894-95
tail(dflfcgm)
Out[44]:
In [45]:
summary(dflfcgm)
str(dflfcgm)
Out[45]:
In [46]:
dflfcgm_dd <- read.csv('data/lfcgm_app_dropdown.csv', header=TRUE)
In [47]:
# check dimensions, expect 240 rows
dim(dflfcgm_dd)
Out[47]:
In [13]:
# check head, expect Adam Lallana at top (added in 2015-16)
head(dflfcgm_dd)
Out[13]:
In [62]:
# check dataframe includes new additions e.g. Divock Origi added 2016-17
dflfcgm_dd[dflfcgm_dd$value %in% c("Divock Origi", "Roberto Firmino","James Milner"),]
Out[62]:
In [57]:
# start with a basic ggplot plot
# create a filter of the dataframe and produce a plot of the data points
df <- dflfcgm[dflfcgm$player %in% c('Luis Suarez', 'Steven Gerrard'),]
head(df)
ggplot(df, aes(x=age, y=league, color=player, shape=player)) + geom_point() + ggtitle('test plot: StevieG and Suarez')
Out[57]:
In [15]:
# add the line of best fit
# create a filter of the dataframe and produce a plot of the data points
df <- dflfcgm[dflfcgm$player %in% c('Luis Suarez', 'Steven Gerrard'),]
ggplot(df, aes(x=age, y=league, color=player, shape=player)) +
    geom_point() +
    geom_smooth(se=FALSE) +
    ggtitle('test plot: StevieG and Suarez')
In [16]:
ggplot_age_vs_lgoals <- function(df, players) {
    # Return ggplot of Age vs League Goals for given players in dataframe.
    #
    # Given the low number of points, ggplot's geom_smooth uses
    # the loess method with default span.
    # Players with exactly 2 points are joined with a straight line instead.

    TITLE <- 'LFCGM Age vs League Goals'
    XLABEL <- 'Age at Midpoint of Season'
    YLABEL <- 'League Goals per Season'
    EXEMPLAR_PLAYERS <- c('Ian Rush', 'Kenny Dalglish', 'Roger Hunt', 'David Johnson',
                          'Harry Chambers', 'John Toshack', 'John Barnes', 'Kevin Keegan')
    EXEMPLAR_TITLE <- 'LFCGM Example Plot, The Champions: Age vs League Goals'

    # if players vector is empty then set the default exemplar options
    if (length(players) == 0) {
        players <- EXEMPLAR_PLAYERS
        TITLE <- EXEMPLAR_TITLE
    }

    # create dataframes to plot...
    # filter those players with only 2 points and those with more than 2
    this_df <- df[df$player %in% players, ]
    this_dfeq2 <- this_df %>% group_by(player) %>% filter(n()==2)
    this_dfgt2 <- this_df %>% group_by(player) %>% filter(n()>2)

    # produce the plot and return it
    this_plot <- ggplot(this_df, aes(x=age, y=league, color=player, shape=player)) +
        geom_point(size=2) +
        geom_line(data=this_dfeq2, size=0.1) +
        geom_smooth(data=this_dfgt2, se=FALSE, size=0.1) +
        xlab(XLABEL) +
        ylab(YLABEL) +
        ggtitle(TITLE) +
        scale_shape_manual(values=0:length(players)) +
        theme(legend.text=element_text(size=10))

    return (this_plot)
}
Ref: http://www.lfcsorted.com/2016/03/the-lfc-goal-machine-graphic-detail.html
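To see what the two `filter(n() ...)` calls in the function select, the split can be reproduced on made-up data. This sketch uses base R's `ave()` to count rows per player instead of dplyr, purely so it runs without extra packages; the toy players and numbers are placeholders.

```r
# Toy data: Player A has 2 seasons, Player B has 3, Player C has 1.
toy <- data.frame(
  player = c("Player A", "Player A",
             "Player B", "Player B", "Player B",
             "Player C"),
  age    = c(22, 23, 25, 26, 27, 30),
  league = c(3, 7, 10, 12, 9, 1)
)

# Number of rows (seasons) for the player on each row.
n_seasons <- ave(seq_along(toy$player), toy$player, FUN = length)

toy_eq2 <- toy[n_seasons == 2, ]  # would be joined with geom_line
toy_gt2 <- toy[n_seasons > 2, ]   # would be smoothed with geom_smooth
toy_one <- toy[n_seasons == 1, ]  # in neither subset: drawn as a point only

unique(as.character(toy_eq2$player))
unique(as.character(toy_gt2$player))
unique(as.character(toy_one$player))
```

Note the third case: a player with a single season falls into neither subset, so only `geom_point` draws anything for them.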
In [17]:
# show default plot
players = c()
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
suppressWarnings(print(plt))
In [18]:
# produce plot for player known as 'god'
players = c('Robbie Fowler')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
suppressWarnings(print(plt))
In [19]:
# check plot for a player with only 2 points
dflfcgm[dflfcgm$player == 'Andy Carroll',]
players = c('Andy Carroll')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
suppressWarnings(print(plt))
Out[19]:
In [20]:
# show all players scoring 20 or more league goals in a season when over 30 years old
df_late <- dflfcgm[(dflfcgm$league >= 20) &
                   (dflfcgm$age > 30),]
df_late
players = df_late$player
players
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
suppressWarnings(print(plt))
Out[20]:
Out[20]:
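Because a player can have more than one qualifying season, `df_late$player` may contain repeated names, and on R versions before 4.0.0 it is a factor by default. A small sketch, on made-up data, of reducing such a column to a unique character vector before passing it on; the plot above works without this step, so treat it as optional tidying.

```r
# Toy stand-in for df_late; the rows are made up for illustration.
toy_late <- data.frame(
  player = c("Player A", "Player B", "Player A"),
  league = c(21, 20, 25),
  age    = c(31, 32, 33)
)

# as.character() drops any factor levels; unique() removes repeated names.
players <- unique(as.character(toy_late$player))
players
length(players)
```

With a de-duplicated vector, `length(players)` matches the number of distinct players, which is the quantity the function uses when sizing `scale_shape_manual`.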
In [21]:
# plot second of TOP_DUO seasons
players = c('Daniel Sturridge', 'Luis Suarez')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
suppressWarnings(print(plt))
In [63]:
# check plot for a recent player
players = c('Roberto Firmino')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
suppressWarnings(print(plt))
In [66]:
dim(dflfcgm_dd)
Out[66]:
In [67]:
str(dflfcgm_dd)
In [68]:
head(dflfcgm_dd)
Out[68]:
In [69]:
tail(dflfcgm_dd)
Out[69]:
In [70]:
length(dflfcgm_dd$value)
Out[70]:
In [71]:
dd_players = dflfcgm_dd$value
In [72]:
dd_players[1:5]
Out[72]:
In [73]:
class(dd_players)
Out[73]:
In [74]:
# create vector of strings containing the list of players for the input dropdowns
dd_p <- levels(dd_players)[1:5]
In [75]:
class(dd_p)
Out[75]:
In [76]:
dd_p
Out[76]:
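The `levels()` call above assumes the `value` column was read as a factor, which was the `read.csv` default before R 4.0.0. From R 4.0.0 onwards strings stay as character vectors unless `stringsAsFactors = TRUE` is passed, and `levels()` then returns `NULL`. The sketch below shows both cases on an inline three-row CSV (a made-up subset, not the real dropdown file) and a version-independent way to get the same sorted names.

```r
# Inline CSV standing in for the dropdown file; three names only.
csv_text <- "value\nSteven Gerrard\nAdam Lallana\nIan Rush"

dd_chr <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
dd_fct <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

levels(dd_chr$value)  # NULL: a character column has no levels
levels(dd_fct$value)  # factor levels, sorted alphabetically

# Works for either column type: unique names, sorted, as character.
players_sorted <- sort(unique(as.character(dd_chr$value)))
players_sorted
```

Note that factor levels are sorted, so `levels(x)[1:5]` gives the first five names alphabetically, not the first five rows of the file.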
In [77]:
# add default 'empty' value to beginning of the player dropdown list
EMPTY = '<Select Player>'
print(EMPTY)
dd_p = c(EMPTY, dd_p)
print(dd_p)
In [78]:
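# investigate generating the dropdown list using lapply
# note: the assignment inside the function changes a local copy of p only;
# lapply returns a list holding each function result
p <- c()
lapply(1:8, function(i) {
    p <- c(p, paste0('dd', i))
})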
Out[78]:
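The `lapply` experiment above returns a list of eight single strings and leaves the outer `p` empty, because `<-` inside the function assigns to a local variable. If the goal is a plain character vector of ids, `paste0` is already vectorised. A short sketch (the `dd` prefix comes from the cell above; the names `res` and `ids` are introduced here):

```r
p <- c()
res <- lapply(1:3, function(i) {
  p <- c(p, paste0("dd", i))  # modifies a local copy only
})

is.null(p)    # the outer p is still empty
unlist(res)   # each call returned a single id

# Vectorised alternative: build all eight ids in one call.
ids <- paste0("dd", 1:8)
ids
```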