In [1]:
library('readr') # data input
library('tibble') # data wrangling
library('reshape2') # data manipulation
library('dplyr') # data manipulation
library('ggplot2') # visualization
library('lubridate') # date and time
library('tseries') # time series analysis
library('forecast') # time series analysis
options(repr.plot.width=8, repr.plot.height=4)
In [2]:
df <- read_csv('../data/processed/df.csv')
In [3]:
head(df)
Let's work with the time series of a single row (one page) only, like Pokemon ^^
In [4]:
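# Drop the metadata columns, keep only the first page's row, and reshape wide -> long:
# 'variable' holds the date column names, 'value' the daily visit counts
input <- melt(df %>% select(-c(Page, agent, access, project, pagename)) %>% slice(1))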
In [5]:
head(input)
In [6]:
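# melt() returns the former column names as a factor, so convert via character to Date
input$Date <- as.Date(as.character(input$variable))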
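The melt() call above turns one wide row (one column per day) into a long table with one row per day. A minimal sketch of the same reshape on a hypothetical three-day toy data frame, assuming only that reshape2 is installed (the page name and visit counts are made up for illustration):

```r
library(reshape2)

# Hypothetical wide table: one page, one column per day
# (check.names = FALSE keeps the date strings as column names)
toy <- data.frame(Page = "Some_page",
                  `2015-07-01` = 18, `2015-07-02` = 11, `2015-07-03` = 5,
                  check.names = FALSE)

# Wide -> long: 'variable' holds the former column names (a factor), 'value' the counts
long <- melt(toy, id.vars = "Page")

# Convert the factor of date strings into proper Date values
long$Date <- as.Date(as.character(long$variable))
print(long)
```

Each former date column becomes one row, which is the shape that ggplot() and ts() work with in the cells that follow.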
In [7]:
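# The first argument of scale_x_date() is the axis title; date_breaks/date_labels set monthly ticks
ggplot(input, aes(Date, value)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y") +
  ylab("Visits") + xlab("Date")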
There are some very large spikes here. R provides a convenient function for removing time series outliers: tsclean() from the forecast package. tsclean() identifies outliers by smoothing and decomposing the series, then replaces them with linearly interpolated values. It also imputes missing values in the series, if there are any.
Note that we use the ts() function to create a time series object to pass to tsclean():
In [8]:
# Without a frequency argument, ts() creates a non-seasonal series (frequency 1)
count_ts <- ts(input$value)
input$visits_cleaned <- as.numeric(tsclean(count_ts))
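To see what tsclean() does in isolation, here is a small sketch on simulated data (not taken from this dataset): a noisy flat series with one injected spike and one missing value. forecast::tsoutliers(), which tsclean() calls internally, reports the flagged positions and their proposed replacements; tsclean() then applies them and fills the gap.

```r
library(forecast)

set.seed(42)
# Simulated daily counts: level 10 plus Gaussian noise (illustrative data only)
x <- 10 + rnorm(100)
x[50] <- 1000   # injected outlier
x[70] <- NA     # injected missing value
x_ts <- ts(x)   # frequency 1, so the series is smoothed with supsmu() rather than STL

# Positions flagged as outliers
out <- tsoutliers(x_ts)
print(out$index)

# tsclean() replaces the outliers and, by default, also interpolates the NA
cleaned <- tsclean(x_ts)
print(round(cleaned[c(49, 50, 51, 70)], 2))
```

The spike at position 50 is replaced by a value near the level of its neighbours, and position 70 is no longer missing, which is the same behaviour relied on for the page's visit counts above.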
In [9]:
ggplot(input, aes(Date, visits_cleaned)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y") +
  ylab("Visits") + xlab("Date")
In [10]:
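# Centred moving averages from forecast::ma(); the first and last few values are NA
input$visits_ma <- ma(input$visits_cleaned, order = 7)    # weekly
input$visits_ma30 <- ma(input$visits_cleaned, order = 30) # roughly monthly (even order, so a centred 2x30 average)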
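forecast::ma() computes a centred moving average. As a sketch of what that means (on a simulated random walk, not the page's visits), the same numbers can be reproduced with stats::filter(): equal weights of 1/k for an odd order k, and for an even order such as 30 a 2x30 average whose two end weights are halved. The explicit stats:: prefix matters here, because dplyr's filter() masks it once dplyr is loaded.

```r
library(forecast)

set.seed(1)
y <- ts(cumsum(rnorm(60)))   # simulated series, illustrative only

# Odd order: plain centred average of 7 points, each weighted 1/7
w7 <- rep(1 / 7, 7)
manual7 <- stats::filter(y, w7, sides = 2)

# Even order: centred 2x30 average, 31 weights with the two end weights halved
w30 <- c(0.5, rep(1, 29), 0.5) / 30
manual30 <- stats::filter(y, w30, sides = 2)

# Both comparisons should return TRUE (NA positions at the ends also match)
all.equal(as.numeric(manual7), as.numeric(ma(y, order = 7)))
all.equal(as.numeric(manual30), as.numeric(ma(y, order = 30)))
```

The halved end weights keep the even-order window symmetric around each day, which is why the 30-day curve lines up with the 7-day one instead of being shifted by half a day.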
In [11]:
ggplot() +
  geom_line(data = input, aes(x = Date, y = visits_cleaned, colour = "Counts")) +
  geom_line(data = input, aes(x = Date, y = visits_ma, colour = "Weekly Moving Average")) +
  geom_line(data = input, aes(x = Date, y = visits_ma30, colour = "Monthly Moving Average")) +
  ylab('Visits Count')