In [1]:
library(data.table)

In [2]:
library(ggplot2)

In [3]:
library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


In [4]:
options(scipen=999)

Sample revisions by registered users to plot the time difference between consecutive edits


In [5]:
sample_human_revision_session_data <- data.table(read.table("~/Desktop/human_events.tsv", header=TRUE, sep="\t"))

In [6]:
sample_human_revision_session_data$updated_timestamp <- as.POSIXct(as.character(sample_human_revision_session_data$timestamp), format='%Y%m%d%H%M%S', origin='1970-01-01')

In [7]:
sample_human_revision_session_data$updated_previous_timestamp <- as.POSIXct(as.character(sample_human_revision_session_data$prev_timestamp), format='%Y%m%d%H%M%S', origin='1970-01-01')

In [8]:
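# Request seconds explicitly; plain subtraction lets difftime choose its own units.
sample_human_revision_session_data$time_difference <- as.numeric(difftime(sample_human_revision_session_data$updated_timestamp, sample_human_revision_session_data$updated_previous_timestamp, units = "secs"))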
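
Subtracting two POSIXct values returns a difftime whose units R picks automatically from the size of the gap, so as.numeric() on the result is not guaranteed to be in seconds. The sketch below illustrates this with two made-up timestamps in the same compact format. The variable names and tz = "UTC" are assumptions, used only to keep the example deterministic.

```r
# Two hypothetical timestamps, exactly two hours apart.
later   <- as.POSIXct("20150304141053", format = "%Y%m%d%H%M%S", tz = "UTC")
earlier <- as.POSIXct("20150304121053", format = "%Y%m%d%H%M%S", tz = "UTC")

# Plain subtraction: units are chosen automatically (hours here).
auto_units <- later - earlier
print(units(auto_units))
print(as.numeric(auto_units))

# Explicit units: the numeric value is always in seconds.
in_seconds <- as.numeric(difftime(later, earlier, units = "secs"))
print(in_seconds)
```

With a vector of differences, the automatic choice follows the smallest non-missing gap, so naming the units avoids depending on the data.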

In [9]:
sample_human_revision_session_data$log_time_difference <- log10(sample_human_revision_session_data$time_difference + 1)


Warning message in eval(expr, envir, enclos):
“NaNs produced”
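
The warning most likely comes from log10() receiving a negative argument: any row with a time_difference below -1 makes time_difference + 1 negative. A minimal sketch, using made-up values and hypothetical variable names, of masking negative differences before the transform:

```r
# Hypothetical time differences in seconds; the negative value stands in for
# a row whose previous timestamp is later than its own timestamp.
time_difference <- c(0, 9, 99, -5, NA)

# Mask negative differences as NA so log10() never sees a negative argument.
non_negative <- ifelse(time_difference >= 0, time_difference, NA_real_)
log_time_difference <- log10(non_negative + 1)

print(log_time_difference)
```

Whether to mask, drop, or investigate such rows depends on why the negative differences occur, which this notebook does not establish.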

In [10]:
attach(sample_human_revision_session_data)

In [20]:
sample_human_revision_session_data_mean = summarize(group_by(sample_human_revision_session_data[prev_timestamp != 'NULL' & session_events >= 10 & time_difference >= 0,], user, session_start), edit_in_session = n(), mean_time_difference = mean(time_difference))

In [21]:
head(sample_human_revision_session_data_mean)


user  session_start   edit_in_session  mean_time_difference
1     20140727151034  12               117.2500
1     20141122224009   9               391.4444
1     20150304141053  23               963.2174
1     20150304231604  16               832.3125
1     20150305231129  19               485.1579
1     20150308203934  25               422.0000

In [22]:
sample_human_revision_session_data_mean[mean_time_difference < 10, group := "group1"]


Error in `:=`(group, "group1"): Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
Traceback:

1. sample_human_revision_session_data_mean[mean_time_difference < 
 .     10, `:=`(group, "group1")]
2. `[.grouped_df`(sample_human_revision_session_data_mean, mean_time_difference < 
 .     10, `:=`(group, "group1"))
3. NextMethod()
4. `[.tbl_df`(sample_human_revision_session_data_mean, mean_time_difference < 
 .     10, `:=`(group, "group1"))
5. check_names_df(j, x)
6. `:=`(group, "group1")
7. stop("Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(\":=\").")
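
The traceback shows the call being dispatched to `[.grouped_df`: summarize() on a group_by() result returns a grouped tibble, not a data.table, so `:=` is not available in `[`. One possible fix, sketched below on made-up rows, is to ungroup the summary and convert it back with as.data.table() before assigning by reference. The names events and session_means are hypothetical stand-ins for the real objects.

```r
library(data.table)
library(dplyr)

# Made-up rows standing in for the real session data.
events <- data.table(
  user = c(1, 1, 1, 2, 2),
  session_start = c(100, 100, 200, 300, 300),
  time_difference = c(4, 8, 50, 2, 6)
)

# Summarize with dplyr, then drop the grouping and convert to a data.table.
session_means <- events %>%
  group_by(user, session_start) %>%
  summarize(edit_in_session = n(),
            mean_time_difference = mean(time_difference)) %>%
  ungroup() %>%
  as.data.table()

# := now works because session_means is a data.table again.
session_means[mean_time_difference < 10, group := "group1"]
print(session_means)
```

An alternative that stays in data.table throughout is to aggregate with `events[, .(edit_in_session = .N, mean_time_difference = mean(time_difference)), by = .(user, session_start)]`, which returns a data.table directly.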
