Exploring the pages with the most reverts per page

Load the English Wikipedia table of bot-bot reverts, process dates:


In [1]:
library(data.table)
library(ggplot2)

dt = data.table(read.table("../../datasets/reverted_bot2bot/enwiki_20170420.tsv.bz2", sep="\t", header=T, quote="",
                          comment.char=""))
dt$rev_ts = as.POSIXct(format(dt$rev_timestamp, scientific=F), format="%Y%m%d%H%M%S")
dt$rev_month = as.Date(paste(format(dt$rev_ts, "%Y-%m-"), "01", sep=""))
dt$rev_day = as.Date(dt$rev_ts)
dt$reverting_ts = as.POSIXct(format(dt$reverting_timestamp, scientific=F), format="%Y%m%d%H%M%S")
dt$reverting_month = as.Date(paste(format(dt$reverting_ts, "%Y-%m-"), "01", sep=""))
dt$reverting_day = as.Date(dt$reverting_ts)

Build the page_reverts dataframe, grouping by page id (rev_page) and finding for each page:

  • number of bot-bot reverts
  • number of bots involved
  • number bots that reverted other bots
  • number of bots reverted by other bots
  • first bot-bot revert
  • last bot-bot revert

In [2]:
dt.by_page = setkey(dt, rev_page, rev_user_text, reverting_user_text)
page_reverts = dt[
    page_namespace == 0,
    list(reverts=length(rev_id),
         bots_involved=length(unique(c(rev_user_text, reverting_user_text))), 
         reverting_bots=length(unique(reverting_user_text)), 
         reverted_bots=length(unique(rev_user_text)),
         first_revert=min(reverting_ts),
         last_revert=max(reverting_ts)),
    list(rev_page)]

In [3]:
page_reverts[order(page_reverts$reverts, decreasing=T),][1:10,]


rev_pagerevertsbots_involvedreverting_botsreverted_botsfirst_revertlast_revert
49143051 69 7 3 4 2016-04-14 19:05:022016-10-12 00:49:25
5971803 49 16 4 12 2006-10-21 19:17:512013-01-19 21:33:27
5971841 46 7 3 4 2009-08-08 11:30:252012-10-17 21:36:17
5487 41 4 2 2 2016-07-02 14:44:332016-07-07 19:02:07
5971837 39 8 3 5 2006-10-21 19:19:582012-09-16 16:13:25
4413025 35 4 2 2 2016-07-02 20:13:112016-07-07 20:32:53
5971821 34 8 3 5 2006-10-21 19:19:072012-10-17 21:35:21
5971843 33 8 3 5 2006-10-21 19:20:202012-10-17 21:36:24
24260 32 24 13 11 2010-09-23 06:08:232011-06-23 13:08:19
30736081 32 6 3 3 2016-04-23 10:39:042016-07-21 07:45:40

Visualizations

KDE of number of bot-bot reverts per page


In [4]:
ggplot(page_reverts, aes(x=reverts)) + 
geom_density(adjust=10) + 
scale_x_log10()


KDE histogram of number of bot-bot reverts per page, filtered to >= 5 reverts per page


In [5]:
ggplot(page_reverts[reverts >= 5,], aes(x=reverts)) + 
geom_density(adjust=3) + 
scale_x_log10()


KDE histogram of number of bot-bot reverts per page, filtered to >= 5 reverts per page


In [6]:
ggplot(page_reverts[reverts >= 10,], aes(x=reverts)) + 
geom_density(adjust=2) + 
scale_x_log10()


Looking at pages with the highest number of bot-bot reverts per page

Pages with >=10 bot-bot reverts per page


In [7]:
page_reverts[reverts >= 10,][1:10,]


rev_pagerevertsbots_involvedreverting_botsreverted_botsfirst_revertlast_revert
1688 15 21 10 11 2008-11-14 11:25:182010-11-14 04:03:47
3457 24 15 7 8 2010-03-08 02:28:082011-03-01 06:55:19
5487 41 4 2 2 2016-07-02 14:44:332016-07-07 19:02:07
18964 11 12 6 6 2010-09-26 13:39:212011-11-04 03:39:34
24260 32 24 13 11 2010-09-23 06:08:232011-06-23 13:08:19
26980 11 13 6 7 2007-03-08 08:48:022011-03-23 08:28:55
30366 10 15 8 9 2008-04-01 22:02:002008-09-18 08:39:41
31853 15 14 6 8 2009-02-08 10:34:422011-03-01 13:32:03
55983 23 8 4 4 2011-05-31 22:49:132011-06-03 07:50:33
68606 10 12 6 6 2010-11-15 02:57:502010-12-31 22:17:08

Find top bot-bot revert pairs per page


In [8]:
page_bot_pairs = dt[
    page_namespace == 0,
    list(reverts=length(unique(rev_id)),
         first_revert=min(reverting_ts),
         last_revert=max(reverting_ts)),
    list(rev_page, bots=paste(pmin(as.character(reverting_user_text), as.character(rev_user_text)), 
                              pmax(as.character(reverting_user_text), as.character(rev_user_text))))]

In [9]:
page_bot_pairs[order(page_bot_pairs$reverts, decreasing=T),][1:10,]


rev_pagebotsrevertsfirst_revertlast_revert
5487 AnomieBOT Cyberbot II41 2016-07-02 14:44:33 2016-07-07 19:02:07
5971841 FrescoBot Mathbot 41 2010-04-22 04:06:47 2012-10-17 21:36:17
49143051 ListeriaBot Yobot 39 2016-04-20 05:31:25 2016-08-14 08:25:09
4413025 AnomieBOT Cyberbot II35 2016-07-02 20:13:11 2016-07-07 20:32:53
1121632 AnomieBOT Cyberbot II31 2016-04-23 10:38:58 2016-07-07 08:36:16
8948190 AnomieBOT Cyberbot II31 2016-04-23 10:38:48 2016-07-09 10:15:02
22807757 AnomieBOT Cyberbot II31 2016-04-23 10:38:54 2016-07-09 10:15:03
5469430 AnomieBOT Cyberbot II30 2016-04-23 10:38:53 2016-07-09 10:15:07
22881933 AnomieBOT Cyberbot II30 2016-04-23 10:38:51 2016-07-09 10:15:06
30736081 AnomieBOT Cyberbot II30 2016-04-23 10:39:04 2016-07-21 07:25:36
  • Mathbot and FrescoBot are definitely fighting.
  • AnomieBOT and Cyberbot II are definitely fighting.
  • BG19bot and Yobot are fighting ListeriaBot.

This is a gold mine!

The longest single-page mutual bot-on-bot revert sequence lasted 41 reverts and it continued over the course of 2 and a half years. It happened on "List of Mathematicians (X)" between Mathbot and FrescoBot. Mathbot updates the lists of mathematicians based on categorizations in Wikipedia. FrescoBot fixes link syntax. When the target of the link and the label are the same, it simplifies the link. Like clockwork, FrescoBot writes out a link of the structure [[ |, ]]. Normally these bots work together beautifully, but in the case of mathematicians with one name -- in this case "Xenocrates", Mathbot writes the link as [[Xenocrates|Xenocrates]] and FrescoBot dutifully, simplifies the link to just [[Xenocrates]]. Every time that Mathbot runs, it changes the link back to [[Xenocrates|Xenocrates]] and FrescoBot changes it back.

Actually, it's a tie and honestly, this second case might be more interesting. AnomieBOT and CyberBot II also had an 82 revert sequence on a single page, but it lasted for 41 reverts over the course of only 4 days! On the article about "Foreign relations of the Central African Republic", AnomieBOT claimed to be "rescuing orphaned refs" -- adding a reference to dead link by using the internet archive to provide a copy of the old referenced PDF titled "International Criminal Court: Background – Situation in the Central African Republic". Every time that AnomieBOT "rescued" the link, Cyberbot II swing by and removed the reference with the confusing comment "Rescuing 1 sources". This case is arguably worse than FrescoBot and Mathbot because it spanned many pages. The bots had similar fights on the biography of the songwriter Rico Love (35 reverts), the broadcaster Dougie Vipond (31 reverts), and the song "Seasons Change" (31 reverts). The list keeps going. All told, these bots reverted each other 396 times on 15 pages -- constantly adding links to Internet Archive pages and then removing them again.

Visualize KDE distribution of AnomieBOT and Cyberbot II's reverts over time


In [10]:
ggplot(dt[reverting_user_text %in% c("AnomieBOT", "Cyberbot II") &
          rev_user_text %in% c("AnomieBOT", "Cyberbot II"),], 
       aes(x=reverting_ts)) + 
geom_density()

dt[reverting_user_text %in% c("AnomieBOT", "Cyberbot II") &
          rev_user_text %in% c("AnomieBOT", "Cyberbot II"),
    list(n=length(unique(rev_id)), pages=length(unique(rev_page)))]


npages
39615

Explore the shape of the page_bot_pairs dataframe


In [11]:
ggplot(page_bot_pairs, aes(x=reverts)) + 
geom_density()



In [12]:
ggplot(page_bot_pairs[reverts > 2,], aes(x=reverts)) + 
geom_density()


There's definitely some shape to this. There's something around 20 reverts where, beyond that, there's some density. Let's look at 3-10, 10-20, 20-30, and 30-40.


In [13]:
page_bot_pairs[reverts > 2 & reverts < 10,][1:10,]


rev_pagebotsrevertsfirst_revertlast_revert
803 Synthebot TXiKiBoT 3 2008-08-26 21:44:51 2008-08-29 00:55:38
1806 WikitanvirBot Xqbot 5 2011-02-27 16:47:51 2011-02-27 20:41:58
3386 TjBot WikitanvirBot 7 2011-05-31 10:22:14 2011-05-31 16:21:41
3457 TjBot WikitanvirBot 9 2011-02-20 13:11:26 2011-03-01 06:55:19
3457 WikitanvirBot Xqbot 8 2011-02-21 00:33:06 2011-02-28 23:51:13
5935 GrouchoBot Xqbot 3 2011-07-02 21:05:01 2011-07-03 15:29:14
11145 Numbo3-bot Xqbot 4 2009-10-03 18:17:19 2009-10-13 07:05:51
11260 AnomieBOT Cyberbot II6 2016-04-23 10:38:41 2016-06-09 08:01:56
18964 WikitanvirBot Xqbot 5 2011-02-18 18:12:16 2011-02-22 15:09:53
19673 Alexbot Obersachsebot3 2009-12-12 17:33:53 2009-12-13 14:17:47

This is mostly fighting, but some of the fights are less obvious given so few interactions.

  • TXiKiBoT and Synthebot aren't really fighting, but it's hard to figure why Synthebot was doing what it was doing.
  • EmausBot and KamikazeBot look like they might be fighting a little bit. They are fighting over the same links that HRoestBot, Rubinbot, and SieBot are on the page titled "Alphons".
  • Alexbot and RussBot are just managing redirects around naming of "Arab world", "Arab World", and "Arab Countries"
  • WikitanvirBot Xqbot are fighting over whether or not to link to hiwiki. Their fight lasts a few hours before it comes to an end. This was due to a known bug and the workaround Xqt proposed is https://www.mediawiki.org/wiki/Special:Code/pywikipedia/9018

In [14]:
page_bot_pairs[reverts >= 10 & reverts < 20,][1:10,]


rev_pagebotsrevertsfirst_revertlast_revert
24260 WikitanvirBot Xqbot 11 2011-03-03 07:37:13 2011-06-17 19:49:41
55983 LaaknorBot WikitanvirBot17 2011-05-31 22:49:13 2011-06-03 07:50:33
160108 ArthurBot Mjbmrbot 13 2010-11-08 09:20:36 2010-11-08 11:06:15
234906 28bot AnomieBOT 15 2012-12-19 08:58:37 2012-12-19 11:22:08
5971770 FrescoBot Mathbot 19 2010-06-05 21:26:08 2012-10-17 21:33:33
5971797 FrescoBot Mathbot 14 2010-07-05 21:26:18 2012-10-17 21:33:41
5971799 FrescoBot Mathbot 17 2010-06-05 21:26:26 2012-10-17 21:33:49
5971803 EmausBot Mathbot 10 2011-07-06 03:13:23 2011-07-28 21:35:04
5971806 FrescoBot Mathbot 13 2010-06-05 21:26:57 2012-10-17 21:34:20
5971809 FrescoBot Mathbot 15 2010-06-05 21:27:05 2012-09-16 16:11:51
  • WikitanvirBot and Xqbot are doing what they were doing in the last set.
  • We already know about Frescobot and Mathbot. The interaction with EmausBot is a fight too. It seems that Mathbot is just overwriting the work of other bots that it encounters.
  • 28bot and AnomieBOT are fighting. AnomieBOT adds a empty reference tag and 28bot reverts the edit because it sees the edit as a "test edit".

OK, I'm satisfied. Most single-page revert activity between bots that involves more than 2 edits is a fight. Let's look at how many reverts are accounted for.


In [15]:
length(unique(dt$rev_id))


512562

In [16]:
sum(page_bot_pairs[reverts >= 2,]$reverts)


16959

lol. So, maybe