This is a data analysis script used to produce findings in the paper; it can be run entirely from the files in this GitHub repository.
The entire notebook can be re-run from the beginning with Kernel -> Restart & Run All in the menu bar. It takes about 2 minutes on a laptop with a Core i5-2540M processor.
In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import glob
import pickle
import datetime
%matplotlib inline
In [2]:
start = datetime.datetime.now()
In [3]:
!unxz --keep --force ../../datasets/parsed_dataframes/df_all_comments_parsed_2016.pickle.xz
In [4]:
# Load the parsed dataframe of all bot-bot reverts
with open("../../datasets/parsed_dataframes/df_all_comments_parsed_2016.pickle", "rb") as f:
    df_all = pickle.load(f)
In [5]:
len(df_all)
Out[5]:
In [6]:
df_all[0:2].transpose()
Out[6]:
In [7]:
df_all_ns0 = df_all.query("page_namespace == 0")  # namespace 0: articles only
In [8]:
# Per-language counts and percentages of each fine-grained bot revert type
counts_bottype_dict = {}
for lang in df_all_ns0['language'].unique():
    df_lang_ns0 = df_all_ns0[df_all_ns0['language'] == lang]
    type_counts = df_lang_ns0['bottype'].value_counts().rename("count")
    type_percent = df_lang_ns0['bottype'].value_counts(normalize=True).rename("percent") * 100
    type_percent = type_percent.round(2).astype(str) + "%"
    counts_bottype_dict[lang] = pd.concat([type_counts, type_percent], axis=1)

# The same, but for the broader bottype_group categories
counts_bottype_group_dict = {}
for lang in df_all_ns0['language'].unique():
    df_lang_ns0 = df_all_ns0[df_all_ns0['language'] == lang]
    type_counts = df_lang_ns0['bottype_group'].value_counts().rename("count")
    type_percent = df_lang_ns0['bottype_group'].value_counts(normalize=True).rename("percent") * 100
    type_percent = type_percent.round(2).astype(str) + "%"
    counts_bottype_group_dict[lang] = pd.concat([type_counts, type_percent], axis=1)
In [9]:
counts_bottype_group_dict['en']
Out[9]:
In [10]:
prop_bottype_group_df = pd.DataFrame()
In [11]:
pd.set_option('precision', 4)
In [12]:
for df in counts_bottype_group_dict.items():
    concat_df = df[1]['percent']
    concat_df.name = df[0] + " %"
    prop_bottype_group_df = pd.concat([prop_bottype_group_df, concat_df], axis=1)
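As an aside, an equivalent per-language percentage table can be built in one step with pd.crosstab. This is a minimal sketch of that alternative, not how the table below was produced:

# Percent of each language's bot-bot article reverts in each bottype_group;
# normalize='columns' makes every language column sum to 100
alt_prop_df = pd.crosstab(df_all_ns0['bottype_group'],
                          df_all_ns0['language'],
                          normalize='columns') * 100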
In [13]:
prop_bottype_group_df.fillna("---")
Out[13]:
In [14]:
pd.concat([df[1]['percent'], df[1]['percent']], axis=1)
Out[14]:
In [15]:
gb_lang_bottype = df_all_ns0.groupby(["language", "bottype"])['revisions_reverted']
gb_lang_bottype_group = df_all_ns0.groupby(["language", "bottype_group"])['revisions_reverted']
In [16]:
gb_lang_bottype.count().unstack().transpose().replace(np.nan,0)
Out[16]:
In [17]:
gb_lang_bottype_group.count().unstack().transpose().replace(np.nan,0).sort_values(by='en', ascending=False)
Out[17]:
In [18]:
sns.set(font_scale=1.5)
sns.set_style("whitegrid")
sns.set_palette("husl")
gb_lang_bottype_group.sum().unstack().transpose().plot(kind='bar', subplots=False, figsize=[12,6])
Out[18]:
In [19]:
sns.set(font_scale=1.5)
sns.set_style("whitegrid")
sns.set_palette("husl")
gb_lang_bottype_group.sum().unstack().transpose().plot(kind='barh', subplots=False, figsize=[12,6])
plt.xscale("log")
In [20]:
sns.set(font_scale=2)
pal = sns.color_palette("hls", 10)
g = sns.FacetGrid(df_all.query("page_namespace == 0 and language == 'en'"),
                  palette=pal, hue="bottype_group", size=8, aspect=2)
g.map(sns.kdeplot, "time_to_revert_hrs_log10")
#g.add_legend()
leg = plt.legend()
for legobj in leg.legendHandles:
    legobj.set_linewidth(8.0)
g.ax.set_xlim([np.log10(1/90), np.log10(24*365*5)])
g.ax.set_ylim(0, 1.25)
g.ax.set_xticks([np.log10(1/60), np.log10(1), np.log10(24), np.log10(24*7), np.log10(24*30), np.log10(24*365)])
g.ax.set_xticklabels(["minute", "hour", "day", "week", "month", "year"])
Out[20]:
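These plots use the precomputed time_to_revert_hrs_log10 column. Assuming it is the base-10 logarithm of the time to revert in hours, it is consistent with the time_to_revert_days column used later in this notebook; a minimal sketch of that relationship (whether the shipped column was computed exactly this way is an assumption):

# Sketch: days -> hours, then log10; the dataset ships this precomputed
ttr_hrs_log10 = np.log10(df_all['time_to_revert_days'] * 24)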
In [21]:
sns.set(font_scale=2)
pal = sns.color_palette("hls", 10)
g = sns.FacetGrid(df_all.query("page_namespace == 0"),
                  palette=pal, hue="bottype_group", size=8, aspect=2)
g.map(sns.kdeplot, "time_to_revert_hrs_log10")
#g.add_legend()
leg = plt.legend()
for legobj in leg.legendHandles:
    legobj.set_linewidth(8.0)
g.ax.set_xlim([np.log10(1/90), np.log10(24*365*5)])
g.ax.set_ylim(0, 1.25)
g.ax.set_xticks([np.log10(1/60), np.log10(1), np.log10(24), np.log10(24*7), np.log10(24*30), np.log10(24*365)])
g.ax.set_xticklabels(["minute", "hour", "day", "week", "month", "year"])
Out[21]:
In [22]:
sns.set(font_scale=2)
pal = sns.color_palette("husl", 7)
g = sns.FacetGrid(df_all.query("page_namespace == 0"),
                  palette=pal, row="bottype_group", size=3, aspect=4, sharex=False, sharey=False)
g.map(sns.kdeplot, "time_to_revert_hrs_log10")
xticks = [np.log10(1/60), np.log10(1), np.log10(24), np.log10(24*7), np.log10(24*30), np.log10(24*365)]
xticklabels = ["minute", "hour", "day", "week", "month", "year"]
for ax in g.axes.flatten():
    ax.set_xticks(xticks)
    ax.set_xticklabels(xticklabels)
    ax.set_xlim(np.log10(1/90), np.log10(24*365*5))
In [23]:
sns.set(font_scale=2.25)
pal = sns.color_palette("husl", 7)
g = sns.FacetGrid(df_all.query("page_namespace == 0"),
                  palette=pal, col="bottype_group", size=3, aspect=4,
                  col_wrap=2, sharex=False, sharey=True)
g.map(sns.kdeplot, "time_to_revert_hrs_log10", shade=True)
xticks = [np.log10(1/60), np.log10(1), np.log10(24), np.log10(24*7), np.log10(24*30), np.log10(24*365)]
xticklabels = ["minute", "hour", "day", "week", "month", "year"]
for ax in g.axes.flatten():
    ax.set_xticks(xticks)
    ax.set_xticklabels(xticklabels)
    ax.set_ylim(0, 1.25)
    ax.set_xlim(np.log10(1/90), np.log10(24*365*5))
    if ax.colNum == 0:
        ax.set_ylabel("Probability")
    if ax.rowNum == 4:
        ax.set_xlabel("Time to revert")
plt.savefig("ttr-categorized.pdf", dpi=600)
In [24]:
gb_group_per_page = df_all.query("page_namespace == 0").groupby(["language","bottype_group"])['reverts_per_page_botpair_sorted']
In [25]:
gb_group_per_page.mean().unstack()
Out[25]:
In [26]:
# Possible botfights: pages where the same pair of bots reverted each other
# more than once, with each revert happening within 180 days
df_all_ns0_multiple_reverts = df_all_ns0.query("reverts_per_page_botpair_sorted > 1 and time_to_revert_days < 180")
gb_lang_bottype_group_rr = df_all_ns0_multiple_reverts.groupby(["language", "bottype_group"])['revisions_reverted']
In [27]:
gb_lang_bottype_group_rr.count().unstack().transpose().replace(np.nan,0).sort_values(by='en', ascending=False)
Out[27]:
Sum by language
In [28]:
gb_lang_bottype_group_rr.count().unstack().transpose().replace(np.nan,0).sort_values(by='en', ascending=False).sum()
Out[28]:
Total number of reverts that are part of possible botfights
In [30]:
gb_lang_bottype_group_rr.count().unstack().transpose().replace(np.nan,0).sort_values(by='en', ascending=False).sum().sum()
Out[30]:
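Because each group count is just the number of rows in that group, the double sum above should match the row count of the filtered dataframe directly, assuming revisions_reverted is never null:

# Sanity check: equals the double-summed group counts above
len(df_all_ns0_multiple_reverts)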
In [31]:
sns.set(font_scale=1.5)
gb_lang_bottype_group_rr.count().unstack().transpose().replace(np.nan,0).sort_values(by='en', ascending=False).plot(kind='bar')
Out[31]:
This is computed by counting the reverts that were not classified, were classified as a "botfight", or had "revert" in the edit summary.
In [32]:
to_keep = ["botfight", "other w/ revert in comment", "not classified"]
In [33]:
len(df_all_ns0[df_all_ns0['bottype_group'].isin(to_keep)])
Out[33]:
So there are 19,673 bot-bot reverts to articles that we cannot assume are not conflict. What is the proportion? Divide this figure by the total number of bot-bot reverts to articles, then subtract from 1.
In [34]:
1 - len(df_all_ns0[df_all_ns0['bottype_group'].isin(to_keep)])/len(df_all_ns0)
Out[34]:
In [35]:
len(df_all_ns0[df_all_ns0['bottype_group'].isin(to_keep)])/len(df_all_ns0)
Out[35]:
What is the proportion if we restrict to bot-bot reverts where the time to revert was under 180 days and the same pair of bots reverted each other more than once on a particular article, keeping only the cases classified as "botfight", "other w/ revert in comment", or "not classified"?
In [36]:
df_all_ns0_addl_assmpt = df_all_ns0.query("reverts_per_page_botpair_sorted > 1 and time_to_revert_days < 180")
len(df_all_ns0_addl_assmpt[df_all_ns0_addl_assmpt['bottype_group'].isin(to_keep)])
Out[36]:
So under these additional assumptions there are 3,007 possible cases of bot-bot conflict, which makes for what proportion of all bot-bot reverts to articles?
In [37]:
1 - len(df_all_ns0_addl_assmpt[df_all_ns0_addl_assmpt['bottype_group'].isin(to_keep)])/len(df_all_ns0)
Out[37]:
In [38]:
len(df_all_ns0_addl_assmpt[df_all_ns0_addl_assmpt['bottype_group'].isin(to_keep)])/len(df_all_ns0)
Out[38]:
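For reference, the four proportions above can be wrapped in a small helper. This is hypothetical, purely for illustration; the published figures come from the cells above:

# Hypothetical helper: share of reverts that remain possible conflict
# (bottype_group in to_keep), plus its complement
def conflict_share(df, to_keep, total):
    possible = df['bottype_group'].isin(to_keep).sum()
    return possible / total, 1 - possible / total

conflict_share(df_all_ns0, to_keep, len(df_all_ns0))
conflict_share(df_all_ns0_addl_assmpt, to_keep, len(df_all_ns0))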
In [39]:
df_all_ns0_multiple_reverts.to_pickle("../../datasets/parsed_dataframes/possible_botfights.pickle")
df_all_ns0_multiple_reverts.to_csv("../../datasets/parsed_dataframes/possible_botfights.tsv", sep="\t")
In [40]:
!xz -9 -e --keep ../../datasets/parsed_dataframes/possible_botfights.pickle
!xz -9 -e --keep ../../datasets/parsed_dataframes/possible_botfights.tsv
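To reuse the exported dataset elsewhere, either file can be read back with pandas; a sketch using the paths above:

# Load the exported possible-botfights data back into a dataframe
possible_botfights = pd.read_pickle("../../datasets/parsed_dataframes/possible_botfights.pickle")
# or, from the TSV export:
possible_botfights = pd.read_csv("../../datasets/parsed_dataframes/possible_botfights.tsv", sep="\t")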
In [41]:
end = datetime.datetime.now()
time_to_run = end - start
minutes, seconds = divmod(time_to_run.seconds, 60)
print("Total runtime: ", minutes, "minutes, ", seconds, "seconds")