This notebook creates a test set "fuzzed" over a set of identity terms. This fuzzed test set can be used for analyzing bias in a model.
The idea is that, for the most part, the specific identity term used should not be the key feature determining whether a comment is toxic or non-toxic. For example, the sentence "I had a x friend" should receive the same toxicity label for any term x in our terms set.
Given a set of terms, this code finds comments that mention those terms and replaces each instance with a random other term in the set. This fuzzed test set can be used to evaluate a model for bias: if the model performs worse on the fuzzed test set than on the non-fuzzed test set, it is likely relying on the identity terms themselves as signals of toxicity, which indicates bias.
In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import pandas as pd
import urllib
import matplotlib.pyplot as plt
%matplotlib inline
In [4]:
COMMENTS = '../data/toxicity_annotated_comments.tsv'
ANNOTATIONS = '../data/toxicity_annotations.tsv'
In [122]:
comments = pd.read_csv(COMMENTS, sep='\t')
annotations = pd.read_csv(ANNOTATIONS, sep='\t')
# convert rev_id from float to int
comments['rev_id'] = comments['rev_id'].astype(int)
annotations['rev_id'] = annotations['rev_id'].astype(int)
# remove newline and tab tokens
comments['comment'] = comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
comments['comment'] = comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))
In [124]:
comments.head()
Out[124]:
In [126]:
# label a comment as toxic if the majority of annotators did so
comments.set_index('rev_id', inplace=True)
comments['toxic'] = annotations.groupby('rev_id')['toxicity'].mean() > 0.5
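As a quick sanity check (an added cell, not part of the original run), we can confirm that every comment received a label and look at the class balance:
In [ ]:
# A NaN here would mean a rev_id present in comments but absent
# from the annotations table.
assert not comments['toxic'].isnull().any()
comments['toxic'].value_counts(normalize=True)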
In [154]:
import re
def word_bound(pat):
    """Adds word boundary matchers to pattern."""
    return r'\b{}\b'.format(pat)

IDENTITY_TERMS = ['christian', 'catholic', 'protestant', 'muslim', 'sikh', 'jewish', 'jew',
                  'lesbian', 'gay', 'transgender', 'queer', 'homosexual', 'heterosexual']
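As a quick illustration (an added cell, not in the original run), the word boundaries keep a term from matching inside longer words:
In [ ]:
# r'\bjew\b' matches the standalone word but not substrings like 'jewelry'.
print(bool(re.search(word_bound('jew'), 'he is a jew', flags=re.IGNORECASE)))       # True
print(bool(re.search(word_bound('jew'), 'the jewelry store', flags=re.IGNORECASE))) # False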
In [130]:
%%time
test_comments = comments[comments.split == 'test']
IDENTITY_TERMS_BOUNDED = [word_bound(term) for term in IDENTITY_TERMS]
identity_comments = test_comments[test_comments.comment.str.contains('|'.join(IDENTITY_TERMS_BOUNDED), case=False)]
In [143]:
identity_comments[identity_comments.comment.str.len() < 30].comment
Out[143]:
In [165]:
import random

def fuzz_comment(text, identity_terms):
    """Replaces each identity term in text with a randomly chosen term from the set."""
    any_term_pat = word_bound('(?:{})'.format('|'.join(identity_terms)))
    # Substitute in a single pass so that a term inserted by one replacement
    # can never be matched (and "overwritten") by a later replacement.
    return re.sub(any_term_pat,
                  lambda _match: random.choice(identity_terms),
                  text, flags=re.IGNORECASE)
In [166]:
fuzz_comment("Gay is a term that primarily refers to a homosexual person or the trait of being homosexual", IDENTITY_TERMS)
Out[166]:
In [168]:
identity_comments[identity_comments.comment.str.len() < 30].comment.apply(lambda s: fuzz_comment(s, IDENTITY_TERMS))
Out[168]:
We also randomly sample comments that don't mention identity terms, because absolute score ranges matter. AUC measures ranking, so it can remain high on identity-term comments alone even if all of them receive elevated scores relative to other comments. Pooling in non-identity-term comments causes AUC to drop when that kind of uniform elevation is present.
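To see why, here is a toy illustration with made-up scores (an added cell; it uses sklearn, which this notebook doesn't otherwise import): ranking within the identity-term group is perfect, so group-level AUC is 1.0 despite a uniform score elevation, and pooling in non-identity comments exposes the elevation.
In [ ]:
from sklearn.metrics import roc_auc_score

# Hypothetical scores: identity-term comments are uniformly elevated,
# but toxic still outranks non-toxic within the group.
identity_labels, identity_scores = [0, 1], [0.5, 0.9]
print(roc_auc_score(identity_labels, identity_scores))  # 1.0

# Pooled with non-identity comments, the elevated non-toxic identity
# comment (0.5) now outranks the toxic non-identity comment (0.4).
all_labels = identity_labels + [0, 1]
all_scores = identity_scores + [0.1, 0.4]
print(roc_auc_score(all_labels, all_scores))  # 0.75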
In [146]:
len(test_comments)
Out[146]:
In [148]:
len(identity_comments)
Out[148]:
In [157]:
_non = test_comments.drop(identity_comments.index)
In [201]:
def build_fuzzed_testset(comments, identity_terms=IDENTITY_TERMS):
    """Builds a test set 'fuzzed' over the given identity terms.

    Returns both a fuzzed and a non-fuzzed test set. Both comprise the same
    comments: in the fuzzed version the identity-term comments have been
    fuzzed, whereas the non-fuzzed comments are unmodified.
    """
    any_terms_pat = '|'.join(word_bound(term) for term in identity_terms)
    test_comments = comments[comments.split == 'test'][['comment', 'toxic']].copy()
    identity_comments = test_comments[test_comments.comment.str.contains(any_terms_pat, case=False)]
    non_identity_comments = test_comments.drop(identity_comments.index).sample(len(identity_comments))
    fuzzed_identity_comments = identity_comments.copy()
    fuzzed_identity_comments.loc[:, 'comment'] = fuzzed_identity_comments['comment'].apply(
        lambda s: fuzz_comment(s, identity_terms))
    nonfuzzed_testset = pd.concat([identity_comments, non_identity_comments]).sort_index()
    fuzzed_testset = pd.concat([fuzzed_identity_comments, non_identity_comments]).sort_index()
    return {'fuzzed': fuzzed_testset, 'nonfuzzed': nonfuzzed_testset}
In [202]:
testsets = build_fuzzed_testset(comments)
In [204]:
testsets['fuzzed'].query('comment.str.len() < 50').sample(15)
Out[204]:
In [208]:
testsets['fuzzed'].to_csv('../eval_datasets/toxicity_fuzzed_testset.csv')
testsets['nonfuzzed'].to_csv('../eval_datasets/toxicity_nonfuzzed_testset.csv')
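As a final sanity check (an added cell, assuming the paths above), the saved files round-trip cleanly:
In [ ]:
# Reload the fuzzed set and confirm the rev_id index, columns, and row
# count survive serialization.
reloaded = pd.read_csv('../eval_datasets/toxicity_fuzzed_testset.csv', index_col='rev_id')
assert list(reloaded.columns) == ['comment', 'toxic']
assert len(reloaded) == len(testsets['fuzzed'])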