Ensuring fairness in algorithmically driven decision-making is important to avoid inadvertent bias and the perpetuation of harmful stereotypes. However, modern natural language processing techniques, which learn model parameters from data, may pick up implicit biases present in that data and make undesirable stereotypical associations. One such danger arises with word embeddings, a popular framework for representing text as vectors that is used in many machine learning and natural language processing tasks. Recent results (1, 2) show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This is concerning because their widespread use, as we describe, often tends to amplify these biases.
In the following, we provide step-by-step instructions to demonstrate and quantify the biases in word embeddings.
In [9]:
# Setup:
# Clone the code repository from https://github.com/tolga-b/debiaswe.git
# mkdir debiaswe_tutorial
# cd debiaswe_tutorial
# git clone https://github.com/tolga-b/debiaswe.git
# To reduce download time, we provide a subset of the GoogleNews vectors at the following location:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing
# For full embeddings:
# Download the embeddings at https://github.com/tolga-b/debiaswe and put them in the following directory:
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin
In [1]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions
In [2]:
# load google news word2vec
E = WordEmbedding('./embeddings/w2v_gnews_small.txt')
# load professions
professions = load_professions()
profession_words = [p[0] for p in professions]
In [3]:
# gender direction
v_gender = E.diff('she', 'he')
We show that the word embedding model generates gender-stereotypical analogy pairs. To generate the analogy pairs, we use the analogy score defined in our paper. This score favors word pairs that are well aligned with the gender direction and that lie within a short distance of each other, which preserves topic consistency.
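As a rough illustration of this idea (not the library's exact implementation, which is what best_analogies_dist_thresh below applies), the sketch here scores a candidate pair by the cosine between its difference vector and the bias direction and zeroes out pairs that are far apart; the distance threshold of 1.0 is an illustrative choice.
In [ ]:
import numpy as np

def analogy_score(E, v_dir, x, y, thresh=1.0):
    # Difference vector of the candidate pair.
    diff = E.v(x) - E.v(y)
    # Discard pairs that are far apart (topic consistency); thresh is illustrative.
    if np.linalg.norm(diff) > thresh:
        return 0.0
    # Alignment of the pair with the bias direction (cosine similarity).
    return float(diff.dot(v_dir) / (np.linalg.norm(diff) * np.linalg.norm(v_dir)))

for x, y in [("she", "he"), ("her", "his")]:
    print(x, y, analogy_score(E, v_gender, x, y))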
In [4]:
# analogies gender
a_gender = E.best_analogies_dist_thresh(v_gender)
for (a, b, c) in a_gender:
    print(a + "-" + b)
Next, we show that many occupations are unintentionally associated with either male or female by projecting their word vectors onto the gender direction.
The cell below outputs the profession words sorted by their projection score along the gender direction.
In [5]:
# profession analysis gender
sp = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])
sp[0:20], sp[-20:]
Out[5]:
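To make the ordering above easier to read at a glance, one can also plot the distribution of projection scores. This is a minimal sketch using the matplotlib import from the setup cell; it assumes v_gender = E.diff('she', 'he') points from "he" toward "she", so negative scores lean male and positive scores lean female.
In [ ]:
# Histogram of the profession projection scores computed above.
scores = [s for s, w in sp]
plt.hist(scores, bins=30)
plt.xlabel("projection onto the gender direction (she - he)")
plt.ylabel("number of professions")
plt.show()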
In [6]:
names = ["Emily", "Aisha", "Anne", "Keisha", "Jill", "Tamika", "Allison", "Lakisha", "Laurie", "Tanisha", "Sarah",
"Latoya", "Meredith", "Kenya", "Carrie", "Latonya", "Kristen", "Ebony", "Todd", "Rasheed", "Neil", "Tremayne",
"Geoffrey", "Kareem", "Brett", "Darnell", "Brendan", "Tyrone", "Greg", "Hakim", "Matthew", "Jamal", "Jay",
"Leroy", "Brad", "Jermaine"]
# The list alternates between the two groups, so we split it by even/odd index.
names_group1 = [names[2 * i] for i in range(len(names) // 2)]
names_group2 = [names[2 * i + 1] for i in range(len(names) // 2)]
In [7]:
# racial direction
# Sum the name vectors in each group, normalize, and take the difference.
vs = [sum(E.v(w) for w in group) for group in (names_group2, names_group1)]
vs = [v / np.linalg.norm(v) for v in vs]
v_racial = vs[1] - vs[0]
v_racial = v_racial / np.linalg.norm(v_racial)
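As a quick, illustrative sanity check (our own addition, not part of the original steps), one can project a few of the names themselves onto this direction; given v_racial = vs[1] - vs[0], names from names_group1 should tend to score positive and names from names_group2 negative.
In [ ]:
# Names from the two groups should fall on opposite sides of the racial direction
# (group1 positive, group2 negative, given v_racial = vs[1] - vs[0]).
for name in ["Emily", "Greg", "Aisha", "Jamal"]:
    print(name, E.v(name).dot(v_racial))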
In [8]:
# racial analogies
a_racial = E.best_analogies_dist_thresh(v_racial)
for (a, b, c) in a_racial:
    print(a + "-" + b)
In [9]:
# profession analysis racial
sp = sorted([(E.v(w).dot(v_racial), w) for w in profession_words])
sp[0:20], sp[-20:]
Out[9]:
Repeat Steps 2-4 with the debiased word embedding.
You can use the debiaswe debias function to perform the debiasing with word sets of your choosing.
You can leave equalize_pairs and gender_specific_words blank when coming up with your own groups. We give an example for the case of gender below as a warm-up (a sketch for a custom group follows the gender example).
In [10]:
from debiaswe.debias import debias
In [11]:
# Let's load some gender-related word lists to help us with debiasing
with open('./data/definitional_pairs.json', "r") as f:
defs = json.load(f)
print("definitional", defs)
with open('./data/equalize_pairs.json', "r") as f:
equalize_pairs = json.load(f)
with open('./data/gender_specific_seed.json', "r") as f:
gender_specific_words = json.load(f)
print("gender specific", len(gender_specific_words), gender_specific_words[:10])
In [12]:
debias(E, gender_specific_words, defs, equalize_pairs)
In [13]:
# profession analysis gender
sp_debiased = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])
sp_debiased[0:20], sp_debiased[-20:]
Out[13]:
In [14]:
# analogies gender
a_gender_debiased = E.best_analogies_dist_thresh(v_gender)
for (a, b, c) in a_gender_debiased:
    print(a + "-" + b)