Ensuring fairness in algorithmically driven decision-making is important to avoid inadvertent bias and the perpetuation of harmful stereotypes. However, modern natural language processing techniques, which learn model parameters from data, may pick up implicit biases present in that data and make undesirable stereotypical associations. One such danger arises with word embeddings, a popular framework for representing text as vectors that is used in many machine learning and natural language processing tasks. Recent results (1, 2) show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This is concerning because their widespread use, as we describe, often tends to amplify these biases.
In the following, we provide step-by-step instructions to demonstrate and quantify the biases in word embeddings.
In [9]:
# Setup:
# Clone the code repository from https://github.com/tolga-b/debiaswe.git
# mkdir debiaswe_tutorial
# cd debiaswe_tutorial
# git clone https://github.com/tolga-b/debiaswe.git
# To reduce download time, we provide a subset of the GoogleNews vectors at the following location:
# https://drive.google.com/file/d/1NH6jcrg8SXbnhpIXRIXF_-KUE7wGxGaG/view?usp=sharing
# For full embeddings:
# Download the embeddings at https://github.com/tolga-b/debiaswe and put them in the following directory:
# embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
# embeddings/GoogleNews-vectors-negative300.bin
In [1]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions
In [2]:
# load google news word2vec
E = WordEmbedding('./embeddings/w2v_gnews_small.txt')
# load professions
professions = load_professions()
profession_words = [p[0] for p in professions]
In [3]:
# gender direction
v_gender = E.diff('she', 'he')
We show that the word embedding model generates gender-stereotypical analogy pairs. To generate the analogy pairs, we use the analogy score defined in our paper. This score favors word pairs that are well aligned with the gender direction and that lie within a short distance of each other, which preserves topic consistency.
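As a rough illustration of this idea (not the library's exact implementation, which is what best_analogies_dist_thresh below applies), the sketch here scores a candidate pair by the cosine between its difference vector and the bias direction and zeroes out pairs that are far apart; the distance threshold of 1.0 is an illustrative choice.
In [ ]:
import numpy as np

def analogy_score(E, v_dir, x, y, thresh=1.0):
    # Difference vector of the candidate pair.
    diff = E.v(x) - E.v(y)
    # Discard pairs that are far apart (topic consistency); thresh is illustrative.
    if np.linalg.norm(diff) > thresh:
        return 0.0
    # Alignment of the pair with the bias direction (cosine similarity).
    return float(diff.dot(v_dir) / (np.linalg.norm(diff) * np.linalg.norm(v_dir)))

for x, y in [("she", "he"), ("her", "his")]:
    print(x, y, analogy_score(E, v_gender, x, y))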
In [4]:
# analogies gender
a_gender = E.best_analogies_dist_thresh(v_gender)
for (a, b, c) in a_gender:
    print(a + "-" + b)
Next, we show that many occupations are unintentionally associated with either male or female by projecting their word vectors onto the gender direction.
The cell below outputs the profession words sorted by their projection score along the gender direction.
In [5]:
# profession analysis gender
sp = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])
sp[0:20], sp[-20:]
Out[5]:
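To make the ordering above easier to read at a glance, one can also plot the distribution of projection scores. This is a minimal sketch using the matplotlib import from the setup cell; it assumes v_gender = E.diff('she', 'he') points from "he" toward "she", so negative scores lean male and positive scores lean female.
In [ ]:
# Histogram of the profession projection scores computed above.
scores = [s for s, w in sp]
plt.hist(scores, bins=30)
plt.xlabel("projection onto the gender direction (she - he)")
plt.ylabel("number of professions")
plt.show()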
In [6]:
names = ["Emily", "Aisha", "Anne", "Keisha", "Jill", "Tamika", "Allison", "Lakisha", "Laurie", "Tanisha", "Sarah",
"Latoya", "Meredith", "Kenya", "Carrie", "Latonya", "Kristen", "Ebony", "Todd", "Rasheed", "Neil", "Tremayne",
"Geoffrey", "Kareem", "Brett", "Darnell", "Brendan", "Tyrone", "Greg", "Hakim", "Matthew", "Jamal", "Jay",
"Leroy", "Brad", "Jermaine"]
# The list alternates between the two groups, so we split it by even/odd index.
names_group1 = [names[2 * i] for i in range(len(names) // 2)]
names_group2 = [names[2 * i + 1] for i in range(len(names) // 2)]
In [7]:
# racial direction
# Sum the name vectors in each group, normalize, and take the difference.
vs = [sum(E.v(w) for w in group) for group in (names_group2, names_group1)]
vs = [v / np.linalg.norm(v) for v in vs]
v_racial = vs[1] - vs[0]
v_racial = v_racial / np.linalg.norm(v_racial)
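As a quick, illustrative sanity check (our own addition, not part of the original steps), one can project a few of the names themselves onto this direction; given v_racial = vs[1] - vs[0], names from names_group1 should tend to score positive and names from names_group2 negative.
In [ ]:
# Names from the two groups should fall on opposite sides of the racial direction
# (group1 positive, group2 negative, given v_racial = vs[1] - vs[0]).
for name in ["Emily", "Greg", "Aisha", "Jamal"]:
    print(name, E.v(name).dot(v_racial))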
In [8]:
# racial analogies
a_racial = E.best_analogies_dist_thresh(v_racial)
for (a, b, c) in a_racial:
    print(a + "-" + b)
In [9]:
# profession analysis racial
sp = sorted([(E.v(w).dot(v_racial), w) for w in profession_words])
sp[0:20], sp[-20:]
Out[9]:
Repeat Steps 2-4 with the debiased word embedding.
You can use the debiaswe debias function to perform the debiasing with word sets of your choosing.
You can leave equalize_pairs and gender_specific_words blank when coming up with your own groups. We give an example for the case of gender below as a warm-up (a sketch for a custom group follows the gender example).
In [10]:
from debiaswe.debias import debias
In [11]:
# Let's load some gender-related word lists to help us with debiasing
with open('./data/definitional_pairs.json', "r") as f:
defs = json.load(f)
print("definitional", defs)
with open('./data/equalize_pairs.json', "r") as f:
equalize_pairs = json.load(f)
with open('./data/gender_specific_seed.json', "r") as f:
gender_specific_words = json.load(f)
print("gender specific", len(gender_specific_words), gender_specific_words[:10])
In [12]:
debias(E, gender_specific_words, defs, equalize_pairs)
In [13]:
# profession analysis gender
sp_debiased = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])
sp_debiased[0:20], sp_debiased[-20:]
Out[13]:
In [14]:
# analogies gender
a_gender_debiased = E.best_analogies_dist_thresh(v_gender)
for (a, b, c) in a_gender_debiased:
    print(a + "-" + b)