One of the main goals of the ‘Yes We Tech’ community is to help create an inclusive space where we can celebrate diversity, give visibility to women in tech, and ensure that everybody has an equal chance to learn, share and enjoy technology-related disciplines.
As co-organisers of the event, we have concentrated our efforts on getting more women speakers on board, on the assumption that a more diverse panel would also enrich the conversation around technology.
Certainly, we have doubled the number of women giving talks this year. But is this diversity enough? How can we know whether we have succeeded in our goal? And, more importantly, what can we learn to create a more diverse event in future editions?
The work that we are sharing here is about two things: data and people. Both should help us find some answers and understand the reasons behind them.
Let's start with a story about data. Data is pretty simple compared with people. Just take a look at the numbers, the small ones, the ones that best describe what happened at the 2016 and 2017 editions of J On The Beach.
In [1]:
import pandas as pd
import numpy as np
import scipy as sp
import pygal
import operator
from iplotter import GCPlotter
plotter = GCPlotter()
In [2]:
data2016 = pd.read_csv('../input/small_data_2016.csv')
data2016['Women Rate'] = pd.Series(data2016['Women']*100/data2016['Total'])
data2016['Men Rate'] = pd.Series(data2016['Men']*100/data2016['Total'])
data2016
Out[2]:
This year there are 40 speakers, a few fewer than last year, while attendance has reached 368 people, up from 299 last year.
In [3]:
data2017 = pd.read_csv('../input/small_data_2017.csv')
data2017['Women Rate'] = pd.Series(data2017['Women']*100/data2017['Total'])
data2017['Men Rate'] = pd.Series(data2017['Men']*100/data2017['Total'])
data2017
Out[3]:
In [4]:
# Percentage increase in attendees from 2016 (299) to 2017 (368)
increase = (368 - 299) * 100.00 / 299
increase
Out[4]:
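The jump from 299 to 368 attendees can be read in two ways: as growth relative to 2016, or as the share of this year's audience that is additional. The sketch below shows both readings side by side; the helper names `percent_increase` and `share_of_new` are made up for illustration and are not part of the original notebook.

```python
def percent_increase(old, new):
    """Growth of `new` relative to `old`, in percent."""
    return (new - old) * 100.0 / old


def share_of_new(old, new):
    """Part of `new` that was not there in `old`, as a percentage of `new`."""
    return (new - old) * 100.0 / new


attendees_2016, attendees_2017 = 299, 368  # figures quoted in the text above

growth = percent_increase(attendees_2016, attendees_2017)
new_share = share_of_new(attendees_2016, attendees_2017)

print(round(growth, 2), round(new_share, 2))  # prints: 23.08 18.75
```

The two numbers answer different questions, so it is worth stating which one is meant whenever an "increase" is quoted.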
It is also noticeable that big data is bigger than ever, and this year we have included workshops and a hackathon.
The more the better, right? Let's continue, because there are more numbers behind those ones. Numbers that will give us some signs of diversity.
When it comes to speakers, this year 27.5% of those speaking at J are women, compared with roughly 10.4% last year.
In [5]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][0], data2016['Men Rate'][0], ''],
    ['2017', data2017['Women Rate'][0], data2017['Men Rate'][0], ''],
]
options = {
    "title": 'Speakers at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '50%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}
plotter.plot(data, chart_type='ColumnChart', chart_package='corechart', options=options)
Out[5]:
However, and this is the worrying thing, the participation of women as attendees has dropped slightly, from a not too ambitious 13% to a disappointing 9.8%. So we have x% more attendees but zero impact on attracting a wider variety of people.
In [6]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][1], data2016['Men Rate'][1], ''],
    ['2017', data2017['Women Rate'][1], data2017['Men Rate'][1], ''],
]
options = {
    "title": 'Attendees at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}
plotter.plot(data, chart_type='ColumnChart', chart_package='corechart', options=options)
Out[6]:
We don’t really know. But we kept looking at the numbers and realised that 30 of the 45 companies that enrolled two or more people didn't include any women on their lists. That amounts to 31% of the mass of attendees. A next step would be to correlate team size with the percentage of women, to check whether smaller teams are less likely to include a woman on their lists.
In [7]:
companies_team = data2017['Total'][3] + data2017['Total'][4]
mass_represented = pd.Series(data2017['Total'][4]*100/companies_team)
women_represented = pd.Series(100 - mass_represented)
mass_represented
Out[7]:
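The idea of correlating team size with the percentage of women could be sketched as follows. This is only an illustration under stated assumptions: the per-company table and its column names (`team_size`, `women`) are hypothetical and the numbers are made up, since the small CSV files used above only hold aggregated totals.

```python
import pandas as pd

# Hypothetical per-company registrations (made-up numbers, NOT JOTB data).
teams = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "team_size": [2, 2, 3, 5, 8],
    "women": [0, 0, 0, 1, 2],
})

# Percentage of women in each team.
teams["women_rate"] = teams["women"] * 100.0 / teams["team_size"]

# Pearson correlation (the pandas default) between team size and women rate.
correlation = teams["team_size"].corr(teams["women_rate"])

# Share of companies that enrolled no women at all, in percent.
no_women_share = (teams["women"] == 0).mean() * 100

print(correlation, no_women_share)
```

A positive coefficient on real data would be consistent with the hypothesis that smaller teams are less likely to include a woman, although with only a few dozen companies it would be a hint rather than proof.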
For us this is not a good sign. Although our ability to bring people together has grown at our monthly meetups (the ones that try to build this culture of equality in Málaga), that engagement doesn’t have a big impact on other events.
Again, I'm not blaming companies here, because if we look at the participation rate of women who are not part of a team, their representation also decreased by almost 50%.
In [8]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2016['Tribe'][2], data2016['Women Rate'][2], data2016['Men Rate'][2], ''],
    [data2016['Tribe'][3], data2016['Women Rate'][3], data2016['Men Rate'][3], ''],
    [data2016['Tribe'][5], data2016['Women Rate'][5], data2016['Men Rate'][5], ''],
]
options = {
    "title": '2016 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}
plotter.plot(data, chart_type='ColumnChart', chart_package='corechart', options=options)
Out[8]:
In [9]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2017['Tribe'][2], data2017['Women Rate'][2], data2017['Men Rate'][2], ''],
    [data2017['Tribe'][3], data2017['Women Rate'][3], data2017['Men Rate'][3], ''],
    [data2017['Tribe'][5], data2017['Women Rate'][5], data2017['Men Rate'][5], ''],
]
options = {
    "title": '2017 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}
plotter.plot(data, chart_type='ColumnChart', chart_package='corechart', options=options)
Out[9]:
Before blaming anyone or falling too quickly into self-indulgence, there is still more data to play with.
A note aside: the next thing is nothing but an experiment. Nothing is categorical or has been made with the intention of offending anybody. As our t-shirt labels say: no programmer has been injured in the creation of the following data game.
The next story talks about people. The people around J, the ones who follow, are followed by, interact with, and create the chances of a more diverse and interesting conference.
It is also a story about the people who organise this conference. When we started to plan a conference like this, we did nothing but think about what could be interesting for the people who come. To get there, we used what we already knew about cool people who do amazing things with data and JVM technologies. And this means looking into our own networks and following the suggestions of people we trust.
So, if we assume that we are biased by the people around us, we thought it was a good idea to first learn what the network of people around J looks like, to see what chances we have of bringing in someone different and unusual who can add value to the conference.
For the moment, since this is an experiment that wants to trigger your reaction, we will look at J's Twitter account.
Admittedly, a real-world network would have many more numbers and people to look at, yet a digital social network is still about human interactions, conversations and knowledge sharing.
For this experiment we've used the sexmachine Python library https://pypi.python.org/pypi/SexMachine/ and the 'Twitter Gender Distribution' project published on GitHub https://github.com/ajdavis/twitter-gender-distribution to find out the gender of a specific Twitter account.
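To give an intuition of how name-based gender guessing works, and why many accounts end up without a gender, here is a toy sketch. It is not the sexmachine library nor the Twitter Gender Distribution code: the `KNOWN_NAMES` table and the `guess_gender` function are hypothetical stand-ins written for this illustration.

```python
# Toy stand-in for a name-based gender detector (NOT the sexmachine API).
# The name table is made up; real name lists contain many thousands of entries.
KNOWN_NAMES = {
    "maria": "female",
    "carmen": "female",
    "angela": "female",
    "john": "male",
    "pablo": "male",
}


def guess_gender(display_name):
    """Guess a gender from the first word of a profile name, else 'unknown'."""
    tokens = display_name.strip().split()
    if not tokens:
        return "unknown"
    return KNOWN_NAMES.get(tokens[0].lower(), "unknown")


# Hypothetical profile names: people, a brand account and an empty name.
profiles = ["Maria Lopez", "john smith", "DataConf", "", "Pablo R."]
guesses = [guess_gender(name) for name in profiles]

print(guesses)  # prints: ['female', 'male', 'unknown', 'unknown', 'male']
```

Company accounts, nicknames and names missing from the list all fall into the unknown bucket, which is one reason a large part of any Twitter network cannot be classified this way.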
In [10]:
run index.py jotb2018
Of the roughly 50% of J's friends whose gender could be identified, the women/men distribution is 20/80. Friends are the ones who follow and are followed by J.
In [11]:
# Read the file and take some important information
whoisj = pd.read_json('../out/jotb2018.json', orient = 'columns')
people = pd.read_json(whoisj['jotb2018'].to_json())
following_total = whoisj['jotb2018']['friends_count']
followers_total = whoisj['jotb2018']['followers_count']
followers = pd.read_json(people['followers_list'].to_json(), orient = 'index')
following = pd.read_json(people['friends_list'].to_json(), orient = 'index')
whoisj
Out[11]:
In [12]:
# J follows...
following_total
Out[12]:
In [13]:
# J is followed by...
followers_total
Out[13]:
In [14]:
followers['gender'].value_counts()
Out[14]:
In [15]:
following['gender'].value_counts()
Out[15]:
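A split such as "20/80 among the accounts that could be identified" can be derived from `value_counts()` by leaving the unidentified accounts out before computing percentages. The sketch below uses a made-up series built only to mirror those proportions; the label `unknown` is an assumption, since the actual labels depend on the detector used.

```python
import pandas as pd

# Hypothetical gender labels (made-up data; real labels depend on the detector).
gender = pd.Series(["male"] * 8 + ["female"] * 2 + ["unknown"] * 10)

counts = gender.value_counts()

# Keep only the accounts for which a gender was identified.
identified = counts.drop("unknown")

# Share of all accounts that could be identified, in percent.
identified_share = identified.sum() * 100.0 / counts.sum()

# Distribution among the identified accounts only, in percent.
split = identified * 100.0 / identified.sum()

print(identified_share)
print(split)
```

Dividing by the identified total rather than by the full follower count is what separates "20% of identified friends are women" from "10% of all friends were identified as women".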
In [16]:
followers_dist = followers['gender'].value_counts()
genders = followers['gender'].value_counts().keys()
followers_map = pygal.Pie(height=400)
followers_map.title = 'Followers Gender Map'
for i in genders:
    followers_map.add(i, followers_dist[i]*100.00/followers_total)
followers_map.render_in_browser()
In [17]:
following_dist = following['gender'].value_counts()
genders = following['gender'].value_counts().keys()
following_map = pygal.Pie(height=400)
following_map.title = 'Following Gender Map'
for i in genders:
    following_map.add(i, following_dist[i]*100.00/following_total)
following_map.render_in_browser()
In [18]:
lang_counts = followers['lang'].value_counts()
languages = followers['lang'].value_counts().keys()
followers_dist = followers['gender'].value_counts()
lang_followers_map = pygal.Treemap(height=400)
lang_followers_map.title = 'Followers Language Map'
for i in languages:
    lang_followers_map.add(i, lang_counts[i]*100.00/followers_total)
lang_followers_map.render_in_browser()
In [19]:
lang_counts = following['lang'].value_counts()
languages = following['lang'].value_counts().keys()
following_dist = following['gender'].value_counts()
lang_following_map = pygal.Treemap(height=400)
lang_following_map.title = 'Following Language Map'
for i in languages:
    lang_following_map.add(i, lang_counts[i]*100.00/following_total)
lang_following_map.render_in_browser()
In [20]:
followers['location'].value_counts()
Out[20]:
In [21]:
following['location'].value_counts()
Out[21]:
In [ ]:
run tweets.py jotb2018 1000
In [ ]:
j_network = pd.read_json('../out/jotb2018_tweets.json', orient = 'index')
In [ ]:
interactions = j_network['gender'].value_counts()
genders = j_network['gender'].value_counts().keys()
j_network_map = pygal.Pie(height=400)
j_network_map.title = 'Interactions Gender Map'
for i in genders:
    j_network_map.add(i, interactions[i])
j_network_map.render_in_browser()
In [ ]:
a = j_network['hashtags']
b = j_network['gender']
say_something = [x for x in a if x != []]
tags = []
for y in say_something:
    for x in pd.DataFrame(y)[0]:
        tags.append(x.lower())
tags_used = pd.DataFrame(tags)[0].value_counts()
tags_keys = pd.DataFrame(tags)[0].value_counts().keys()
tags_map = pygal.Treemap(height=400)
tags_map.title = 'Hashtags Map'
for i in tags_keys:
    tags_map.add(i, tags_used[i])
tags_map.render_in_browser()
In [ ]:
pairs = []
for i in j_network['gender'].keys():
    if j_network['hashtags'][i] != []:
        pairs.append([j_network['hashtags'][i], j_network['gender'][i]])
key_pairs = []
for i, j in pairs:
    for x in i:
        key_pairs.append((x, j))
key_pairs
key_pair_dist = {x: key_pairs.count(x) for x in key_pairs}
sorted_x = sorted(key_pair_dist.items(), key = operator.itemgetter(1), reverse = True)
sorted_x
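Counting (hashtag, gender) pairs with `list.count` rescans the whole list for every pair, which gets slow as the number of tweets grows. As an alternative sketch, `collections.Counter` from the standard library does the same tally in a single pass. The `tweets` list below is hypothetical data in a simplified shape, and lower-casing the tags is a choice made here, not something the cell above does.

```python
from collections import Counter

# Hypothetical tweets as (hashtags, gender) tuples; made up for illustration.
tweets = [
    (["BigData", "jotb17"], "male"),
    (["jotb17"], "female"),
    ([], "male"),
    (["bigdata"], "male"),
]

# One pass over every hashtag of every tweet, counting (tag, gender) pairs.
pair_counts = Counter(
    (tag.lower(), gender)
    for hashtags, gender in tweets
    for tag in hashtags
)

# Pairs sorted from most to least frequent.
top_pairs = pair_counts.most_common()

print(top_pairs[0])  # prints: (('bigdata', 'male'), 2)
```

Tweets without hashtags contribute nothing to the tally, so no separate filtering step is needed.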
This is nothing but an experiment, but it is also a way to avoid resignation. Things don't need to be the way they are. We need to know the people around us. Indeed, gender, age and language are not the things that matter most, but they are the things that feed our unconscious bias. When it comes to organising an event with a strong belief in diversity, the first step is to know ourselves, fight our biases, and then explore further into our network.
A few lines to credit this work. Thanks to M. Carmen Correa for finding the time between work and family to collect all this data, code it in Python and deal with the Twitter API. Thanks also to Ángela Dini and Gema Sánchez for keeping this project energised and sharing it with the press and the community. Thanks also to the women who have joined Yes We Tech meetups not just once or twice but many times, and of course thank you for your interest, your support and your time. If I deserve any credit, it is only for the attempt to organise a space free of the same old-boring-macho thing. Hope you enjoyed it, and thank you.
Shared on GitHub: https://github.com/YesWeTech/whoIsJ