Who Is J?

Analysing JOTB diversity network

One of the main goals of the ‘Yes We Tech’ community is to help create an inclusive space where we can celebrate diversity, give visibility to women in tech, and ensure that everybody has an equal chance to learn, share and enjoy technology-related disciplines.

As co-organisers of the event, we have concentrated our efforts on getting more women speakers on board, on the assumption that a more diverse panel would also enrich the conversation around technology.

We have certainly more than doubled the number of women giving talks this year, but is this diversity enough? How can we know that we have succeeded in our goal? And, more importantly, what can we learn to create a more diverse event in future editions?

The work that we are sharing here talks about two things: data and people. Both should help us find some answers and understand the reasons behind them.

Let's start with a story about data. Data is pretty simple compared with people. Just take a look at the numbers, the small ones, the ones that best describe what happened in the 2016 and 2017 editions of J On The Beach.


In [1]:
import pandas as pd
import numpy as np
import scipy as sp
import pygal
import operator
from iplotter import GCPlotter

plotter = GCPlotter()

Small data analysis

Small data says that last year our 'J' engaged 48 speakers and 299 attendees in this big data thing. I'm not counting any member of the organisation here.


In [2]:
data2016 = pd.read_csv('../input/small_data_2016.csv')
data2016['Women Rate'] = pd.Series(data2016['Women']*100/data2016['Total'])
data2016['Men Rate'] = pd.Series(data2016['Men']*100/data2016['Total'])
data2016


Out[2]:
Tribe Women Men Total Women Rate Men Rate
0 speakers 5 43 48 10.416667 89.583333
1 attendees 39 260 299 13.043478 86.956522
2 independent 8 44 52 15.384615 84.615385
3 company_teams 28 214 242 11.570248 88.429752
4 company_teams_no_women 0 99 99 0.000000 100.000000
5 hackathon 0 0 0 NaN NaN

This year there are 40 speakers, a few fewer than last year, while attendance has reached 368 people (compare with 299 last year).


In [3]:
data2017 = pd.read_csv('../input/small_data_2017.csv')
data2017['Women Rate'] = pd.Series(data2017['Women']*100/data2017['Total'])
data2017['Men Rate'] = pd.Series(data2017['Men']*100/data2017['Total'])
data2017


Out[3]:
Tribe Women Men Total Women Rate Men Rate
0 speakers 11 29 40 27.500000 72.500000
1 attendees 36 332 368 9.782609 90.217391
2 independent 6 65 71 8.450704 91.549296
3 copmany_teams 30 267 297 10.101010 89.898990
4 company_teams_no_women 0 134 134 0.000000 100.000000
5 hackathon 4 21 25 16.000000 84.000000
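
The two cells above repeat the same rate calculation for each year. A minimal sketch of how it could be factored into one helper follows. The function name add_rates is hypothetical, and the rows are copied by hand from the tables above rather than read from the CSV files.

```python
import pandas as pd


def add_rates(df):
    """Return a copy of df with 'Women Rate' and 'Men Rate' percentage columns.

    Hypothetical helper, not part of the original notebook.
    """
    out = df.copy()
    # Multiply by 100.0 so the division is floating point on Python 2 as well
    out['Women Rate'] = out['Women'] * 100.0 / out['Total']
    out['Men Rate'] = out['Men'] * 100.0 / out['Total']
    return out


# Speakers and attendees rows, copied from the 2016 and 2017 tables
rows = pd.DataFrame({
    'Year': [2016, 2016, 2017, 2017],
    'Tribe': ['speakers', 'attendees', 'speakers', 'attendees'],
    'Women': [5, 39, 11, 36],
    'Men': [43, 260, 29, 332],
    'Total': [48, 299, 40, 368],
})

rates = add_rates(rows)
print(rates[['Year', 'Tribe', 'Women Rate', 'Men Rate']].round(1))
```

Keeping both years in one frame also makes the year-on-year comparison a single table instead of two.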

In [4]:
# Extra attendees in 2017 (368 - 299), expressed as a share of the 2017 total
increase = 100 - 299*100.00/368
increase


Out[4]:
18.75
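
The 18.75 above is the increment measured against the 2017 total. Growth relative to 2016 uses the 2016 figure as the base and gives a slightly larger number. A small sketch of both readings, using only the attendee counts from the tables above (the variable names are illustrative):

```python
attendees_2016 = 299
attendees_2017 = 368

increment = attendees_2017 - attendees_2016

# Share of the 2017 audience that is "extra" compared with 2016
# (same value as 100 - 299*100.00/368)
share_of_2017 = increment * 100.0 / attendees_2017

# Growth with respect to the 2016 audience
growth_over_2016 = increment * 100.0 / attendees_2016

print('increment: %d people' % increment)
print('as a share of 2017: %.2f%%' % share_of_2017)
print('growth over 2016: %.2f%%' % growth_over_2016)
```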

It is also noticeable that big data is bigger than ever, and this year we have included workshops and a hackathon.

The more the better, right? Let's continue, because there are more numbers behind those ones. Numbers that will give us some signs of diversity.

Diversity

When it comes to speakers, this year 27.5% of the people speaking at J are women, compared with roughly 10.4% last year.


In [5]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][0], data2016['Men Rate'][0],''],
    ['2017', data2017['Women Rate'][0], data2017['Men Rate'][0],''],
]
options = {
    "title": 'Speakers at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '50%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)


Out[5]:

However, and this is the worrying thing, the participation of women as attendees has dropped from a not too ambitious 13% to a disappointing 9.8%. So we have more attendees (368 versus 299) but no impact on the variety of people.


In [6]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][1], data2016['Men Rate'][1],''],
    ['2017', data2017['Women Rate'][1], data2017['Men Rate'][1],''],
]
options = {
    "title": 'Attendees at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)


Out[6]:

Why did this happen?

We don’t really know. But we continued looking at the numbers and realised that 30 of the 45 companies that enrolled two or more people didn't include any women on their lists. By the calculation below, that is 31% of the attendees who came with a company. A next step would be to correlate team size with the percentage of women, to check whether smaller teams are less likely to include a woman on their lists.
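
One way to frame the team-size hypothesis before correlating anything: if each attendee were a woman with the overall 2017 rate (36 of 368), how often would a team of a given size include no women purely by chance? The sketch below assumes attendees are independent draws, which is a simplification, and the function name is hypothetical.

```python
def chance_of_no_women(team_size, women_rate):
    """Probability that a team has no women if every member is an
    independent draw with the given share of women (a simplifying
    assumption, not a model taken from the notebook)."""
    return (1.0 - women_rate) ** team_size


# Overall share of women among 2017 attendees, from the table above
women_rate_2017 = 36 / 368.0

for size in (2, 3, 5, 10):
    p = chance_of_no_women(size, women_rate_2017)
    print('team of %2d: %.0f%% chance of no women' % (size, p * 100))
```

Under this assumption a team of two would have no women about 81% of the time, so the mix of team sizes matters when reading the 30-out-of-45 figure.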


In [7]:
companies_team = data2017['Total'][3] + data2017['Total'][4]
mass_represented = pd.Series(data2017['Total'][4]*100/companies_team)
women_represented = pd.Series(100 - mass_represented)
mass_represented


Out[7]:
0    31
dtype: int64
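
The 31% above divides by company_teams plus company_teams_no_women. In the 2017 table, independent (71) and company_teams (297) already add up to all 368 attendees, which suggests, though the notebook does not say so, that the teams without women are counted inside company_teams. The sketch below shows how the share changes with the denominator. Which one is right depends on how the CSV was built, so treat the alternatives as assumptions.

```python
# 2017 totals, copied from the table above
company_teams = 297
company_teams_no_women = 134
attendees = 368

# Denominator used in the notebook cell: both rows added together
share_as_computed = company_teams_no_women * 100.0 / (company_teams + company_teams_no_women)

# Assumption: the no-women teams are a subset of company_teams
share_of_company_attendees = company_teams_no_women * 100.0 / company_teams

# Same assumption, measured against every attendee
share_of_all_attendees = company_teams_no_women * 100.0 / attendees

print('as computed above:      %.1f%%' % share_as_computed)
print('of company attendees:   %.1f%%' % share_of_company_attendees)
print('of all attendees:       %.1f%%' % share_of_all_attendees)
```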

For us this is not a good sign. Although our monthly meetups (the ones that try to create this culture of equality in Málaga) bring together more and more people, that engagement doesn’t have a big impact on other events.

Again, I'm not blaming companies here, because if we look at the participation rate of women who are not part of a team, their representation also dropped by almost 50% (from 15.4% to 8.5%).
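
The "almost 50%" is a relative change, which is different from the change in percentage points. A short sketch with the independent-attendee counts from the two tables (8 of 52 in 2016, 6 of 71 in 2017); the helper name is illustrative:

```python
def rate(women, total):
    """Share of women as a percentage."""
    return women * 100.0 / total


independent_2016 = rate(8, 52)
independent_2017 = rate(6, 71)

# Difference between the two rates, in percentage points
point_change = independent_2017 - independent_2016

# The same difference relative to the 2016 rate
relative_change = point_change * 100.0 / independent_2016

print('2016: %.1f%%, 2017: %.1f%%' % (independent_2016, independent_2017))
print('change: %.1f points, %.1f%% relative' % (point_change, relative_change))
```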


In [8]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2016['Tribe'][2], data2016['Women Rate'][2], data2016['Men Rate'][2],''],
    [data2016['Tribe'][3], data2016['Women Rate'][3], data2016['Men Rate'][3],''],
    [data2016['Tribe'][5], data2016['Women Rate'][5], data2016['Men Rate'][5],''],
]
options = {
    "title": '2016 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)


Out[8]:

In [9]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2017['Tribe'][2], data2017['Women Rate'][2], data2017['Men Rate'][2],''],
    [data2017['Tribe'][3], data2017['Women Rate'][3], data2017['Men Rate'][3],''],
    [data2017['Tribe'][5], data2017['Women Rate'][5], data2017['Men Rate'][5],''],
]
options = {
    "title": '2017 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)


Out[9]:

Before blaming anyone or falling too quickly into self-indulgence, there is still more data to play with.

A side note: what follows is nothing but an experiment. Nothing is categorical or has been made with the intention of offending anybody. As our t-shirt labels say: no programmers have been injured in the creation of the following data game.

Social network analysis

The next story talks about people. The people around J, the ones who follow, are followed by, interact with, and create the chances of a more diverse and interesting conference.

It is also a story about the people who organise this conference. When we started to plan a conference like this, we thought about nothing but what could be interesting for the people who come. To get there, we used what we already knew about cool people who do amazing things with data and JVM technologies. And this means looking into our own networks and following the suggestions of the people we trust.

So, if we assume that we are biased by the people around us, it seemed a good idea to first get to know the network of people around J, to see what chances we have of bringing in someone different and unusual who can add value to the conference.

For the moment, since this is an experiment that wants to trigger your reaction, we will look at J's Twitter account.

Indeed, a real-world network would have many more numbers and people to look at, yet a digital social network is also about human interactions, conversations and knowledge sharing.

For this experiment we've used the SexMachine Python library https://pypi.python.org/pypi/SexMachine/ and the 'Twitter Gender Distribution' project published on GitHub https://github.com/ajdavis/twitter-gender-distribution to find out the gender of a specific Twitter account.


In [10]:
run index.py jotb2018

Of the scant 50% of J's friends that could be identified with a gender, the women/men distribution is about 20/80. Friends are the ones who follow and are followed by J.


In [11]:
# Read the file and take some important information
whoisj = pd.read_json('../out/jotb2018.json', orient = 'columns')
people = pd.read_json(whoisj['jotb2018'].to_json())
following_total = whoisj['jotb2018']['friends_count']
followers_total = whoisj['jotb2018']['followers_count']
followers = pd.read_json(people['followers_list'].to_json(), orient = 'index')
following = pd.read_json(people['friends_list'].to_json(), orient = 'index')
whoisj


Out[11]:
jotb2018
favourites_count 2518
female_count 67
female_rate 17%
followers_count 1483
followers_list {u'Angelfirenze': {u'lang': u'en', u'favourite...
friends_count 224
friends_list {u'rgransberger': {u'lang': u'de', u'favourite...
gender undetermined
id 3899375963
lang es
location Málaga, España
male_count 175
male_rate 45%
name J On The Beach
nonbinary_count 1
nonbinary_rate 0%
statuses_count 2143
total_count 388
undefined_count 127
undefined_rate 32%

J follows...


In [12]:
# J follows...
following_total


Out[12]:
224

J is followed by...


In [13]:
# J is followed by...
followers_total


Out[13]:
1483

Gender distribution


In [14]:
followers['gender'].value_counts()


Out[14]:
male             101
undetermined      53
female            36
mostly_female      8
mostly_male        2
Name: gender, dtype: int64

In [15]:
following['gender'].value_counts()


Out[15]:
male             77
undetermined     75
female           38
mostly_female     6
mostly_male       3
nonbinary         1
Name: gender, dtype: int64
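
The counts above mix confident labels, 'mostly' labels and undetermined accounts. A minimal sketch of one way to read them: fold mostly_female and mostly_male into female and male, drop undetermined, and compute shares among the accounts that could be identified. The folding rule is an assumption made for illustration, not something the notebook does, and the counts are copied by hand from the two outputs above.

```python
# Counts copied from Out[14] (followers) and Out[15] (following)
followers_counts = {
    'male': 101, 'undetermined': 53, 'female': 36,
    'mostly_female': 8, 'mostly_male': 2,
}
following_counts = {
    'male': 77, 'undetermined': 75, 'female': 38,
    'mostly_female': 6, 'mostly_male': 3, 'nonbinary': 1,
}

# Assumption: 'mostly_x' labels are treated as 'x'
FOLD = {'mostly_female': 'female', 'mostly_male': 'male'}


def identified_shares(counts):
    """Percentage of each gender among accounts with an identified gender."""
    folded = {}
    for label, n in counts.items():
        if label == 'undetermined':
            continue
        key = FOLD.get(label, label)
        folded[key] = folded.get(key, 0) + n
    total = sum(folded.values())
    return {key: n * 100.0 / total for key, n in folded.items()}


followers_shares = identified_shares(followers_counts)
following_shares = identified_shares(following_counts)

for name, shares in (('followers', followers_shares), ('following', following_shares)):
    for label in sorted(shares):
        print('%-9s %-9s %.1f%%' % (name, label, shares[label]))
```

Both lists hold 200 accounts, so these shares describe the downloaded sample rather than all of J's followers.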

In [16]:
followers_dist = followers['gender'].value_counts()
genders = followers_dist.keys()

followers_map = pygal.Pie(height=400)
followers_map.title = 'Followers Gender Map'

# Percentages are relative to the accounts in the downloaded list
# (200 here), not to followers_total (1483)
for i in genders:
    followers_map.add(i,followers_dist[i]*100.00/followers_dist.sum())

followers_map.render_in_browser()


file:///tmp/tmpDMFLjP.html

In [17]:
following_dist = following['gender'].value_counts()
genders = following_dist.keys()

following_map = pygal.Pie(height=400)
following_map.title = 'Following Gender Map'

# Percentages are relative to the accounts in the downloaded list
# (200 here), not to following_total (224)
for i in genders:
    following_map.add(i,following_dist[i]*100.00/following_dist.sum())

following_map.render_in_browser()


file:///tmp/tmpdyrMnq.html

Language distribution


In [18]:
lang_counts = followers['lang'].value_counts()
languages = lang_counts.keys()

lang_followers_map = pygal.Treemap(height=400)
lang_followers_map.title = 'Followers Language Map'

# Percentages are relative to the accounts in the downloaded list,
# not to followers_total
for i in languages:
    lang_followers_map.add(i,lang_counts[i]*100.00/lang_counts.sum())

lang_followers_map.render_in_browser()


file:///tmp/tmpL0cRo8.html

In [19]:
lang_counts = following['lang'].value_counts()
languages = lang_counts.keys()

lang_following_map = pygal.Treemap(height=400)
lang_following_map.title = 'Following Language Map'

# Percentages are relative to the accounts in the downloaded list,
# not to following_total
for i in languages:
    lang_following_map.add(i,lang_counts[i]*100.00/lang_counts.sum())

lang_following_map.render_in_browser()


file:///tmp/tmpYEUnt2.html

Location distribution


In [20]:
followers['location'].value_counts()


Out[20]:
                                  54
Malaga, Spain                      6
Málaga                             5
Málaga, España                     5
España                             4
Madrid                             4
Spain                              3
Madrid, Spain                      3
London, England                    2
Malaga                             2
London                             2
Bristol, England                   2
Amsterdam                          2
Sevilla                            2
Stockholm, Sweden                  2
Los Angeles, CA                    2
Manchester, England                2
Sweden                             1
Sri Lanka                          1
Costa del Sol (Spain)              1
Cadiz - Spain                      1
Reggio Emilia, Italy               1
Pune                               1
Netherlands                        1
Málaga y Vélez Málaga              1
Utrecht, Nederland                 1
The Netherlands, Hilversum         1
Entre Málaga y República Checa     1
Madrid, Comunidad de Madrid        1
Bengaluru South, India             1
                                  ..
Northern California                1
Marbella                           1
The Netherlands                    1
Palma, España                      1
New York, USA                      1
Comunidad de Madrid, España        1
Valencia (Spain)                   1
Trondheim, Norway                  1
Jaén/Málaga                        1
Málaga & Bristol 🇪🇸🇬🇧              1
CORDOBA, SPAIN                     1
Warsaw, Poland                     1
53.764401,-2.705537                1
Madroñera, España                  1
Montmartre, Francia                1
Milano, Lombardia                  1
Granada, Spain                     1
Malaga, Espagne                    1
Valencia, España                   1
Dublin, Ireland                    1
Bayern, Deutschland                1
The desert                         1
Galicia (Spain)                    1
Chicago                            1
Copenhagen, Denmark                1
The Land of Ooo...                 1
Bruges, Belgium                    1
Entre el techo y el suelo.         1
NEW YORK                           1
Amsterdam, Nederland               1
Name: location, Length: 115, dtype: int64

In [21]:
following['location'].value_counts()


Out[21]:
                                  32
London                             9
San Francisco, CA                  8
Málaga, España                     4
Barcelona, Spain                   3
Málaga                             3
Seattle, WA                        3
London, UK                         2
France                             2
Madrid, Comunidad de Madrid        2
London, England                    2
Málaga, Spain                      2
Global                             2
Madrid                             2
Cambridge, England                 2
Las Vegas, NV                      2
Germany                            2
Switzerland                        2
Austin, TX                         2
Saint Petersburg, Russia           1
Pittsburgh, PA                     1
Seattle | Spain | London           1
Barcelona/Sevilla, Spain           1
San Francisco, California          1
Montreal                           1
San Francisco                      1
60+ cities nationwide              1
Existence                          1
Lexically bound                    1
Spain                              1
                                  ..
Bellevue, WA                       1
St. Louis, MO                      1
The desert                         1
Cambridge, MA                      1
#dotNet                            1
Vienna, Austria                    1
Portland, OR                       1
Berkeley, San Francisco            1
New York                           1
Elche, Spain / Berlin, Germany     1
Amsterdam, Nederland               1
Madrid & Mallorca                  1
Barcelona. Spain                   1
40,42481706,-3,66246654            1
Brooklyn, NY                       1
Seattle, WA, USA                   1
Valencia, Spain                    1
Worldwide                          1
Jerez - Spain                      1
London / Malaga / Makati           1
Chicago                            1
Berlin, Germany                    1
Paris, Ile-de-France               1
London | Leeds | Gibraltar         1
Deutschland                        1
Düsseldorf, Germany                1
Barcelona                          1
home                               1
Paris, France                      1
Sydney, Australia                  1
Name: location, Length: 133, dtype: int64
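
The free-text locations above split the same city across several spellings ('Malaga, Spain', 'Málaga', 'Málaga, España', 'Malaga'). A rough sketch of how they could be grouped before counting follows. The normalise_city helper is hypothetical and deliberately naive: it only keeps the text before the first comma, strips accents and lower-cases, so entries such as 'Jaén/Málaga' or raw coordinates would still need manual handling. The sample counts are copied from the followers output. Written for Python 3 strings.

```python
import unicodedata
from collections import Counter


def normalise_city(location):
    """Rough grouping key: text before the first comma, no accents, lower case."""
    first_part = location.split(',')[0].strip()
    decomposed = unicodedata.normalize('NFKD', first_part)
    without_accents = ''.join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.lower()


# A few rows copied from the followers location counts above
sample = {
    'Malaga, Spain': 6,
    'Málaga': 5,
    'Málaga, España': 5,
    'Malaga': 2,
    'Madrid': 4,
    'Madrid, Spain': 3,
    'London, England': 2,
    'London': 2,
}

by_city = Counter()
for location, n in sample.items():
    by_city[normalise_city(location)] += n

for city, n in by_city.most_common():
    print('%-8s %d' % (city, n))
```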

Tweets analysis


In [ ]:
run tweets.py jotb2018 1000

In [ ]:
j_network = pd.read_json('../out/jotb2018_tweets.json', orient = 'index')

In [ ]:
interactions = j_network['gender'].value_counts()
genders = j_network['gender'].value_counts().keys()

j_network_map = pygal.Pie(height=400)
j_network_map.title = 'Interactions Gender Map'

for i in genders:
    j_network_map.add(i,interactions[i])

j_network_map.render_in_browser()

In [ ]:
# Flatten the per-tweet hashtag lists into one lower-cased list;
# tweets without hashtags contribute nothing
tags = [tag.lower() for tweet_tags in j_network['hashtags'] for tag in tweet_tags]

# Count how many times each hashtag was used
tags_used = pd.Series(tags).value_counts()
tags_keys = tags_used.keys()

tags_map = pygal.Treemap(height=400)
tags_map.title = 'Hashtags Map'

for i in tags_keys:
    tags_map.add(i,tags_used[i])

tags_map.render_in_browser()

In [ ]:
from collections import Counter

# Pair every hashtag with the gender attributed to the account that used it
key_pairs = [
    (tag, gender)
    for tweet_tags, gender in zip(j_network['hashtags'], j_network['gender'])
    for tag in tweet_tags
]

# Count each (hashtag, gender) pair, most frequent first
sorted_x = Counter(key_pairs).most_common()
sorted_x

Conclusions

This is nothing but an experiment, but it is also a way to avoid resignation. Things don't need to be the way they are. We need to know the people around us. Indeed, gender, age and language are not the things that matter most, but they are the things that feed our unconscious bias. When it comes to organising an event with a strong belief in diversity, the first step is to know ourselves and fight our biases, and then to explore further into our network.

Credits

A few lines to credit this work. Thanks to M. Carmen Correa for finding the time between work and family to collect all this data, code it in Python and deal with the Twitter API. Thanks also to Ángela Dini and Gema Sánchez for keeping this project energised and sharing it with the press and the community. Thanks also to the women who have joined the Yes We Tech meetups not just once or twice but many times, and of course thank you for your interest, your support and your time. If I deserve any credit, it is just for the attempt to organise a space free of the same old-boring-macho thing. Hope you enjoyed it, and thank you.

Shared on GitHub: https://github.com/YesWeTech/whoIsJ