Who Is J?

Analysing JOTB diversity network

One of the main goals of the ‘Yes We Tech’ community is to help create an inclusive space where we can celebrate diversity, give visibility to women in tech, and ensure that everybody has an equal chance to learn, share and enjoy technology-related disciplines.

As co-organisers of the event, we have concentrated our efforts on getting more women speakers on board, on the assumption that a more diverse panel would also enrich the conversation around technology.

Certainly, we have doubled the number of women giving talks this year, but is this diversity enough? How can we know that we have succeeded in our goal? And, more importantly, what can we learn to create a more diverse event in future editions?

The work we are sharing here is about two things: data and people. Both should help us find some answers and understand the reasons behind them.

Let's start with a story about data. Data is pretty simple compared with people. Just take a look at the numbers, the small ones, the ones that best describe what happened at the 2016 and 2017 editions of J On The Beach.



In [1]:

    
import pandas as pd
import numpy as np
import scipy as sp
import pygal
import operator
from iplotter import GCPlotter

plotter = GCPlotter()

Small data analysis

Small data says that last year our 'J' engaged 48 speakers and 299 attendees in this big data thing. I'm not counting members of the organisation here.



In [2]:

    
data2016 = pd.read_csv('../input/small_data_2016.csv')
data2016['Women Rate'] = pd.Series(data2016['Women']*100/data2016['Total'])
data2016['Men Rate'] = pd.Series(data2016['Men']*100/data2016['Total'])
data2016









    Out[2]:

                    Tribe  Women  Men  Total  Women Rate    Men Rate
0                speakers      5   43     48   10.416667   89.583333
1               attendees     39  260    299   13.043478   86.956522
2             independent      8   44     52   15.384615   84.615385
3           company_teams     28  214    242   11.570248   88.429752
4  company_teams_no_women      0   99     99    0.000000  100.000000
5               hackathon      0    0      0         NaN         NaN

This year there are 40 speakers, a few fewer than last year, while attendance has reached 368 people (compare 368 attendees with last year's 299).



In [3]:

    
data2017 = pd.read_csv('../input/small_data_2017.csv')
data2017['Women Rate'] = pd.Series(data2017['Women']*100/data2017['Total'])
data2017['Men Rate'] = pd.Series(data2017['Men']*100/data2017['Total'])
data2017









    Out[3]:

                    Tribe  Women  Men  Total  Women Rate    Men Rate
0                speakers     11   29     40   27.500000   72.500000
1               attendees     36  332    368    9.782609   90.217391
2             independent      6   65     71    8.450704   91.549296
3           copmany_teams     30  267    297   10.101010   89.898990
4  company_teams_no_women      0  134    134    0.000000  100.000000
5               hackathon      4   21     25   16.000000   84.000000



In [4]:

    
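# Growth in attendees expressed as a share of the 2017 total;
# relative to the 2016 base it would be (368 - 299)*100.00/299
increase = 100 - 299*100.00/368
increase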









    Out[4]:





18.75
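The 18.75 above measures the growth against the 2017 total. As a small sketch (the variable names are ours, not part of the original notebook), this is how it compares with the more usual percentage increase over the 2016 base:

```python
# Attendee totals taken from the two tables above
attendees_2016 = 299
attendees_2017 = 368

# What the cell above computes: the growth as a share of the 2017 total
share_of_2017 = 100 - attendees_2016 * 100.0 / attendees_2017

# The growth relative to the 2016 base (the usual "percentage increase")
increase_over_2016 = (attendees_2017 - attendees_2016) * 100.0 / attendees_2016

print(round(share_of_2017, 2))
print(round(increase_over_2016, 2))
```

Both figures describe the same 69 extra attendees; they only differ in the total they are divided by.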

It is also noticeable that big data is bigger than ever, and this year we have included workshops and a hackathon.

The more the better, right? Let's continue, because there are more numbers behind those ones. Numbers that will give us some signs of diversity.

Diversity

When it comes to speakers, this year 27.5% of those speaking at J are women, compared with roughly 10.4% last year.



In [5]:

    
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][0], data2016['Men Rate'][0],''],
    ['2017', data2017['Women Rate'][0], data2017['Men Rate'][0],''],
]
options = {
    "title": 'Speakers at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '50%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)









    Out[5]:

However, and this is the worrying part, the participation of women as attendees has slightly dropped, from a not too ambitious 13% to a disappointing 9.8%. So attendance has grown (368 versus 299), but with zero impact on the variety of people.



In [6]:

    
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][1], data2016['Men Rate'][1],''],
    ['2017', data2017['Women Rate'][1], data2017['Men Rate'][1],''],
]
options = {
    "title": 'Attendees at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)









    Out[6]:

Why did this happen?

We don’t really know. But we kept looking at the numbers and realised that 30 of the 45 companies that enrolled two or more people didn't include any women on their lists. That amounts to 31% of the mass of attendees. A next step would be to correlate team size with the percentage of women, to check whether the smaller the team, the lower the chance that it includes a woman.
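Before correlating anything, it may help to have a baseline for the team-size effect. The sketch below assumes, purely for illustration, that each team member is a woman independently with the overall 2017 attendee rate (36 out of 368); the function name is ours and this is not something the notebook computes. Under that assumption the probability that a team of size n has no women is (1 - p)^n, so small teams end up all-male quite often by chance alone.

```python
def p_team_without_women(team_size, women_share):
    """Probability that a team has no women, assuming independent draws."""
    return (1.0 - women_share) ** team_size


# Overall share of women among 2017 attendees (36 of 368, from the table above)
women_share_2017 = 36 / 368.0

# Chance of an all-male team for a few team sizes, as a percentage
for size in (2, 3, 5, 10):
    chance = 100 * p_team_without_women(size, women_share_2017)
    print(size, round(chance, 1))
```

Comparing the observed share of teams without women (30 of 45) against this baseline, per team size, would show whether team size alone explains it.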



In [7]:

    
companies_team = data2017['Total'][3] + data2017['Total'][4]
mass_represented = pd.Series(data2017['Total'][4]*100/companies_team)
women_represented = pd.Series(100 - mass_represented)
mass_represented









    Out[7]:





0    31
dtype: int64
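The result depends on the denominator. The cell above divides by the sum of the two rows. If the teams without women are already counted inside the company teams row (an assumption on our side, suggested by 71 independent + 297 in company teams = 368 attendees in the 2017 table), two other readings are possible. A minimal sketch with the 2017 numbers:

```python
# 2017 totals copied from the table above
independent = 71
company_teams = 297
company_teams_no_women = 134
attendees = 368

# Denominator used in the cell above: both rows added together
share_as_in_cell = company_teams_no_women * 100.0 / (company_teams + company_teams_no_women)

# Alternative readings, ASSUMING the no-women teams are a subset of company_teams
share_of_team_attendees = company_teams_no_women * 100.0 / company_teams
share_of_all_attendees = company_teams_no_women * 100.0 / attendees

print(round(share_as_in_cell, 2))
print(round(share_of_team_attendees, 2))
print(round(share_of_all_attendees, 2))
```

Whichever reading is right, the picture stays the same: a large part of the audience came in teams that did not include a single woman.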

For us this is not a good sign. Although our ability to bring people together at our monthly meetups (the ones that try to build this culture of equality in Málaga) has grown, that engagement doesn’t have a big impact on other events.

Again, I'm not blaming companies here, because if we look at the participation rate of women who are not part of a team, their representation also dropped by almost 50% (from 15.4% to 8.5%).
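To put a number on that drop, here is a small sketch that computes the relative change in the women rate for independents and for company teams, using the counts from the two tables. The helper names are ours.

```python
def women_rate(women, total):
    """Women as a percentage of the total."""
    return women * 100.0 / total


def relative_change(old_rate, new_rate):
    """Change from old_rate to new_rate, as a percentage of old_rate."""
    return (new_rate - old_rate) * 100.0 / old_rate


# (women, total) per tribe, copied from the 2016 and 2017 tables
independent_change = relative_change(women_rate(8, 52), women_rate(6, 71))
company_teams_change = relative_change(women_rate(28, 242), women_rate(30, 297))

print(round(independent_change, 1))
print(round(company_teams_change, 1))
```

A negative value means the rate went down; the drop among independents is clearly larger than the one among company teams.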



In [8]:

    
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2016['Tribe'][2], data2016['Women Rate'][2], data2016['Men Rate'][2],''],
    [data2016['Tribe'][3], data2016['Women Rate'][3], data2016['Men Rate'][3],''],
    [data2016['Tribe'][5], data2016['Women Rate'][5], data2016['Men Rate'][5],''],
]
options = {
    "title": '2016 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)









    Out[8]:



In [9]:

    
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2017['Tribe'][2], data2017['Women Rate'][2], data2017['Men Rate'][2],''],
    [data2017['Tribe'][3], data2017['Women Rate'][3], data2017['Men Rate'][3],''],
    [data2017['Tribe'][5], data2017['Women Rate'][5], data2017['Men Rate'][5],''],
]
options = {
    "title": '2017 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)









    Out[9]:

Before blaming anyone or falling too quickly into self-indulgence, there is still more data to play with.

A side note: what follows is nothing but an experiment; nothing is categorical or has been made with the intention of offending anybody. As our t-shirt labels say: no programmer has been injured in the creation of the following data game.

The next story talks about people. The people around J, the ones who follow, are followed by, interact with, and create the chances of a more diverse and interesting conference.

It is also a story about the people who organise this conference. When we started to plan a conference like this, we did nothing but think about what could be interesting for the people who come. To do that we used what we already knew about cool people who do amazing things with data and JVM technologies. And this means looking into our own networks and following the suggestions of people we trust.

So, if we assume that we are biased by the people around us, it seemed a good idea to find out first what the network of people around J looks like, to see what chances we have of bringing in someone different and unusual who can add value to the conference.

For the moment, since this is an experiment meant to trigger your reaction, we will look at J's Twitter account.

Indeed, a real-world network would have more numbers and people to look at, but a digital social network is still about human interactions, conversations and knowledge sharing.

For this experiment we've used the SexMachine Python library (https://pypi.python.org/pypi/SexMachine/) and the 'Twitter Gender Distribution' project published on GitHub (https://github.com/ajdavis/twitter-gender-distribution) to infer the gender of a specific Twitter account.



In [10]:

    
run index.py jotb2018

Of the roughly 50% of J's friends whose gender could be identified, the women/men split is about 20/80. Friends are the ones who follow and are followed by J.



In [11]:

    
# Read the file and take some important information
whoisj = pd.read_json('../out/jotb2018.json', orient = 'columns')
people = pd.read_json(whoisj['jotb2018'].to_json())
following_total = whoisj['jotb2018']['friends_count']
followers_total = whoisj['jotb2018']['followers_count']
followers = pd.read_json(people['followers_list'].to_json(), orient = 'index')
following = pd.read_json(people['friends_list'].to_json(), orient = 'index')
whoisj









    Out[11]:

                  jotb2018
favourites_count  2518
female_count      67
female_rate       17%
followers_count   1483
followers_list    {u'Angelfirenze': {u'lang': u'en', u'favourite...
friends_count     224
friends_list      {u'rgransberger': {u'lang': u'de', u'favourite...
gender            undetermined
id                3899375963
lang              es
location          Málaga, España
male_count        175
male_rate         45%
name              J On The Beach
nonbinary_count   1
nonbinary_rate    0%
statuses_count    2143
total_count       388
undefined_count   127
undefined_rate    32%

J follows...



In [12]:

    
# J follows...
following_total









    Out[12]:





224

J is followed by...



In [13]:

    
# J is followed by...
followers_total









    Out[13]:





1483

Gender distribution



In [14]:

    
followers['gender'].value_counts()









    Out[14]:





male             101
undetermined      53
female            36
mostly_female      8
mostly_male        2
Name: gender, dtype: int64



In [15]:

    
following['gender'].value_counts()









    Out[15]:





male             77
undetermined     75
female           38
mostly_female     6
mostly_male       3
nonbinary         1
Name: gender, dtype: int64
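The detector returns several labels per gender, so before reading a split it can help to collapse them. The sketch below uses the counts from the output just above. Merging `mostly_female` into `female` and `mostly_male` into `male` is our own assumption, not something the original scripts are known to do, and the resulting figures describe this run only; the split quoted earlier in the text may come from a different run or grouping, which the source does not specify.

```python
from collections import Counter

# Counts copied from the `following['gender'].value_counts()` output above
following_counts = {
    "male": 77,
    "undetermined": 75,
    "female": 38,
    "mostly_female": 6,
    "mostly_male": 3,
    "nonbinary": 1,
}

# ASSUMPTION: treat the "mostly_*" labels as their main category
COLLAPSE = {"mostly_female": "female", "mostly_male": "male"}


def collapse(counts):
    """Merge detector labels into broader categories."""
    merged = Counter()
    for label, n in counts.items():
        merged[COLLAPSE.get(label, label)] += n
    return merged


merged = collapse(following_counts)
total = sum(merged.values())
identified = total - merged["undetermined"]

identified_share = 100.0 * identified / total
women_share_identified = 100.0 * merged["female"] / identified

print(dict(merged))
print(identified_share, round(women_share_identified, 1))
```

The same function can be applied to the followers' counts to compare both sides of the network on equal terms.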



In [16]:

    
# Count followers per detected gender
followers_dist = followers['gender'].value_counts()

followers_map = pygal.Pie(height=400)
followers_map.title = 'Followers Gender Map'

# Percentages are relative to the profiles actually retrieved,
# not to followers_total, so that the slices add up to 100
for gender in followers_dist.keys():
    followers_map.add(gender, followers_dist[gender]*100.00/followers_dist.sum())

followers_map.render_in_browser()









    






In [17]:

    
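# Count followed accounts per detected gender
following_dist = following['gender'].value_counts()

following_map = pygal.Pie(height=400)
following_map.title = 'Following Gender Map'

# Percentages are relative to the profiles actually retrieved,
# not to following_total, so that the slices add up to 100
for gender in following_dist.keys():
    following_map.add(gender, following_dist[gender]*100.00/following_dist.sum())

following_map.render_in_browser()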









    




Language distribution



In [18]:

    
# Count followers per profile language
lang_counts = followers['lang'].value_counts()

lang_followers_map = pygal.Treemap(height=400)
lang_followers_map.title = 'Followers Language Map'

# Percentages are relative to the profiles actually retrieved
for lang in lang_counts.keys():
    lang_followers_map.add(lang, lang_counts[lang]*100.00/lang_counts.sum())

lang_followers_map.render_in_browser()









    






In [19]:

    
# Count followed accounts per profile language
lang_counts = following['lang'].value_counts()

lang_following_map = pygal.Treemap(height=400)
lang_following_map.title = 'Following Language Map'

# Percentages are relative to the profiles actually retrieved
for lang in lang_counts.keys():
    lang_following_map.add(lang, lang_counts[lang]*100.00/lang_counts.sum())

lang_following_map.render_in_browser()









    




Location distribution



In [20]:

    
followers['location'].value_counts()









    Out[20]:





                                  54
Malaga, Spain                      6
Málaga                             5
Málaga, España                     5
España                             4
Madrid                             4
Spain                              3
Madrid, Spain                      3
London, England                    2
Malaga                             2
London                             2
Bristol, England                   2
Amsterdam                          2
Sevilla                            2
Stockholm, Sweden                  2
Los Angeles, CA                    2
Manchester, England                2
Sweden                             1
Sri Lanka                          1
Costa del Sol (Spain)              1
Cadiz - Spain                      1
Reggio Emilia, Italy               1
Pune                               1
Netherlands                        1
Málaga y Vélez Málaga              1
Utrecht, Nederland                 1
The Netherlands, Hilversum         1
Entre Málaga y República Checa     1
Madrid, Comunidad de Madrid        1
Bengaluru South, India             1
                                  ..
Northern California                1
Marbella                           1
The Netherlands                    1
Palma, España                      1
New York, USA                      1
Comunidad de Madrid, España        1
Valencia (Spain)                   1
Trondheim, Norway                  1
Jaén/Málaga                        1
Málaga & Bristol 🇪🇸🇬🇧              1
CORDOBA, SPAIN                     1
Warsaw, Poland                     1
53.764401,-2.705537                1
Madroñera, España                  1
Montmartre, Francia                1
Milano, Lombardia                  1
Granada, Spain                     1
Malaga, Espagne                    1
Valencia, España                   1
Dublin, Ireland                    1
Bayern, Deutschland                1
The desert                         1
Galicia (Spain)                    1
Chicago                            1
Copenhagen, Denmark                1
The Land of Ooo...                 1
Bruges, Belgium                    1
Entre el techo y el suelo.         1
NEW YORK                           1
Amsterdam, Nederland               1
Name: location, Length: 115, dtype: int64



In [21]:

    
following['location'].value_counts()









    Out[21]:





                                  32
London                             9
San Francisco, CA                  8
Málaga, España                     4
Barcelona, Spain                   3
Málaga                             3
Seattle, WA                        3
London, UK                         2
France                             2
Madrid, Comunidad de Madrid        2
London, England                    2
Málaga, Spain                      2
Global                             2
Madrid                             2
Cambridge, England                 2
Las Vegas, NV                      2
Germany                            2
Switzerland                        2
Austin, TX                         2
Saint Petersburg, Russia           1
Pittsburgh, PA                     1
Seattle | Spain | London           1
Barcelona/Sevilla, Spain           1
San Francisco, California          1
Montreal                           1
San Francisco                      1
60+ cities nationwide              1
Existence                          1
Lexically bound                    1
Spain                              1
                                  ..
Bellevue, WA                       1
St. Louis, MO                      1
The desert                         1
Cambridge, MA                      1
#dotNet                            1
Vienna, Austria                    1
Portland, OR                       1
Berkeley, San Francisco            1
New York                           1
Elche, Spain / Berlin, Germany     1
Amsterdam, Nederland               1
Madrid & Mallorca                  1
Barcelona. Spain                   1
40,42481706,-3,66246654            1
Brooklyn, NY                       1
Seattle, WA, USA                   1
Valencia, Spain                    1
Worldwide                          1
Jerez - Spain                      1
London / Malaga / Makati           1
Chicago                            1
Berlin, Germany                    1
Paris, Ile-de-France               1
London | Leeds | Gibraltar         1
Deutschland                        1
Düsseldorf, Germany                1
Barcelona                          1
home                               1
Paris, France                      1
Sydney, Australia                  1
Name: location, Length: 133, dtype: int64
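The location field is free text, so the same city shows up under several spellings ("Malaga, Spain", "Málaga", "Málaga, España", "Malaga"). A rough sketch of how they could be merged before counting follows. The rule used here (keep the part before the first comma, lowercase it, strip accents) is our own simplification and will not cope with entries such as "Málaga y Vélez Málaga" or "The desert". The sample counts are copied from the followers output above.

```python
import unicodedata
from collections import Counter


def city_key(location):
    """Rough city key: text before the first comma, lowercased, accents removed."""
    first = location.split(",")[0].strip().lower()
    decomposed = unicodedata.normalize("NFKD", first)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A few rows copied from the followers' location counts above
sample = {
    "Malaga, Spain": 6,
    "Málaga": 5,
    "Málaga, España": 5,
    "Malaga": 2,
    "Madrid": 4,
    "Madrid, Spain": 3,
}

# Add up the counts of all spellings that share the same key
merged = Counter()
for location, count in sample.items():
    merged[city_key(location)] += count

print(merged.most_common())
```

With the spellings merged, Málaga clearly stands out among the followers who declare a location.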

Tweets analysis



In [ ]:

    
run tweets.py jotb2018 1000



In [ ]:

    
j_network = pd.read_json('../out/jotb2018_tweets.json', orient = 'index')



In [ ]:

    
interactions = j_network['gender'].value_counts()
genders = j_network['gender'].value_counts().keys()

j_network_map = pygal.Pie(height=400)
j_network_map.title = 'Interactions Gender Map'

for i in genders:
    j_network_map.add(i,interactions[i])

j_network_map.render_in_browser()



In [ ]:

    
# Collect every hashtag used in the tweets, lowercased
tags = []
for hashtags in j_network['hashtags']:
    for tag in hashtags:
        tags.append(tag.lower())

# Count how often each hashtag appears, most frequent first
tags_used = pd.Series(tags).value_counts()

tags_map = pygal.Treemap(height=400)
tags_map.title = 'Hashtags Map'

for tag in tags_used.keys():
    tags_map.add(tag, tags_used[tag])

tags_map.render_in_browser()



In [ ]:

    
from collections import Counter

# Pair every hashtag with the gender of the account that used it
key_pairs = []
for hashtags, gender in zip(j_network['hashtags'], j_network['gender']):
    for tag in hashtags:
        key_pairs.append((tag, gender))

# Count each (hashtag, gender) pair, most frequent first
sorted_x = Counter(key_pairs).most_common()
sorted_x
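A flat list of (hashtag, gender) counts is hard to read once it grows. One possible next step, sketched here with made-up pairs because these cells were not executed in the notebook, is to group the counts per hashtag. The pairs and the function name below are hypothetical; in the notebook the input would be `key_pairs`.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL pairs, standing in for `key_pairs` from the cell above
example_pairs = [
    ("jotb17", "male"),
    ("JOTB17", "female"),
    ("bigdata", "male"),
    ("jotb17", "male"),
]


def hashtags_by_gender(pairs):
    """Group (hashtag, gender) pairs into {hashtag: Counter({gender: n})}."""
    table = defaultdict(Counter)
    for tag, gender in pairs:
        table[tag.lower()][gender] += 1
    return table


table = hashtags_by_gender(example_pairs)
for tag, by_gender in sorted(table.items()):
    print(tag, dict(by_gender))
```

Lowercasing the hashtag keeps this consistent with the hashtag map, where tags are also lowercased before counting.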

Conclusions

This is nothing but an experiment, but it is also a way to avoid resignation. Things don't need to be the way they are. We need to know the people around us. Indeed, gender, age and language are not the things that matter, but they are the things that feed our unconscious bias. When it comes to organising an event with a strong belief in diversity, the first step is to know ourselves, fight our biases and then explore further in our network.

Credits

A few lines to credit this work. Thanks to M. Carmen Correa for finding the time between work and family to collect all this data, code it in Python and deal with the Twitter API. Thanks also to Ángela Dini and Gema Sánchez for keeping this project energised and sharing it with the press and the community. Thanks also to the women who have joined the Yes We Tech meetups not just once or twice but many times, and of course thank you for your interest, your support and your time. If I can claim any credit, it is just for the attempt to organise a space free of the same old, boring macho thing. Hope you enjoyed it, and thank you.

Shared on GitHub: https://github.com/YesWeTech/whoIsJ
