Polyglot Unconference

This notebook is part of a project analyzing and visualizing data from the 2017 Polyglot Vancouver Un-Conference.

See the README in this repository for background information.

Session 03 - "Becoming a Senior Dev"

This one was almost as bad as Session 01. The conversation looked like a bicycle wheel with Speaker 01 at the hub: one person spoke, then Speaker 01 spoke, then someone else spoke, then Speaker 01 responded...

At one point four women got up and left at once... it was pretty brutal. For full disclosure, I even left after 30 minutes, as I just couldn't take it anymore.

Python imports


In [1]:
# Imports

import sys
import pandas as pd
import csv

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (20.0, 10.0)

In [2]:
# %load util.py
#!/usr/bin/python

# Util file to import in all of the notebooks to allow for easy code re-use


# Count attendees that did not speak (returns counts, not percentages)
def percent_silent(df):
    total = len(df)
    silent = 0
    # items() yields (index, value) pairs; row[1] is the TALKS value
    for row in df.items():
        if row[1] == 0:
            silent = silent + 1

    percent = {}
    percent['TOTAL'] = total
    percent['SILENT'] = silent
    percent['VERBOSE'] = total - silent
    return percent

# Count attendees that left (returns counts, not percentages)
def percent_left(df):
    total = len(df)
    left = 0
    # items() yields (index, value) pairs; STAYED == 0 means the person left
    for row in df.items():
        if row[1] == 0:
            left = left + 1

    percent = {}
    percent['TOTAL'] = total
    percent['LEFT'] = left
    percent['STAYED'] = total - left
    return percent

# Count attendees by gender (returns counts, not percentages)
def percent_gender(df):
    total = len(df)
    female = 0
    # items() yields (index, value) pairs; GENDER == 1 means female
    for row in df.items():
        if row[1] == 1:
            female = female + 1

    percent = {}
    percent['TOTAL'] = total
    percent['FEMALE'] = female
    percent['MALE'] = total - female
    return percent

# Count attendees who spoke at least once, split by gender
def percent_talking_gender(df):
    total = 0
    male = 0
    female = 0
    for talks, gender in df.itertuples(index=False):
        if talks > 0:
            total = total + 1
            if gender == 0:
                male = male + 1
            elif gender == 1:
                female = female + 1

    percent = {}
    percent['TOTAL'] = total
    percent['FEMALE'] = female
    percent['MALE'] = male
    return percent

Reading the Data


In [29]:
# Read
data = pd.read_csv('data/3_senior.csv')

# Display
data


Out[29]:
PERSON TALKS GENDER STAYED
0 H1 5 M Y
1 01 23 M Y
2 02 0 M Y
3 03 4 M Y
4 04 1 M Y
5 05 1 M Y
6 06 7 M N
7 07 4 M Y
8 08 1 M N
9 09 2 M Y
10 10 2 M Y
11 11 1 M Y
12 12 3 M Y
13 13 2 M Y
14 14 7 F Y
15 15 1 M Y
16 16 2 M Y
17 17 2 M Y
18 18 1 M Y
19 19 1 M Y
20 20 1 M N
21 21 0 M Y
22 22 0 M Y
23 23 0 M Y
24 24 0 M Y
25 25 0 M Y
26 26 0 M L
27 27 0 M L
28 28 0 M N
29 29 0 M N
30 30 0 M Y
31 31 0 F N
32 32 0 F N
33 33 0 F N
34 34 0 F N
35 35 0 F N
36 36 0 F L

Sanitizing the Data

As we can see, some of the data is stored in a non-numerical format, which makes it difficult to run calculations on.

Let's clean it up.


In [30]:
# Convert GENDER to Binary (sorry, i know...)

data.loc[data["GENDER"] == "M", "GENDER"] = 0
data.loc[data["GENDER"] == "F", "GENDER"] = 1

# Convert STAYED to 1 and Left/Late to 0

data.loc[data["STAYED"] == "Y", "STAYED"] = 1
data.loc[data["STAYED"] == "N", "STAYED"] = 0
data.loc[data["STAYED"] == "L", "STAYED"] = 0

# We should now see the data in numeric values
data


Out[30]:
PERSON TALKS GENDER STAYED
0 H1 5 0 1
1 01 23 0 1
2 02 0 0 1
3 03 4 0 1
4 04 1 0 1
5 05 1 0 1
6 06 7 0 0
7 07 4 0 1
8 08 1 0 0
9 09 2 0 1
10 10 2 0 1
11 11 1 0 1
12 12 3 0 1
13 13 2 0 1
14 14 7 1 1
15 15 1 0 1
16 16 2 0 1
17 17 2 0 1
18 18 1 0 1
19 19 1 0 1
20 20 1 0 0
21 21 0 0 1
22 22 0 0 1
23 23 0 0 1
24 24 0 0 1
25 25 0 0 1
26 26 0 0 0
27 27 0 0 0
28 28 0 0 0
29 29 0 0 0
30 30 0 0 1
31 31 0 1 0
32 32 0 1 0
33 33 0 1 0
34 34 0 1 0
35 35 0 1 0
36 36 0 1 0
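
The `.loc` assignments above swap the letters for numbers, but the columns most likely keep their `object` dtype, which would explain why `describe()` in the next section only reports TALKS. As a hedged alternative, here is a minimal sketch that uses `Series.map` with explicit lookup tables and then casts to integers. The three rows are copied from the table above; `sample`, `gender_codes`, `stayed_codes` and `clean` are illustrative names, not part of this notebook.

```python
import pandas as pd

# Three rows copied from the session table (not the full CSV)
sample = pd.DataFrame({
    "PERSON": ["H1", "14", "26"],
    "TALKS": [5, 7, 0],
    "GENDER": ["M", "F", "M"],
    "STAYED": ["Y", "Y", "L"],
})

# Hypothetical lookup tables mirroring the encoding used in this notebook
gender_codes = {"M": 0, "F": 1}
stayed_codes = {"Y": 1, "N": 0, "L": 0}

# Work on a copy so the raw letters stay available for checking
clean = sample.copy()
clean["GENDER"] = clean["GENDER"].map(gender_codes).astype(int)
clean["STAYED"] = clean["STAYED"].map(stayed_codes).astype(int)

# Both columns should now show an integer dtype
print(clean.dtypes)
```

One thing to watch for with `map`: any letter missing from the lookup table becomes NaN, and the `astype(int)` cast then fails loudly instead of silently keeping a stray value.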

Analysis and Visualization (V1)

Let's do some really basic passes at the data before we run some mathematical computations on it, just to get a better sense of where it stands at the moment.


In [31]:
# Run Describe to give us some basic Min/Max/Mean/Std values

data.describe()


Out[31]:
TALKS
count 37.000000
mean 1.918919
std 4.030291
min 0.000000
25% 0.000000
50% 1.000000
75% 2.000000
max 23.000000

In [32]:
# Run Value_Counts in order to see some basic grouping by attribute

vc_talks = data['TALKS'].value_counts()
vc_talks


Out[32]:
0     17
1      8
2      5
7      2
4      2
23     1
5      1
3      1
Name: TALKS, dtype: int64

In [33]:
vc_gender = data['GENDER'].value_counts()
vc_gender


Out[33]:
0    30
1     7
Name: GENDER, dtype: int64

In [34]:
vc_stayed = data['STAYED'].value_counts()
vc_stayed


Out[34]:
1    24
0    13
Name: STAYED, dtype: int64

In [35]:
# Now let's do some basic plotting with MatPlotLib

data.plot()


Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x11392be80>

In [36]:
data.plot(kind='bar')


Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x1152f7780>

In [37]:
fig1, ax1 = plt.subplots()
ax1.pie(data['TALKS'], autopct='%1.f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()


Analysis and Visualization (V2)

As per the methodology in the first notebook, for the sake of mapping the actual conversational flow amongst the participants, I am going to run these analyses and visualizations again while removing the hosts...


In [38]:
data_hostless = data.drop(data.index[[0]])

In [39]:
data_hostless.head()


Out[39]:
PERSON TALKS GENDER STAYED
1 01 23 0 1
2 02 0 0 1
3 03 4 0 1
4 04 1 0 1
5 05 1 0 1
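
Dropping index 0 works here because the host happens to be the first row. A minimal sketch of a label-based alternative follows, assuming that every host label in PERSON starts with "H" (only "H1" appears in this session, so this is a guess about the naming scheme). The three rows are copied from the table above; `sample`, `is_host` and `sample_hostless` are illustrative names.

```python
import pandas as pd

# Three rows copied from the session table; PERSON is kept as text
sample = pd.DataFrame({
    "PERSON": ["H1", "01", "02"],
    "TALKS": [5, 23, 0],
})

# Assumption: host labels start with "H"
is_host = sample["PERSON"].str.startswith("H")

# Keep every row that is not a host, wherever it sits in the frame
sample_hostless = sample[~is_host]

print(sample_hostless)
```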

In [40]:
data_hostless.describe()


Out[40]:
TALKS
count 36.000000
mean 1.833333
std 4.053217
min 0.000000
25% 0.000000
50% 1.000000
75% 2.000000
max 23.000000

In [41]:
dh_vc_talks = data_hostless['TALKS'].value_counts()
dh_vc_talks


Out[41]:
0     17
1      8
2      5
7      2
4      2
23     1
3      1
Name: TALKS, dtype: int64

In [42]:
dh_vc_gender = data_hostless['GENDER'].value_counts()
dh_vc_gender


Out[42]:
0    29
1     7
Name: GENDER, dtype: int64

In [43]:
dh_vc_stayed = data_hostless['STAYED'].value_counts()
dh_vc_stayed


Out[43]:
1    23
0    13
Name: STAYED, dtype: int64

In [44]:
data_hostless.plot()


Out[44]:
<matplotlib.axes._subplots.AxesSubplot at 0x114130400>

In [45]:
data_hostless.plot(kind='bar')


Out[45]:
<matplotlib.axes._subplots.AxesSubplot at 0x1142f8048>

In [46]:
fig1, ax1 = plt.subplots()
ax1.pie(data_hostless['TALKS'], autopct='%1.f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()


This is still pretty bad...
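
To put a number on it, here is a minimal sketch that computes the busiest speaker's share of all speaking turns once the host is removed. The list is the TALKS column transcribed from the table in this notebook, minus the host row; `talks_hostless`, `total_turns`, `top_turns` and `top_share` are illustrative names.

```python
# TALKS values for the 36 non-host attendees, transcribed from the table
talks_hostless = [23, 0, 4, 1, 1, 7, 4, 1, 2, 2,
                  1, 3, 2, 7, 1, 2, 2, 1, 1, 1] + [0] * 16

total_turns = sum(talks_hostless)   # all speaking turns in the session
top_turns = max(talks_hostless)     # turns taken by the busiest speaker

# Share of the conversation held by that one speaker, in percent
top_share = round(100 * top_turns / total_turns, 1)

print(top_turns, total_turns, top_share)
```

With these values, one attendee accounts for 23 of 66 turns, or roughly 35% of the conversation.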

Algebraic Analysis

Now let's step into some deeper (but probably still naive) analysis based on my rudimentary understanding of Data Science! :D


In [47]:
# Percentage of attendees that were silent during the talk

silent = percent_silent(data['TALKS'])
silent


Out[47]:
{'SILENT': 17, 'TOTAL': 37, 'VERBOSE': 20}
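
The helper returns raw counts rather than percentages, so the share still has to be derived. A minimal sketch, reusing the counts from the output above; `to_percent` is a hypothetical helper and not part of util.py.

```python
def to_percent(part, total):
    """Return part/total as a percentage, rounded to one decimal place."""
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty session
    return round(100.0 * part / total, 1)

# Counts copied from the output above
silent = {'SILENT': 17, 'TOTAL': 37, 'VERBOSE': 20}

silent_pct = to_percent(silent['SILENT'], silent['TOTAL'])
verbose_pct = to_percent(silent['VERBOSE'], silent['TOTAL'])

print(silent_pct, verbose_pct)
```

So about 46% of the room never spoke, which is what the pie chart's `autopct` label rounds for display.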

In [48]:
fig1, ax1 = plt.subplots()

sizes = [silent['SILENT'], silent['VERBOSE']]
labels = 'Silent', 'Talked'
explode = (0.05, 0)

ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()



In [49]:
# Percentage of attendees that left early during the talk

left = percent_left(data['STAYED'])
left


Out[49]:
{'LEFT': 13, 'STAYED': 24, 'TOTAL': 37}

In [50]:
fig1, ax1 = plt.subplots()

sizes = [left['LEFT'], left['STAYED']]
labels = 'Left', 'Stayed'
explode = (0.1, 0)

ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()



In [51]:
# Percentage of attendees that were Male vs. Female (see notes above around methodology)

gender = percent_gender(data['GENDER'])
gender


Out[51]:
{'FEMALE': 7, 'MALE': 30, 'TOTAL': 37}

In [52]:
fig1, ax1 = plt.subplots()

sizes = [gender['FEMALE'], gender['MALE']]
labels = 'Female', 'Male'
explode = (0.1, 0)

ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()



In [53]:
# Calculate Percent of Talking points by GENDER

distribution = percent_talking_gender(data[['TALKS','GENDER']])
distribution


Out[53]:
{'FEMALE': 1, 'MALE': 19, 'TOTAL': 20}
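
The counts above are speakers per gender, not speaking turns. As a complementary view, here is a minimal sketch that sums the TALKS column per gender instead. Both lists are transcribed from the table in this notebook (0 = male, 1 = female); `frame`, `turns_by_gender` and `turn_share` are illustrative names.

```python
import pandas as pd

# TALKS and GENDER for all 37 attendees, in table order
talks = [5, 23, 0, 4, 1, 1, 7, 4, 1, 2, 2,
         1, 3, 2, 7, 1, 2, 2, 1, 1, 1] + [0] * 16
gender = [0] * 14 + [1] + [0] * 16 + [1] * 6

frame = pd.DataFrame({"TALKS": talks, "GENDER": gender})

# Total speaking turns per gender code
turns_by_gender = frame.groupby("GENDER")["TALKS"].sum()

# The same totals as a percentage of all turns
turn_share = (100 * turns_by_gender / turns_by_gender.sum()).round(1)

print(turns_by_gender)
print(turn_share)
```

On this data, the seven women account for 7 of the 71 turns, all from a single attendee.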

In [54]:
fig1, ax1 = plt.subplots()

sizes = [distribution['FEMALE'], distribution['MALE']]
labels = 'Female Speakers', 'Male Speakers'
explode = (0.1, 0)

ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()


These numbers are damning...

