The data was originally recorded in a Markdown table and then converted to CSV so that it could be used within this notebook.
Here is an example of the original table:
| PERSON | TALKS | GENDER | STAYED | NOTES |
|---|---|---|---|---|
| H1 | 20 | M | Y | |
| 01 | 31 | M | Y | OMFG |
| 02 | 07 | M | Y | |
| 03 | 08 | M | Y | |
| 04 | 05 | M | Y | |
| 05 | 04 | M | Y | |
The table has 5 columns, as seen in the example above: PERSON, TALKS, GENDER, STAYED, and NOTES.
Well... you can kinda see how this one drew out the opinionated neckbeards first thing in the morning! ;)
But seriously, this session was so incredibly unbalanced that it just blew my mind. One person in particular injected himself - unasked and unwarranted - into every possible section of the conversation.
About 10 minutes in, when it had become painfully obvious that the conversation was being railroaded and the only woman had left, someone suggested that we adopt the fishbowl technique, which helped somewhat. In the original Markdown file you can see the before and after split, but for the purposes of analysis I am only using the totals for the whole session. I have also dropped the NOTES column from the CSV.
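The conversion step itself is not shown in this notebook. As a rough sketch of how it could be done with only the standard library: `markdown_table_to_rows` is a hypothetical helper (not the script actually used), and the sample text reuses three rows from the example table above.

```python
import csv
import io

def markdown_table_to_rows(markdown_text, drop_columns=()):
    """Parse a pipe-delimited Markdown table into a list of rows (header first)."""
    rows = []
    for line in markdown_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the |---|---| separator row under the header
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    header = rows[0]
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    result = []
    for cells in rows:
        # Pad short rows (e.g. a missing trailing NOTES cell) to the header width
        padded = cells + [""] * (len(header) - len(cells))
        result.append([padded[i] for i in keep])
    return result

# Three rows copied from the example table above
example = """
| PERSON | TALKS | GENDER | STAYED | NOTES |
|---|---|---|---|---|
| H1 | 20 | M | Y | |
| 01 | 31 | M | Y | OMFG |
| 05 | 04 | M | Y |
"""

rows = markdown_table_to_rows(example, drop_columns=("NOTES",))
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a `StringIO` buffer keeps the sketch self-contained; swapping it for an open file handle would produce the CSV on disk.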
In [1]:
# Imports
import sys
import pandas as pd
import csv
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (20.0, 10.0)
In [89]:
# %load util.py
#!/usr/bin/python
# Util file to import in all of the notebooks to allow for easy code re-use
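# Count attendees that did not speak
# (returns raw counts; the pie charts turn them into percentages)
def percent_silent(df):
    # df is the TALKS column; 0 means the attendee never spoke
    total = len(df)
    silent = 0
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for _, talks in df.items():
        if talks == 0:
            silent = silent + 1
    percent = {}
    percent['TOTAL'] = total
    percent['SILENT'] = silent
    percent['VERBOSE'] = total - silent
    return percent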
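# Count attendees that left (or arrived late)
def percent_left(df):
    # df is the STAYED column; 0 means the attendee left early or was late
    total = len(df)
    left = 0
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for _, stayed in df.items():
        if stayed == 0:
            left = left + 1
    percent = {}
    percent['TOTAL'] = total
    percent['LEFT'] = left
    percent['STAYED'] = total - left
    return percent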
# Count attendees by gender
def percent_gender(df):
    # df is the GENDER column; 1 means female, 0 means male
    total = len(df)
    female = 0
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for _, gender in df.items():
        if gender == 1:
            female = female + 1
    percent = {}
    percent['TOTAL'] = total
    percent['FEMALE'] = female
    percent['MALE'] = total - female
    return percent
# Count attendees who spoke at least once, split by gender
def percent_talking_gender(df):
    # df holds the TALKS and GENDER columns, in that order (0 = male, 1 = female)
    total = 0
    male = 0
    female = 0
    for talks, gender in df.itertuples(index=False):
        if talks > 0:
            total = total + 1
            if gender == 0:
                male = male + 1
            elif gender == 1:
                female = female + 1
    percent = {}
    percent['TOTAL'] = total
    percent['FEMALE'] = female
    percent['MALE'] = male
    return percent
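The counting loops above could also be written as vectorized comparisons. This is only a sketch of that alternative: `count_matching` is a hypothetical helper that is not part of util.py, and the TALKS values below are made up for illustration.

```python
import pandas as pd

def count_matching(series, value, hit_label, rest_label):
    """Count how many entries equal `value`, returning the same dict shape as util.py."""
    total = len(series)
    # (series == value) is a boolean Series; summing it counts the True entries
    hits = int((series == value).sum())
    return {"TOTAL": total, hit_label: hits, rest_label: total - hits}

# Made-up TALKS values, not session data
talks = pd.Series([20, 0, 5, 0])
silent_counts = count_matching(talks, 0, "SILENT", "VERBOSE")
print(silent_counts)
```

The same helper would cover the left/stayed and female/male splits by changing the value and the two labels.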
In [3]:
# Read
data = pd.read_csv('data/1_solid.csv')
# Display
data
Out[3]:
In [4]:
# Convert GENDER to binary: M -> 0, F -> 1 (sorry, I know...)
data.loc[data["GENDER"] == "M", "GENDER"] = 0
data.loc[data["GENDER"] == "F", "GENDER"] = 1
# Convert STAYED to 1 and Left/Late to 0
data.loc[data["STAYED"] == "Y", "STAYED"] = 1
data.loc[data["STAYED"] == "N", "STAYED"] = 0
data.loc[data["STAYED"] == "L", "STAYED"] = 0
# We should now see the data in numeric values
data
Out[4]:
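One thing to watch: assigning integers through `.loc` into columns that were read as strings may leave those columns with `object` dtype, in which case `describe()` summarizes only the columns pandas still treats as numeric. A minimal sketch of an alternative, assuming the same M/F and Y/N/L codes; the three rows in `sample` are made up for illustration and are not session data.

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "PERSON": ["H1", "01", "02"],
    "TALKS": [20, 31, 7],
    "GENDER": ["M", "M", "F"],
    "STAYED": ["Y", "L", "N"],
})

# map() translates each code and astype() makes the result a true integer column
encoded = sample.assign(
    GENDER=sample["GENDER"].map({"M": 0, "F": 1}).astype("int64"),
    STAYED=sample["STAYED"].map({"Y": 1, "N": 0, "L": 0}).astype("int64"),
)
print(encoded.dtypes)
```

With integer dtypes in place, `describe()` and `plot()` pick up GENDER and STAYED alongside TALKS.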
In [5]:
# Run Describe to give us some basic Min/Max/Mean/Std values
data.describe()
Out[5]:
In [6]:
# Run Value_Counts in order to see some basic grouping by attribute
vc_talks = data['TALKS'].value_counts()
vc_talks
Out[6]:
In [7]:
vc_gender = data['GENDER'].value_counts()
vc_gender
Out[7]:
In [8]:
vc_stayed = data['STAYED'].value_counts()
vc_stayed
Out[8]:
In [9]:
# Now let's do some basic plotting with MatPlotLib
data.plot()
Out[9]:
In [10]:
data.plot(kind='bar')
Out[10]:
In [11]:
fig1, ax1 = plt.subplots()
ax1.pie(data['TALKS'], autopct='%1.f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
Before we go too much further, I want to point out that the first person, H1, was the host and was by default speaking a lot. However, you can clearly see that Person 01 spoke even more than the host did...
For the sake of mapping the actual conversational flow amongst the participants, I am going to run these analyses and visualizations again while removing the host...
In [12]:
data_hostless = data.drop(data.index[0])
data_hostless.head()
Out[12]:
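Dropping the first index position works because the host happens to be the first row. A sketch of a label-based filter that does not depend on row order, assuming the host is always recorded as `H1` in the PERSON column; the rows are copied from the example table at the top, with NOTES omitted.

```python
import pandas as pd

# Rows copied from the example table at the top of the notebook
sample = pd.DataFrame({
    "PERSON": ["H1", "01", "02", "03", "04", "05"],
    "TALKS": [20, 31, 7, 8, 5, 4],
})

# Keep every row whose PERSON label is not the host's
hostless = sample[sample["PERSON"] != "H1"]
print(hostless)
```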
In [13]:
data_hostless.describe()
Out[13]:
In [14]:
dh_vc_talks = data_hostless['TALKS'].value_counts()
dh_vc_talks
Out[14]:
In [15]:
dh_vc_gender = data_hostless['GENDER'].value_counts()
dh_vc_gender
Out[15]:
In [16]:
dh_vc_stayed = data_hostless['STAYED'].value_counts()
dh_vc_stayed
Out[16]:
In [17]:
data_hostless.plot()
Out[17]:
In [18]:
data_hostless.plot(kind='bar')
Out[18]:
In [19]:
fig1, ax1 = plt.subplots()
ax1.pie(data_hostless['TALKS'], autopct='%1.f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
HOLY SHIT
... just look at that chart above.
One person monopolized essentially 50% of the conversation!!!
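To put a number on that, each person's share is their TALKS count divided by the total. The sketch below uses only the five non-host rows shown in the example table at the top, so the result describes that excerpt and not necessarily the full session.

```python
import pandas as pd

# Non-host rows from the example table (PERSON -> TALKS)
talks = pd.Series([31, 7, 8, 5, 4], index=["01", "02", "03", "04", "05"], name="TALKS")

# Share of all talking turns per person, in percent
share = talks / talks.sum() * 100

top_person = share.idxmax()
top_share = share.max()
print(f"Person {top_person} accounts for {top_share:.1f}% of the turns in this excerpt")
```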
In [28]:
# Percentage of attendees that were silent during the talk
silent = percent_silent(data['TALKS'])
silent
Out[28]:
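Note that `percent_silent` returns raw counts; the percentages only appear once the pie chart formats them. If the numbers themselves are wanted, a small conversion could look like the sketch below. `to_percentages` is a hypothetical helper and the counts are made up for illustration.

```python
def to_percentages(counts, total_key="TOTAL"):
    """Convert a counts dict (as returned by the util functions) to percentages."""
    total = counts[total_key]
    if total == 0:
        # Avoid dividing by zero for an empty session
        return {key: 0.0 for key in counts if key != total_key}
    return {
        key: 100.0 * value / total
        for key, value in counts.items()
        if key != total_key
    }

# Made-up counts, not session data
example_counts = {"TOTAL": 20, "SILENT": 5, "VERBOSE": 15}
example_percentages = to_percentages(example_counts)
print(example_percentages)
```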
In [29]:
fig1, ax1 = plt.subplots()
sizes = [silent['SILENT'], silent['VERBOSE']]
labels = 'Silent', 'Talked'
explode = (0.05, 0)
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
In [27]:
# Percentage of attendees that left early during the talk
left = percent_left(data['STAYED'])
left
Out[27]:
In [39]:
fig1, ax1 = plt.subplots()
sizes = [left['LEFT'], left['STAYED']]
labels = 'Left', 'Stayed'
explode = (0.1, 0)
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
In [38]:
# Percentage of attendees that were Male vs. Female (see notes above around methodology)
gender = percent_gender(data['GENDER'])
gender
Out[38]:
In [40]:
fig1, ax1 = plt.subplots()
sizes = [gender['FEMALE'], gender['MALE']]
labels = 'Female', 'Male'
explode = (0.1, 0)
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
In [90]:
# Count attendees who spoke at least once, split by GENDER
distribution = percent_talking_gender(data[['TALKS','GENDER']])
distribution
Out[90]:
In [91]:
fig1, ax1 = plt.subplots()
sizes = [distribution['FEMALE'], distribution['MALE']]
labels = 'Female Speakers', 'Male Speakers'
explode = (0.1, 0)
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.0f%%', shadow=True, startangle=90)
ax1.axis('equal')
plt.show()
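The chart above counts speakers, not how much each group spoke. To compare actual talking turns by gender, the TALKS column could be summed per group. This is a sketch assuming GENDER has already been encoded as 0 (male) and 1 (female); the rows are made up for illustration and are not session data.

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "TALKS": [20, 31, 0, 6],
    "GENDER": [0, 0, 1, 1],
})

# Sum the talking turns within each gender code
talks_by_gender = sample.groupby("GENDER")["TALKS"].sum()

labels = {0: "MALE", 1: "FEMALE"}
totals = {labels[code]: int(turns) for code, turns in talks_by_gender.items()}
print(totals)
```

The resulting dict can be passed to `ax1.pie` in the same way as the counts above.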