Diversity has been a sore point for technology companies and development teams for some time now. Their teams are notoriously white and male, in large part because that mirrors the demographics of the academic programs that have traditionally fed them.
However, this lack of diversity has been in the news a lot lately, with major scandals at Google. There are many programs encouraging minority and female participation in STEM, and it feels like the next generation of developers might start to look more like the population of the world.
...Or will it?
In [62]:
import sys # system module
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics module
import datetime as dt # date and time module
import numpy as np # foundation for Pandas
import seaborn as sns # fancy matplotlib graphics
from pandas_datareader import data, wb # worldbank data (needs the pandas-datareader package)
# plotly imports
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly # just to print version and init notebook
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
# these lines make our graphics show up in the notebook
%matplotlib inline
plotly.offline.init_notebook_mode()
# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())
2013 National AP Exams
I thought a natural starting point was to ask the question: how are current students buying into the existing education programs, and what do those demographics look like over the years?
It turns out the AP Computer Science data I found only goes back a few years (the file below covers 2006 to 2013), but the engagement numbers and related demographic information are freely available.
In [63]:
url = 'http://home.cc.gatech.edu/ice-gt/uploads/556/DetailedStateInfoAP-CS-A-2006-2013-with-PercentBlackAndHIspanicByState-fixed.xlsx'
ap0 = pd.read_excel(url, sheet_name=1, header=0) # second sheet, first row as column names
ap0.shape
Out[63]:
In [64]:
ap = ap0.drop(columns="Unnamed: 2") # drop the empty spacer column
In [65]:
ap.head(5)
Out[65]:
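Once the table is loaded, the obvious next step is to turn raw counts into shares so that states of very different sizes can be compared. Here is a minimal sketch of that step on a toy table. The column names (`total_takers`, `female_takers`), the helper `share`, and the numbers are all made up for illustration; the real spreadsheet's headers will differ and would need to be substituted in.

```python
import pandas as pd

# Toy stand-in for the AP table. Column names and values are hypothetical,
# not taken from the real spreadsheet.
toy = pd.DataFrame({
    "state": ["A", "B", "C"],
    "total_takers": [200, 50, 0],
    "female_takers": [40, 10, 0],
})

def share(df, part_col, total_col):
    """Return part/total as a percentage, NaN where the total is zero."""
    total = df[total_col].where(df[total_col] > 0)  # zero totals become NaN
    return (df[part_col] / total * 100).round(1)

toy["pct_female"] = share(toy, "female_takers", "total_takers")
print(toy)
```

Masking zero totals keeps states with no test takers from showing up as a misleading 0% (or a division error); they stay missing instead.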
U.S. Census Data
Placeholder for importing 2012 U.S. Census data. It looks like they have demographic and state-level data for people in STEM careers, but the link to the files leads to a 404 page on their site. I have emailed them to let them know, and hope to have this data next week.
Additional Data
I'm working on finding more raw data on current demographics of developers as a benchmark. I've also discovered this survey, but am still figuring out how to get to the raw data in order to import it.