What might the future developer workforce look like?

Morgan White, Data Bootcamp, Spring 2016

Introduction

Diversity has been a sore point for technology companies and their development teams for some time now. These teams are notoriously white and male, in large part because that reflects the demographics of the people who have traditionally enrolled in computer science and related academic programs.

However, this lack of diversity has been in the news a lot lately, including major scandals at Google. Many programs now encourage minority and female participation in STEM, and it feels like the next generation of developers might start to look more like the population of the world.

...Or will it?


In [62]:
import sys                             # system module
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for Pandas
import seaborn.apionly as sns          # fancy matplotlib graphics (no styling)
from pandas_datareader import data, wb  # remote data readers, incl. World Bank (replaces pandas.io.data / pandas.io.wb)

# plotly imports
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

# these lines make our graphics show up in the notebook
%matplotlib inline
plotly.offline.init_notebook_mode()

# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())


Python version: 3.5.1 |Anaconda 4.0.0 (x86_64)| (default, Dec  7 2015, 11:24:55) 
[GCC 4.2.1 (Apple Inc. build 5577)]
Pandas version:  0.18.0
Plotly version:  1.9.7
Today:  2016-04-29

Data

2013 National AP Exams

I thought a natural starting point was to ask: how are current students engaging with the existing education programs, and what have the demographics of those students looked like over the years?

The data set used here covers the AP Computer Science A exam from 2006 through 2013 (per the file name), and the participation numbers and related demographic information are freely available.


In [63]:
url = 'http://home.cc.gatech.edu/ice-gt/uploads/556/DetailedStateInfoAP-CS-A-2006-2013-with-PercentBlackAndHIspanicByState-fixed.xlsx'
ap0 = pd.read_excel(url, sheetname=1, header=0)  # second sheet (0-indexed); pandas >= 0.21 spells this argument sheet_name
ap0.shape


Out[63]:
(54, 29)

In [64]:
ap = ap0.drop("Unnamed: 2", axis=1)  # drop the unnamed spreadsheet column

In [65]:
ap.head(5)


Out[65]:
|  | 2013 data | # schools | Total # | yield per teacher | # passed | % passed | # female | # female passed | % female passed | %female | ... | % Black Females passed | # Hispanic | # Hispanic passed | % Hispanic passed | # Hispanic Females | # Hispanic Females passed | % Hispanic females passed | % hispanic taking exam | % Hispanic in state | % taking / % state * 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | California | 211.0 | 4964.0 | 23.5261 | 3761 | 75.7655 | 1074.0 | 776 | 72.2533 | 21.635778 | ... | 50 | 392.0 | 186 | 47.449 | 82.0 | 24* | 29.27* | 7.896857 | 37.6 | 21.002280 |
| 1 | Texas | 271.0 | 3979.0 | 14.6827 | 2454 | 61.6738 | 910.0 | 520 | 57.1429 | 22.870068 | ... | 46.6667 | 751.0 | 334 | 44.474 | 178.0 | 56* | 31.46* | 18.874089 | 37.6 | 50.197045 |
| 2 | New York | 124.0 | 1858.0 | 14.9839 | 1278 | 68.7836 | 377.0 | 216 | 57.2944 | 20.290635 | ... | 10.5263 | 150.0 | 53 | 35.3333 | 45.0 | 10 | 22.2222 | 8.073197 | 17.6 | 45.870437 |
| 3 | Virginia | 110.0 | 1655.0 | 15.0455 | 1074 | 60.3715 | 308.0 | 207 | 67.2078 | 18.610272 | ... | 31.25 | 90.0 | 42 | 46.6667 | 9.0 | 2* | * | 5.438066 | 7.9 | 68.836284 |
| 4 | Maryland | 112.0 | 1629.0 | 14.5446 | 1068 | 65.5617 | 323.0 | 190 | 58.8235 | 19.828115 | ... | 19.6078 | 88.0 | 39 | 44.3182 | 18.0 | 6* | * | 5.402087 | 8.2 | 65.879112 |

5 rows × 28 columns
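
Two things stand out in this preview: some cells carry an asterisk (24*, 29.27*, or a bare *), which keeps those columns from being numeric, and the last column is a ratio named after its own formula. Below is a minimal sketch of how both could be handled, using a handful of values copied from the preview rather than the full spreadsheet. The short column names and the strip_star helper are hypothetical, and treating a bare * as missing is an assumption, since the meaning of the asterisk is not documented here.

```python
import pandas as pd

# A few cells copied from the preview above. The column names are shortened,
# hypothetical stand-ins for the long headers in the spreadsheet.
sample = pd.DataFrame(
    {
        "state": ["California", "Texas", "New York", "Virginia", "Maryland"],
        "total": [4964.0, 3979.0, 1858.0, 1655.0, 1629.0],
        "hispanic": [392.0, 751.0, 150.0, 90.0, 88.0],
        "hispanic_females_passed": ["24*", "56*", "10", "2*", "6*"],
        "pct_hispanic_females_passed": ["29.27*", "31.46*", "22.2222", "*", "*"],
        "pct_hispanic_in_state": [37.6, 37.6, 17.6, 7.9, 8.2],
    }
)


def strip_star(column):
    """Hypothetical helper: remove asterisks, then convert to numbers.

    A bare '*' becomes an empty string, which is coerced to NaN (assumption:
    such cells are treated as missing).
    """
    text = column.astype(str).str.replace("*", "", regex=False).str.strip()
    return pd.to_numeric(text, errors="coerce")


for name in ["hispanic_females_passed", "pct_hispanic_females_passed"]:
    sample[name] = strip_star(sample[name])

# Recompute the two derived columns from the raw counts:
# share of exam takers who are Hispanic, and that share relative to the
# Hispanic share of the state population (scaled by 100).
sample["pct_hispanic_taking"] = sample["hispanic"] / sample["total"] * 100
sample["taking_vs_state"] = (
    sample["pct_hispanic_taking"] / sample["pct_hispanic_in_state"] * 100
)

print(sample[["state", "pct_hispanic_taking", "taking_vs_state"]].round(3))
```

Under this reading, a value of 100 in the last column would mean Hispanic students take the exam in proportion to their share of the state population, and the recomputed numbers agree with the spreadsheet's own columns for these five states.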

U.S. Census Data

Placeholder for importing 2012 U.S. Census data. It looks like the Census site has demographic and state-level data for people in STEM careers, but the link to the files currently leads to a 404 page. I have emailed them to let them know, and I hope to have this data next week.

Additional Data

I'm still looking for more raw data on the current demographics of developers to use as a benchmark. I've also discovered this survey, but I am still figuring out how to get at the raw data in order to import it.

