What might the future developer workforce look like?

Morgan White Data Bootcamp, Spring 2016

Introduction

A lack of diversity has haunted technology companies and development teams for some time now. Their teams are notoriously white and male, in large part because that reflects who has traditionally enrolled in computer science and related academic programs.

This lack of diversity has been in the news a lot lately, including major scandals at Google. Many initiatives now encourage women and minorities to participate in STEM programs, and it feels like the next generation of developers might start to look more like the population of the world.

...Or will it?



In [62]:

    
import sys                             # system module
import pandas as pd                    # data package
import matplotlib.pyplot as plt        # graphics module  
import datetime as dt                  # date and time module
import numpy as np                     # foundation for Pandas
import seaborn.apionly as sns          # fancy matplotlib graphics (no styling)
from pandas.io import data, wb         # worldbank data

# plotly imports
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

# these lines make our graphics show up in the notebook
%matplotlib inline             
plotly.offline.init_notebook_mode()

# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())

/Users/MorganWhite/anaconda/lib/python3.5/site-packages/pandas/io/data.py:35: FutureWarning: 
The pandas.io.data module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version.
After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
  FutureWarning)
/Users/MorganWhite/anaconda/lib/python3.5/site-packages/pandas/io/wb.py:21: FutureWarning: 
The pandas.io.wb module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version.
After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
  FutureWarning)

Python version: 3.5.1 |Anaconda 4.0.0 (x86_64)| (default, Dec  7 2015, 11:24:55) 
[GCC 4.2.1 (Apple Inc. build 5577)]
Pandas version:  0.18.0
Plotly version:  1.9.7
Today:  2016-04-29

Data

2013 National AP Exams

I thought a natural starting point was to ask: how many current students are buying into the existing education programs, and what have their demographics looked like over the years?

The AP Computer Science exam has been offered since 1984, and its engagement numbers and related demographic information are freely available.



In [63]:

    
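# state-level AP Computer Science A results, 2006-2013 (hosted at Georgia Tech)
url = 'http://home.cc.gatech.edu/ice-gt/uploads/556/DetailedStateInfoAP-CS-A-2006-2013-with-PercentBlackAndHIspanicByState-fixed.xlsx'
# read the second sheet (index 1); pandas 0.21+ spells this argument sheet_name
ap0 = pd.read_excel(url, sheetname=1, header=0)
ap0.shape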









    Out[63]:





(54, 29)



In [64]:

    
ap = ap0.drop("Unnamed: 2", axis=1)    # drop the unnamed column (axis=1 selects columns)
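
The cell above drops one unnamed column by its exact label. A slightly more general approach is sketched below, under the assumption that blank spreadsheet columns always come through as 'Unnamed: N' and hold no values. The helper name `drop_empty_unnamed` and the small `demo` frame are hypothetical, used only for illustration, and are not part of the original notebook or the AP data.

```python
import numpy as np
import pandas as pd


def drop_empty_unnamed(df):
    """Drop columns labeled 'Unnamed: N' that contain no data at all."""
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed:")]
    empty = [c for c in unnamed if df[c].isnull().all()]
    return df.drop(empty, axis=1)


# Tiny stand-in frame (hypothetical values, not the AP data)
demo = pd.DataFrame({
    "state": ["A", "B"],
    "Unnamed: 1": [np.nan, np.nan],   # blank spreadsheet column, gets dropped
    "Unnamed: 2": [1.0, np.nan],      # unnamed but holds data, so it is kept
})

cleaned = drop_empty_unnamed(demo)
print(list(cleaned.columns))
```

Checking that a column is entirely empty before dropping it guards against throwing away an unnamed column that actually carries values.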



In [65]:

    
ap.head(5)









    Out[65]:






  
    
      
	2013 data	# schools	Total #	yield per teacher	# passed	% passed	# female	# female passed	% female passed	%female	...	% Black Females passed	# Hispanic	# Hispanic passed	% Hispanic passed	# Hispanic Females	# Hispanic Females passed	% Hispanic females passed	% hispanic taking exam	% Hispanic in state	% taking / % state * 100
0	California	211.0	4964.0	23.5261	3761	75.7655	1074.0	776	72.2533	21.635778	...	50	392.0	186	47.449	82.0	24*	29.27*	7.896857	37.6	21.002280
1	Texas	271.0	3979.0	14.6827	2454	61.6738	910.0	520	57.1429	22.870068	...	46.6667	751.0	334	44.474	178.0	56*	31.46*	18.874089	37.6	50.197045
2	New York	124.0	1858.0	14.9839	1278	68.7836	377.0	216	57.2944	20.290635	...	10.5263	150.0	53	35.3333	45.0	10	22.2222	8.073197	17.6	45.870437
3	Virginia	110.0	1655.0	15.0455	1074	60.3715	308.0	207	67.2078	18.610272	...	31.25	90.0	42	46.6667	9.0	2*	*	5.438066	7.9	68.836284
4	Maryland	112.0	1629.0	14.5446	1068	65.5617	323.0	190	58.8235	19.828115	...	19.6078	88.0	39	44.3182	18.0	6*	*	5.402087	8.2	65.879112
    
  

5 rows × 28 columns
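
Two things stand out in the head of the table. Some cells carry an asterisk ("24*", "29.27*", or a bare "*"), so pandas will read those columns as text. Several columns also look derived from others. The sketch below rebuilds a few columns from the five rows shown, converts the asterisk-marked cells to numbers, and recomputes two derived columns to check how they appear to be defined. The source does not say what the asterisk means, so treating a bare "*" as missing is an assumption. The helper `to_number` and the "(recomputed)" column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Selected columns for the five rows shown in the head() output
sample = pd.DataFrame({
    "2013 data": ["California", "Texas", "New York", "Virginia", "Maryland"],
    "Total #": [4964.0, 3979.0, 1858.0, 1655.0, 1629.0],
    "# female": [1074.0, 910.0, 377.0, 308.0, 323.0],
    "# Hispanic Females passed": ["24*", "56*", "10", "2*", "6*"],
    "% Hispanic females passed": ["29.27*", "31.46*", "22.2222", "*", "*"],
    "% hispanic taking exam": [7.896857, 18.874089, 8.073197, 5.438066, 5.402087],
    "% Hispanic in state": [37.6, 37.6, 17.6, 7.9, 8.2],
})


def to_number(value):
    """Strip any '*' and convert to float; a bare '*' becomes NaN (assumed missing)."""
    text = str(value).replace("*", "").strip()
    return float(text) if text else np.nan


# Convert the asterisk-marked text columns to numeric columns
for col in ["# Hispanic Females passed", "% Hispanic females passed"]:
    sample[col] = sample[col].map(to_number)

# Share of exam takers who are female, in percent
sample["%female (recomputed)"] = 100 * sample["# female"] / sample["Total #"]

# Hispanic share of exam takers relative to Hispanic share of the state, times 100
sample["% taking / % state * 100 (recomputed)"] = (
    100 * sample["% hispanic taking exam"] / sample["% Hispanic in state"]
)

print(sample[["2013 data", "%female (recomputed)",
              "% taking / % state * 100 (recomputed)"]].round(2))
```

For these five rows the recomputed values agree with the "%female" and "% taking / % state * 100" columns in the spreadsheet to within rounding. On the last column, a value of 100 would mean Hispanic students take the exam in proportion to their share of the state population.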

U.S. Census Data

Placeholder for importing 2012 U.S. Census data. The Census Bureau appears to have demographic data by state for people in STEM careers, but the link to the files currently returns a 404 page on their site. I have emailed them to let them know, and hope to have this data next week.

Additional Data

I'm still looking for more raw data on the current demographics of developers to use as a benchmark. I've also discovered this survey, but am still figuring out how to get at the raw data in order to import it.


