High Tech Industries (STEM Concentration)

This notebook uses BLS Industry-Occupation employment data to identify a set of High Tech industries according to the methodology in Hecker (2005). The resulting list, based on the relative concentration of STEM or "technology intensive" occupations. Moreover, these High Tech industries are classified into levels according to the intensity with which they utilize STEM workers.

Heckler, D. (2005). High-technology employment: a NAICS-based update. Monthly Lab. Rev., 128, 57.


In [59]:
# import libraries
import pandas as pd
import numpy as np

# data paths
xwalkPath = ''
blsPath = ''

Import list of 2000 SOC occupations identified in Hecker (2005) as technology intensive. These occupations are then concorded to 2010 SOC codes so that they can be used to identify STEM employment in more recent OES Industry-Occupation data.

The BLS 2000 SOC to 2010 SOC crosswalk can be found here: http://www.bls.gov/soc/


In [60]:
# import list of 'technology intensive' occupations from Hecker (2005), Table 3
stemOcc = pd.read_csv(xwalkPath+'hecker2005_table3.txt')
stemOcc = stemOcc[['occupationcode']]
stemOcc.columns = ['occ00']

# import BLS soc crosswalk, 2000 to 2010
soc0010 = pd.read_csv(xwalkPath+'soc_2000_to_2010_crosswalk.csv')
soc0010 = soc0010[['2000 SOC code','2010 SOC code']]
soc0010.columns = ['occ00','occ10']

# concord Hecker (2005) high tech occupations
stemOcc = pd.merge(stemOcc, soc0010, on='occ00', how='left')
stemOcc = stemOcc[['occ10']]
stemOcc.columns = ['occ']
stemOcc = stemOcc.drop_duplicates()

print 'Count of STEM occupations (2010 SOC): ', len(stemOcc)


Count of STEM occupations (2010 SOC):  74

Import OES BLS Industry-Occupation data for 2012 and 2014. These tables are at the detailed occupation and 4-digit 2012 NAICS industry level. The 2012 data comes split into two files.

BLS OES data can be found here: http://www.bls.gov/oes/tables.htm


In [61]:
# import 2012 OES data
oes2012 = pd.read_csv(blsPath+'nat4d_M2012_dl_1_113300_517100.csv')
oes2012 = oes2012.append(pd.read_csv(blsPath+'nat4d_M2012_dl_2_517200_999300.csv'))
# keep only detail level records, dropping summary and aggregate records
oes2012 = oes2012[(oes2012.OCC_GROUP=='detailed') & (oes2012.OCC_CODE!='00-0000')]
oes2012 = oes2012.reset_index(drop=True)
oes2012 = oes2012[['NAICS','OCC_CODE','TOT_EMP']]
# subset to first 4 digits of naics, dropping zero padding
oes2012['NAICS'] = oes2012['NAICS'].astype(str)
oes2012['NAICS'] = oes2012['NAICS'].str[0:4]
# clean and destring total employment
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace(' ','')
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace(' ','')
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace(',','')
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace('\*\*','')
oes2012['TOT_EMP'] = pd.to_numeric(oes2012['TOT_EMP'])
oes2012.columns = ['naics', 'occ', 'tot_emp']

# import 2014 OES data
oes2014 = pd.read_csv(blsPath+'nat4d_M2014_dl.csv')
# keep only detail level records, dropping summary and aggregate records
oes2014 = oes2014[(oes2014.OCC_GROUP=='detailed') & (oes2014.OCC_CODE!='00-0000')]
oes2014 = oes2014.reset_index(drop=True)
oes2014 = oes2014[['NAICS','OCC_CODE','TOT_EMP']]
# subset to first 4 digits of naics, dropping zero padding
oes2014['NAICS'] = oes2014['NAICS'].astype(str)
oes2014['NAICS'] = oes2014['NAICS'].str[0:4]
# clean and destring total employment
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace(' ','')
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace(' ','')
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace(',','')
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace('\*\*','')
oes2014['TOT_EMP'] = pd.to_numeric(oes2014['TOT_EMP'])
oes2014.columns = ['naics', 'occ', 'tot_emp']

Flag STEM or 'technology oriented' occupations on the OES data. Then calculate the total employment and STEM employment for each industry. Take the ratio of STEM employment to total employment in each industry and find the mean STEM emplyoment ratio across all industries. Finally, implement the cutoff rules for High Tech industries as defined in Hecker (2005): Level I industries have STEM ratio greater than (or equal to) 5 times the average STEM concentration, Level II includes industries with a STEM employment ratio between 3 and 5 times the average, and Level III includes industries with a STEM employment ratio between 2 and 3 times the average.


In [62]:
# flag STEM occupations 2012 OES
oes2012ht = pd.merge(oes2012, stemOcc, on='occ', how='left', indicator=True)
oes2012ht['htocc'] = 0
oes2012ht.loc[oes2012ht._merge=='both','htocc'] = 1
# calculate STEM employment
oes2012ht['htemp'] = oes2012ht.tot_emp * oes2012ht.htocc

# sum emp and STEM emp by industry, calc ratio and average
oes2012ht_gb = oes2012ht[['tot_emp','htemp','naics']].groupby('naics').agg(sum)
oes2012ht_gb['naics']=oes2012ht_gb.index
oes2012ht_gb = oes2012ht_gb.reset_index(drop=True)
oes2012ht_gb['htratio'] = oes2012ht_gb.htemp/oes2012ht_gb.tot_emp
oes2012ht_gb['htratio_mean'] = oes2012ht_gb.htratio.mean()

# flag industry by high tech level
oes2012ht_gb['oes12htlvl'] = ''
oes2012ht_gb.loc[oes2012ht_gb.htratio>=2*oes2012ht_gb.htratio_mean,'oes12htlvl'] = 'Level III'
oes2012ht_gb.loc[oes2012ht_gb.htratio>=3*oes2012ht_gb.htratio_mean,'oes12htlvl'] = 'Level II'
oes2012ht_gb.loc[oes2012ht_gb.htratio>=5*oes2012ht_gb.htratio_mean,'oes12htlvl'] = 'Level I'

# show count of 4-digit industries by level
print '#'*20 +' High Tech Industries - 2012 OES ' + '#'*20
print '\nCount of 4-digit 2012 NAICS by HT Level:'
print oes2012ht_gb.groupby('oes12htlvl').agg([len])['naics']
# list level I industries
print '\nList of Level I HT Industries:'
print list(oes2012ht_gb[oes2012ht_gb.oes12htlvl=='Level I']['naics'])


#################### High Tech Industries - 2012 OES ####################

Count of 4-digit 2012 NAICS by HT Level:
            len
oes12htlvl     
            252
Level I      11
Level II     12
Level III    15

List of Level I HT Industries:
['2111', '3341', '3342', '3344', '3345', '3364', '5112', '5182', '5413', '5415', '5417']

In [63]:
# flag STEM occupations 2014 OES
oes2014ht = pd.merge(oes2014, stemOcc, on='occ', how='left', indicator=True)
oes2014ht['htocc'] = 0
oes2014ht.loc[oes2014ht._merge=='both','htocc'] = 1
# calculate STEM employment
oes2014ht['htemp'] = oes2014ht.tot_emp * oes2014ht.htocc

# sum emp and STEM emp by industry, calc ratio and average
oes2014ht_gb = oes2014ht[['tot_emp','htemp','naics']].groupby('naics').agg(sum)
oes2014ht_gb['naics']=oes2014ht_gb.index
oes2014ht_gb = oes2014ht_gb.reset_index(drop=True)
oes2014ht_gb['htratio'] = oes2014ht_gb.htemp/oes2014ht_gb.tot_emp
oes2014ht_gb['htratio_mean'] = oes2014ht_gb.htratio.mean()

# flag industry by high tech level
oes2014ht_gb['oes14htlvl'] = ''
oes2014ht_gb.loc[oes2014ht_gb.htratio>=2*oes2014ht_gb.htratio_mean,'oes14htlvl'] = 'Level III'
oes2014ht_gb.loc[oes2014ht_gb.htratio>=3*oes2014ht_gb.htratio_mean,'oes14htlvl'] = 'Level II'
oes2014ht_gb.loc[oes2014ht_gb.htratio>=5*oes2014ht_gb.htratio_mean,'oes14htlvl'] = 'Level I'

# show count of 4-digit industries by level
print '#'*20 +' High Tech Industries - 2014 OES ' + '#'*20
print '\nCount of 4-digit 2012 NAICS by HT Level:'
print oes2014ht_gb.groupby('oes14htlvl').agg([len])['naics']
# list level I industries
print '\nList of Level I HT Industries:'
print list(oes2014ht_gb[oes2014ht_gb.oes14htlvl=='Level I']['naics'])


#################### High Tech Industries - 2014 OES ####################

Count of 4-digit 2012 NAICS by HT Level:
            len
oes14htlvl     
            254
Level I      11
Level II     16
Level III     9

List of Level I HT Industries:
['3341', '3342', '3344', '3345', '3364', '5112', '5182', '5191', '5413', '5415', '5417']

Combine with list of 4-digit 2002 NAICS industries from Hecker (2005) and export to csv. Note that NAICS industries from Hecker (2005) are 2002 NAICS while the 2012 and 2014 OES data use 2012 NAICS. For comparability, these industries would need to be concorded to the same vintage of industry classification.


In [64]:
hecker05 = pd.read_csv(xwalkPath+'hecker2005_table4.txt')[['naics','level']]
hecker05['naics'] = hecker05['naics'].astype(str)
hecker05.columns = ['naics','hkr05htlvl']

outDF = pd.merge(hecker05,oes2012ht_gb[['naics','oes12htlvl']], on='naics', how='outer')
outDF = pd.merge(outDF, oes2014ht_gb[['naics','oes14htlvl']], on='naics', how='outer')
outDF['oes12htlvl'] = outDF.oes12htlvl.fillna(np.nan)
outDF.loc[outDF.oes12htlvl=='','oes12htlvl'] = np.nan
outDF['oes14htlvl'] = outDF.oes14htlvl.fillna(np.nan)
outDF.loc[outDF.oes14htlvl=='','oes14htlvl'] = np.nan
outDF = outDF.dropna(how='all',subset=['hkr05htlvl','oes12htlvl','oes14htlvl'])
outDF = outDF.sort_values(by='naics')
outDF.to_csv(xwalkPath+'ht_stem_industries.csv', index=False)
print outDF


    naics hkr05htlvl oes12htlvl oes14htlvl
24   1131   Level II        NaN        NaN
25   1132   Level II        NaN        NaN
14   2111   Level II    Level I   Level II
51   2122        NaN        NaN  Level III
15   2211   Level II  Level III   Level II
27   3241  Level III  Level III  Level III
16   3251   Level II   Level II   Level II
17   3252   Level II  Level III  Level III
42   3253  Level III        NaN        NaN
0    3254    Level I   Level II   Level II
28   3255  Level III        NaN        NaN
29   3259  Level III        NaN        NaN
18   3332   Level II   Level II   Level II
19   3333   Level II   Level II   Level II
30   3336  Level III  Level III   Level II
31   3339  Level III  Level III  Level III
1    3341    Level I    Level I    Level I
2    3342    Level I    Level I    Level I
20   3343   Level II  Level III   Level II
3    3344    Level I    Level I    Level I
4    3345    Level I    Level I    Level I
21   3346   Level II  Level III   Level II
32   3353  Level III  Level III  Level III
121  3359        NaN  Level III  Level III
5    3364    Level I    Level I    Level I
125  3365        NaN  Level III        NaN
45   3369  Level III        NaN        NaN
22   4234   Level II   Level II  Level III
44   4861  Level III  Level III   Level II
33   4862  Level III  Level III        NaN
46   4869  Level III        NaN        NaN
6    5112    Level I    Level I    Level I
7    5161    Level I        NaN        NaN
34   5171  Level III   Level II   Level II
35   5172  Level III   Level II   Level II
36   5173  Level III        NaN        NaN
37   5174  Level III   Level II   Level II
8    5179    Level I   Level II   Level II
9    5181    Level I        NaN        NaN
10   5182    Level I    Level I    Level I
208  5191        NaN   Level II    Level I
38   5211  Level III   Level II   Level II
43   5232  Level III  Level III        NaN
214  5241        NaN  Level III  Level III
11   5413    Level I    Level I    Level I
12   5415    Level I    Level I    Level I
23   5416   Level II   Level II   Level II
13   5417    Level I    Level I    Level I
39   5511  Level III  Level III  Level III
41   5612  Level III        NaN        NaN
40   8112  Level III        NaN        NaN
26   9999   Level II        NaN        NaN