This notebook uses BLS Industry-Occupation employment data to identify a set of High Tech industries according to the methodology in Hecker (2005). The resulting list, based on the relative concentration of STEM or "technology intensive" occupations. Moreover, these High Tech industries are classified into levels according to the intensity with which they utilize STEM workers.
Heckler, D. (2005). High-technology employment: a NAICS-based update. Monthly Lab. Rev., 128, 57.
In [59]:
# import libraries
import pandas as pd
import numpy as np
# data paths
xwalkPath = ''
blsPath = ''
Import list of 2000 SOC occupations identified in Hecker (2005) as technology intensive. These occupations are then concorded to 2010 SOC codes so that they can be used to identify STEM employment in more recent OES Industry-Occupation data.
The BLS 2000 SOC to 2010 SOC crosswalk can be found here: http://www.bls.gov/soc/
In [60]:
# import list of 'technology intensive' occupations from Hecker (2005), Table 3
stemOcc = pd.read_csv(xwalkPath+'hecker2005_table3.txt')
stemOcc = stemOcc[['occupationcode']]
stemOcc.columns = ['occ00']
# import BLS soc crosswalk, 2000 to 2010
soc0010 = pd.read_csv(xwalkPath+'soc_2000_to_2010_crosswalk.csv')
soc0010 = soc0010[['2000 SOC code','2010 SOC code']]
soc0010.columns = ['occ00','occ10']
# concord Hecker (2005) high tech occupations
stemOcc = pd.merge(stemOcc, soc0010, on='occ00', how='left')
stemOcc = stemOcc[['occ10']]
stemOcc.columns = ['occ']
stemOcc = stemOcc.drop_duplicates()
print 'Count of STEM occupations (2010 SOC): ', len(stemOcc)
Import OES BLS Industry-Occupation data for 2012 and 2014. These tables are at the detailed occupation and 4-digit 2012 NAICS industry level. The 2012 data comes split into two files.
BLS OES data can be found here: http://www.bls.gov/oes/tables.htm
In [61]:
# import 2012 OES data
oes2012 = pd.read_csv(blsPath+'nat4d_M2012_dl_1_113300_517100.csv')
oes2012 = oes2012.append(pd.read_csv(blsPath+'nat4d_M2012_dl_2_517200_999300.csv'))
# keep only detail level records, dropping summary and aggregate records
oes2012 = oes2012[(oes2012.OCC_GROUP=='detailed') & (oes2012.OCC_CODE!='00-0000')]
oes2012 = oes2012.reset_index(drop=True)
oes2012 = oes2012[['NAICS','OCC_CODE','TOT_EMP']]
# subset to first 4 digits of naics, dropping zero padding
oes2012['NAICS'] = oes2012['NAICS'].astype(str)
oes2012['NAICS'] = oes2012['NAICS'].str[0:4]
# clean and destring total employment
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace(' ','')
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace(' ','')
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace(',','')
oes2012['TOT_EMP'] = oes2012['TOT_EMP'].str.replace('\*\*','')
oes2012['TOT_EMP'] = pd.to_numeric(oes2012['TOT_EMP'])
oes2012.columns = ['naics', 'occ', 'tot_emp']
# import 2014 OES data
oes2014 = pd.read_csv(blsPath+'nat4d_M2014_dl.csv')
# keep only detail level records, dropping summary and aggregate records
oes2014 = oes2014[(oes2014.OCC_GROUP=='detailed') & (oes2014.OCC_CODE!='00-0000')]
oes2014 = oes2014.reset_index(drop=True)
oes2014 = oes2014[['NAICS','OCC_CODE','TOT_EMP']]
# subset to first 4 digits of naics, dropping zero padding
oes2014['NAICS'] = oes2014['NAICS'].astype(str)
oes2014['NAICS'] = oes2014['NAICS'].str[0:4]
# clean and destring total employment
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace(' ','')
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace(' ','')
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace(',','')
oes2014['TOT_EMP'] = oes2014['TOT_EMP'].str.replace('\*\*','')
oes2014['TOT_EMP'] = pd.to_numeric(oes2014['TOT_EMP'])
oes2014.columns = ['naics', 'occ', 'tot_emp']
Flag STEM or 'technology oriented' occupations on the OES data. Then calculate the total employment and STEM employment for each industry. Take the ratio of STEM employment to total employment in each industry and find the mean STEM emplyoment ratio across all industries. Finally, implement the cutoff rules for High Tech industries as defined in Hecker (2005): Level I industries have STEM ratio greater than (or equal to) 5 times the average STEM concentration, Level II includes industries with a STEM employment ratio between 3 and 5 times the average, and Level III includes industries with a STEM employment ratio between 2 and 3 times the average.
In [62]:
# flag STEM occupations 2012 OES
oes2012ht = pd.merge(oes2012, stemOcc, on='occ', how='left', indicator=True)
oes2012ht['htocc'] = 0
oes2012ht.loc[oes2012ht._merge=='both','htocc'] = 1
# calculate STEM employment
oes2012ht['htemp'] = oes2012ht.tot_emp * oes2012ht.htocc
# sum emp and STEM emp by industry, calc ratio and average
oes2012ht_gb = oes2012ht[['tot_emp','htemp','naics']].groupby('naics').agg(sum)
oes2012ht_gb['naics']=oes2012ht_gb.index
oes2012ht_gb = oes2012ht_gb.reset_index(drop=True)
oes2012ht_gb['htratio'] = oes2012ht_gb.htemp/oes2012ht_gb.tot_emp
oes2012ht_gb['htratio_mean'] = oes2012ht_gb.htratio.mean()
# flag industry by high tech level
oes2012ht_gb['oes12htlvl'] = ''
oes2012ht_gb.loc[oes2012ht_gb.htratio>=2*oes2012ht_gb.htratio_mean,'oes12htlvl'] = 'Level III'
oes2012ht_gb.loc[oes2012ht_gb.htratio>=3*oes2012ht_gb.htratio_mean,'oes12htlvl'] = 'Level II'
oes2012ht_gb.loc[oes2012ht_gb.htratio>=5*oes2012ht_gb.htratio_mean,'oes12htlvl'] = 'Level I'
# show count of 4-digit industries by level
print '#'*20 +' High Tech Industries - 2012 OES ' + '#'*20
print '\nCount of 4-digit 2012 NAICS by HT Level:'
print oes2012ht_gb.groupby('oes12htlvl').agg([len])['naics']
# list level I industries
print '\nList of Level I HT Industries:'
print list(oes2012ht_gb[oes2012ht_gb.oes12htlvl=='Level I']['naics'])
In [63]:
# flag STEM occupations 2014 OES
oes2014ht = pd.merge(oes2014, stemOcc, on='occ', how='left', indicator=True)
oes2014ht['htocc'] = 0
oes2014ht.loc[oes2014ht._merge=='both','htocc'] = 1
# calculate STEM employment
oes2014ht['htemp'] = oes2014ht.tot_emp * oes2014ht.htocc
# sum emp and STEM emp by industry, calc ratio and average
oes2014ht_gb = oes2014ht[['tot_emp','htemp','naics']].groupby('naics').agg(sum)
oes2014ht_gb['naics']=oes2014ht_gb.index
oes2014ht_gb = oes2014ht_gb.reset_index(drop=True)
oes2014ht_gb['htratio'] = oes2014ht_gb.htemp/oes2014ht_gb.tot_emp
oes2014ht_gb['htratio_mean'] = oes2014ht_gb.htratio.mean()
# flag industry by high tech level
oes2014ht_gb['oes14htlvl'] = ''
oes2014ht_gb.loc[oes2014ht_gb.htratio>=2*oes2014ht_gb.htratio_mean,'oes14htlvl'] = 'Level III'
oes2014ht_gb.loc[oes2014ht_gb.htratio>=3*oes2014ht_gb.htratio_mean,'oes14htlvl'] = 'Level II'
oes2014ht_gb.loc[oes2014ht_gb.htratio>=5*oes2014ht_gb.htratio_mean,'oes14htlvl'] = 'Level I'
# show count of 4-digit industries by level
print '#'*20 +' High Tech Industries - 2014 OES ' + '#'*20
print '\nCount of 4-digit 2012 NAICS by HT Level:'
print oes2014ht_gb.groupby('oes14htlvl').agg([len])['naics']
# list level I industries
print '\nList of Level I HT Industries:'
print list(oes2014ht_gb[oes2014ht_gb.oes14htlvl=='Level I']['naics'])
Combine with list of 4-digit 2002 NAICS industries from Hecker (2005) and export to csv. Note that NAICS industries from Hecker (2005) are 2002 NAICS while the 2012 and 2014 OES data use 2012 NAICS. For comparability, these industries would need to be concorded to the same vintage of industry classification.
In [64]:
hecker05 = pd.read_csv(xwalkPath+'hecker2005_table4.txt')[['naics','level']]
hecker05['naics'] = hecker05['naics'].astype(str)
hecker05.columns = ['naics','hkr05htlvl']
outDF = pd.merge(hecker05,oes2012ht_gb[['naics','oes12htlvl']], on='naics', how='outer')
outDF = pd.merge(outDF, oes2014ht_gb[['naics','oes14htlvl']], on='naics', how='outer')
outDF['oes12htlvl'] = outDF.oes12htlvl.fillna(np.nan)
outDF.loc[outDF.oes12htlvl=='','oes12htlvl'] = np.nan
outDF['oes14htlvl'] = outDF.oes14htlvl.fillna(np.nan)
outDF.loc[outDF.oes14htlvl=='','oes14htlvl'] = np.nan
outDF = outDF.dropna(how='all',subset=['hkr05htlvl','oes12htlvl','oes14htlvl'])
outDF = outDF.sort_values(by='naics')
outDF.to_csv(xwalkPath+'ht_stem_industries.csv', index=False)
print outDF