Introduction

This exploratory notebook is where we investigated the O*NET text files and decided which files, and which variables within them, to use. It does not need to be run to understand the project.

Some groupby statements are commented out because they take too long to run.


In [1]:
# Load needed modules and functions
import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np

from pylab import figure, show

import pandas as pd
from pandas import DataFrame, Series

In [2]:
#set up path to the data files
import os
data_folder = os.path.join(os.pardir, "data")

In [3]:
import glob 
file_names = glob.glob(data_folder + "/*")
file_names


Out[3]:
['..\\data\\Abilities.txt',
 '..\\data\\Content Model Reference.txt',
 '..\\data\\DWA Reference.txt',
 '..\\data\\Education, Training, and Experience Categories.txt',
 '..\\data\\Education, Training, and Experience.txt',
 '..\\data\\Green DWA Reference.txt',
 '..\\data\\Green Occupations.txt',
 '..\\data\\Green Task Statements.txt',
 '..\\data\\Interests.txt',
 '..\\data\\IWA Reference.txt',
 '..\\data\\Job Zone Reference.txt',
 '..\\data\\Job Zones.txt',
 '..\\data\\Knowledge.txt',
 '..\\data\\Level Scale Anchors.txt',
 '..\\data\\Occupation Data.txt',
 '..\\data\\Occupation Level Metadata.txt',
 '..\\data\\Read Me.txt',
 '..\\data\\Scales Reference.txt',
 '..\\data\\Skills.txt',
 '..\\data\\Survey Booklet Locations.txt',
 '..\\data\\Task Categories.txt',
 '..\\data\\Task Ratings.txt',
 '..\\data\\Task Statements.txt',
 '..\\data\\Tasks to DWAs.txt',
 '..\\data\\Tasks to Green DWAs.txt',
 '..\\data\\Work Activities.txt',
 '..\\data\\Work Context Categories.txt',
 '..\\data\\Work Context.txt',
 '..\\data\\Work Styles.txt',
 '..\\data\\Work Values.txt']

In [4]:
#build each frame name from the file name; os.path handles both Windows and POSIX paths
name_list = []
for name in file_names:
    frame_name = os.path.splitext(os.path.basename(name))[0]
    frame_name = frame_name.lower().replace(" ","_")
    frame_name = frame_name.replace(",","")
    name_list.append(frame_name)
    frame = pd.read_table(name, sep= '\t')
    #reformat column names
    columns = frame.columns
    columns = [x.lower().replace("*","").replace("-","_").replace(" ","_") for x in columns]
    frame.columns = columns
    #create a variable named the frame_name
    vars()[frame_name] = frame
    #print file_name
    #name_list.append(p.findall(name)[0])


C:\Users\Tiffany\Anaconda\lib\site-packages\pandas\io\parsers.py:1070: DtypeWarning: Columns (6) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
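
The loop above stores every frame as a module-level variable through vars(). A minimal sketch of an alternative, assuming it is acceptable to keep the frames in a dictionary instead: the helper name load_frames and the throwaway demo file are hypothetical, and low_memory=False is the option that the DtypeWarning above suggests.

```python
import glob
import os
import tempfile

import pandas as pd


def load_frames(folder):
    """Read every tab-separated .txt file in `folder` into a dict of DataFrames."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        # basename/splitext work with both Windows and POSIX separators
        base = os.path.splitext(os.path.basename(path))[0]
        frame_name = base.lower().replace(" ", "_").replace(",", "")
        # low_memory=False parses the file in one pass, so pandas infers a
        # single dtype per column instead of warning about mixed types
        frame = pd.read_csv(path, sep="\t", low_memory=False)
        # same column clean-up as in the notebook loop
        frame.columns = [c.lower().replace("*", "").replace("-", "_").replace(" ", "_")
                         for c in frame.columns]
        frames[frame_name] = frame
    return frames


# Demo on a temporary folder holding one tiny made-up file
demo_folder = tempfile.mkdtemp()
with open(os.path.join(demo_folder, "Job Zones.txt"), "w") as handle:
    handle.write("O*NET-SOC Code\tJob Zone\n11-1011.00\t5\n")

frames = load_frames(demo_folder)
print(list(frames))
print(list(frames["job_zones"].columns))
```

With a dictionary, the list of frame names is simply list(frames), and no names are injected into the notebook namespace.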

In [5]:
#here is a list with the names of all of the data frames we now have
name_list


Out[5]:
['abilities',
 'content_model_reference',
 'dwa_reference',
 'education_training_and_experience_categories',
 'education_training_and_experience',
 'green_dwa_reference',
 'green_occupations',
 'green_task_statements',
 'interests',
 'iwa_reference',
 'job_zone_reference',
 'job_zones',
 'knowledge',
 'level_scale_anchors',
 'occupation_data',
 'occupation_level_metadata',
 'read_me',
 'scales_reference',
 'skills',
 'survey_booklet_locations',
 'task_categories',
 'task_ratings',
 'task_statements',
 'tasks_to_dwas',
 'tasks_to_green_dwas',
 'work_activities',
 'work_context_categories',
 'work_context',
 'work_styles',
 'work_values']

In [6]:
#create a dictionary that contains all of the dataframe column names, and the number of times they occur
from collections import Counter
column_names = Counter()
for name in name_list:
    data = vars()[name]
    for column in data.columns:
        column_names[column]+=1
column_names


Out[6]:
Counter({'onet_soc_code': 18, 'element_id': 17, 'date': 16, 'scale_id': 16, 'domain_source': 15, 'element_name': 14, 'data_value': 10, 'n': 9, 'standard_error': 8, 'upper_ci_bound': 8, 'lower_ci_bound': 8, 'recommend_suppress': 8, 'category': 6, 'not_relevant': 5, 'task_id': 5, 'category_description': 3, 'green_dwa_id': 2, 'dwa_id': 2, 'description': 2, 'job_zone': 2, 'task': 2, 'iwa_id': 2, 'scale_name': 1, 'green_dwa_title': 1, 'anchor_value': 1, 'survey_item_number': 1, 'minimum': 1, 'experience': 1, 'education': 1, 'anchor_description': 1, 'title': 1, 'percent': 1, 'response': 1, 'svp_range': 1, 'task_type': 1, 'iwa_title': 1, 'job_training': 1, 'name': 1, 'dwa_title': 1, 'green_task_type': 1, 'maximum': 1, 'green_occupational_category': 1, 'item': 1, 'incumbents_responding': 1, 'onet_18.1_database': 1, 'examples': 1})
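
The counts above show how often a column name occurs, but not in which frames. A small sketch of the inverse lookup, using made-up stand-in frames (the dictionary frames and its contents are assumptions for illustration, not the loaded data):

```python
from collections import defaultdict

import pandas as pd

# Toy stand-ins for the loaded frames (columns only, no rows)
frames = {
    "abilities": pd.DataFrame(columns=["onet_soc_code", "element_id", "scale_id"]),
    "job_zones": pd.DataFrame(columns=["onet_soc_code", "job_zone"]),
    "scales_reference": pd.DataFrame(columns=["scale_id", "scale_name"]),
}

# Map each column name to the list of frames that contain it
column_to_frames = defaultdict(list)
for frame_name in sorted(frames):
    for column in frames[frame_name].columns:
        column_to_frames[column].append(frame_name)

# Columns that occur in more than one frame are candidate join keys
shared = {col: names for col, names in column_to_frames.items() if len(names) > 1}
print(shared)
```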

Functions

In this section, we define a few helper functions for basic analysis of what each data set of interest contains.

In [7]:
#function that calculates the number of features available in a dataframe (the # rows divided by # of jobs)
#integer division keeps the result a whole number on Python 3 as well
def feature(dataframe):
    return len(dataframe)//len(dataframe.onet_soc_code.unique())
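
A small worked example of the ratio that feature computes, on a made-up frame: two occupations, each with two elements rated on two scales, so 8 rows / 2 occupations = 4 features per occupation. The function is restated here with integer division so the block runs on its own; that choice is an assumption made to keep the result a whole number on Python 3.

```python
import pandas as pd


def feature(dataframe):
    """Rows per occupation: total rows divided by the number of distinct codes."""
    return len(dataframe) // dataframe.onet_soc_code.nunique()


# Made-up long-format data: 2 occupations x 2 elements x 2 scales
toy = pd.DataFrame({
    "onet_soc_code": ["11-1011.00"] * 4 + ["11-1021.00"] * 4,
    "element_id": ["1.A.1.a.1", "1.A.1.a.1", "1.A.1.a.2", "1.A.1.a.2"] * 2,
    "scale_id": ["IM", "LV"] * 4,
})

print(feature(toy))
```

Note that the ratio only equals the number of features per job if every job has the same number of rows.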

In [8]:
#function that gets unique values of a dataframe column and merges it with another data frame
def getDescriptions(data, metadata, column_name):
    uniques = pd.DataFrame(data[column_name].unique())
    uniques.columns = [column_name]
    return pd.merge(uniques,metadata,on=column_name)
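
To make the behaviour of getDescriptions concrete, here is a sketch on made-up data. Because pd.merge defaults to an inner join, values that are missing from the reference table (the hypothetical "XX" below) drop out, and reference rows that never occur in the data are not returned.

```python
import pandas as pd


def getDescriptions(data, metadata, column_name):
    uniques = pd.DataFrame(data[column_name].unique())
    uniques.columns = [column_name]
    return pd.merge(uniques, metadata, on=column_name)


# Made-up ratings and a made-up reference table
ratings = pd.DataFrame({"scale_id": ["IM", "LV", "IM", "XX"]})
reference = pd.DataFrame({
    "scale_id": ["IM", "LV", "OI"],
    "scale_name": ["Importance", "Level", "Occupational Interests"],
})

described = getDescriptions(ratings, reference, "scale_id")
print(described)
```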

In [9]:
#function to calculate the percentage of rows in an onet data table that are relevant to the given job
def getRelevance(dataframe):
    relevant_rows = dataframe[dataframe['not_relevant']!= 'Y']
    relevance = float(len(relevant_rows))/float(len(dataframe))
    return relevance*100

In [10]:
#function to calculate the percentage of rows recommended for suppression in an onet data table
def getExclusions(dataframe):
    excluded_rows = dataframe[dataframe['recommend_suppress'] == 'Y']
    exclusions = float(len(excluded_rows))/float(len(dataframe))
    return exclusions * 100
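
Both percentage helpers count the rows that match a flag and divide by the total. A sketch of the same arithmetic on a made-up frame, written with the mean of a boolean column (this shorter form is a suggestion, not what the notebook uses):

```python
import pandas as pd

# Made-up rows: not_relevant is 'n/a' on IM rows and 'Y' or 'N' on LV rows
toy = pd.DataFrame({
    "scale_id": ["IM", "LV", "IM", "LV"],
    "not_relevant": ["n/a", "N", "n/a", "Y"],
    "recommend_suppress": ["N", "N", "N", "Y"],
})

# The mean of a boolean column is the fraction of True values
relevance = (toy["not_relevant"] != "Y").mean() * 100
exclusions = (toy["recommend_suppress"] == "Y").mean() * 100

print(relevance, exclusions)
```

Three of the four rows are not flagged 'Y' for not_relevant, and one of the four is flagged for suppression, which gives 75 and 25 percent.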

Analysis

In this section, we examine the domain-level dataframes. These are:

* abilities
* education_training_and_experience
* knowledge
* interests
* job_zones
* skills
* work_activities
* work_context
* work_styles
* work_values

Abilities


In [11]:
abilities.head()


Out[11]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
0 11-1011.00 1.A.1.a.1 Oral Comprehension IM 4.50 8 0.19 4.13 4.87 N n/a 06/2006 Analyst
1 11-1011.00 1.A.1.a.1 Oral Comprehension LV 4.75 8 0.25 4.26 5.24 N N 06/2006 Analyst
2 11-1011.00 1.A.1.a.2 Written Comprehension IM 4.38 8 0.18 4.02 4.73 N n/a 06/2006 Analyst
3 11-1011.00 1.A.1.a.2 Written Comprehension LV 4.63 8 0.32 3.99 5.26 N N 06/2006 Analyst
4 11-1011.00 1.A.1.a.3 Oral Expression IM 4.50 8 0.19 4.13 4.87 N n/a 06/2006 Analyst

5 rows × 13 columns


In [12]:
# abilities.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)
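
The commented-out groupby is likely slow because apply runs a Python function once per group. If the goal is only to confirm that each code/element/scale combination identifies a single row, a cheaper sketch is to count the rows per group with size(). The toy rows below are made up and only mimic the layout of abilities.

```python
import pandas as pd

# Made-up long-format rows in the same layout as abilities
toy = pd.DataFrame({
    "onet_soc_code": ["11-1011.00", "11-1011.00", "11-1011.00", "11-1021.00"],
    "element_id": ["1.A.1.a.1", "1.A.1.a.1", "1.A.1.a.2", "1.A.1.a.1"],
    "scale_id": ["IM", "LV", "IM", "IM"],
    "data_value": [4.50, 4.75, 4.38, 3.88],
})

keys = ["onet_soc_code", "element_id", "scale_id"]

# size() counts the rows in each group without calling a Python function per group
rows_per_group = toy.groupby(keys).size()

# True when every key combination identifies exactly one row
print((rows_per_group == 1).all())
```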

In [13]:
#what are the unique element ids?
abilities.element_id.unique()


Out[13]:
array(['1.A.1.a.1', '1.A.1.a.2', '1.A.1.a.3', '1.A.1.a.4', '1.A.1.b.1',
       '1.A.1.b.2', '1.A.1.b.3', '1.A.1.b.4', '1.A.1.b.5', '1.A.1.b.6',
       '1.A.1.b.7', '1.A.1.c.1', '1.A.1.c.2', '1.A.1.d.1', '1.A.1.e.1',
       '1.A.1.e.2', '1.A.1.e.3', '1.A.1.f.1', '1.A.1.f.2', '1.A.1.g.1',
       '1.A.1.g.2', '1.A.2.a.1', '1.A.2.a.2', '1.A.2.a.3', '1.A.2.b.1',
       '1.A.2.b.2', '1.A.2.b.3', '1.A.2.b.4', '1.A.2.c.1', '1.A.2.c.2',
       '1.A.2.c.3', '1.A.3.a.1', '1.A.3.a.2', '1.A.3.a.3', '1.A.3.a.4',
       '1.A.3.b.1', '1.A.3.c.1', '1.A.3.c.2', '1.A.3.c.3', '1.A.3.c.4',
       '1.A.4.a.1', '1.A.4.a.2', '1.A.4.a.3', '1.A.4.a.4', '1.A.4.a.5',
       '1.A.4.a.6', '1.A.4.a.7', '1.A.4.b.1', '1.A.4.b.2', '1.A.4.b.3',
       '1.A.4.b.4', '1.A.4.b.5'], dtype=object)

In [14]:
#how many different ability element names are there? 
len(abilities.element_name.unique())


Out[14]:
52

In [15]:
#what are the scales of each ability?
getDescriptions(abilities,scales_reference,"scale_id")


Out[15]:
   scale_id  scale_name  minimum  maximum
0  IM        Importance        1        5
1  LV        Level             0        7

2 rows × 4 columns


In [16]:
getDescriptions(abilities, content_model_reference, "element_name")


Out[16]:
element_name element_id description
0 Oral Comprehension 1.A.1.a.1 The ability to listen to and understand inform...
1 Written Comprehension 1.A.1.a.2 The ability to read and understand information...
2 Oral Expression 1.A.1.a.3 The ability to communicate information and ide...
3 Written Expression 1.A.1.a.4 The ability to communicate information and ide...
4 Fluency of Ideas 1.A.1.b.1 The ability to come up with a number of ideas ...
5 Originality 1.A.1.b.2 The ability to come up with unusual or clever ...
6 Problem Sensitivity 1.A.1.b.3 The ability to tell when something is wrong or...
7 Deductive Reasoning 1.A.1.b.4 The ability to apply general rules to specific...
8 Inductive Reasoning 1.A.1.b.5 The ability to combine pieces of information t...
9 Information Ordering 1.A.1.b.6 The ability to arrange things or actions in a ...
10 Category Flexibility 1.A.1.b.7 The ability to generate or use different sets ...
11 Mathematical Reasoning 1.A.1.c.1 The ability to choose the right mathematical m...
12 Number Facility 1.A.1.c.2 The ability to add, subtract, multiply, or div...
13 Memorization 1.A.1.d.1 The ability to remember information such as wo...
14 Speed of Closure 1.A.1.e.1 The ability to quickly make sense of, combine,...
15 Flexibility of Closure 1.A.1.e.2 The ability to identify or detect a known patt...
16 Perceptual Speed 1.A.1.e.3 The ability to quickly and accurately compare ...
17 Spatial Orientation 1.A.1.f.1 The ability to know your location in relation ...
18 Visualization 1.A.1.f.2 The ability to imagine how something will look...
19 Selective Attention 1.A.1.g.1 The ability to concentrate on a task over a pe...
20 Time Sharing 1.A.1.g.2 The ability to shift back and forth between tw...
21 Arm-Hand Steadiness 1.A.2.a.1 The ability to keep your hand and arm steady w...
22 Manual Dexterity 1.A.2.a.2 The ability to quickly move your hand, your ha...
23 Finger Dexterity 1.A.2.a.3 The ability to make precisely coordinated move...
24 Control Precision 1.A.2.b.1 The ability to quickly and repeatedly adjust t...
25 Multilimb Coordination 1.A.2.b.2 The ability to coordinate two or more limbs (f...
26 Response Orientation 1.A.2.b.3 The ability to choose quickly between two or m...
27 Rate Control 1.A.2.b.4 The ability to time your movements or the move...
28 Reaction Time 1.A.2.c.1 The ability to quickly respond (with the hand,...
29 Wrist-Finger Speed 1.A.2.c.2 The ability to make fast, simple, repeated mov...
30 Speed of Limb Movement 1.A.2.c.3 The ability to quickly move the arms and legs.
31 Static Strength 1.A.3.a.1 The ability to exert maximum muscle force to l...
32 Explosive Strength 1.A.3.a.2 The ability to use short bursts of muscle forc...
33 Dynamic Strength 1.A.3.a.3 The ability to exert muscle force repeatedly o...
34 Trunk Strength 1.A.3.a.4 The ability to use your abdominal and lower ba...
35 Stamina 1.A.3.b.1 The ability to exert yourself physically over ...
36 Extent Flexibility 1.A.3.c.1 The ability to bend, stretch, twist, or reach ...
37 Dynamic Flexibility 1.A.3.c.2 The ability to quickly and repeatedly bend, st...
38 Gross Body Coordination 1.A.3.c.3 The ability to coordinate the movement of your...
39 Gross Body Equilibrium 1.A.3.c.4 The ability to keep or regain your body balanc...
40 Near Vision 1.A.4.a.1 The ability to see details at close range (wit...
41 Far Vision 1.A.4.a.2 The ability to see details at a distance.
42 Visual Color Discrimination 1.A.4.a.3 The ability to match or detect differences bet...
43 Night Vision 1.A.4.a.4 The ability to see under low light conditions.
44 Peripheral Vision 1.A.4.a.5 The ability to see objects or movement of obje...
45 Depth Perception 1.A.4.a.6 The ability to judge which of several objects ...
46 Glare Sensitivity 1.A.4.a.7 The ability to see objects in the presence of ...
47 Hearing Sensitivity 1.A.4.b.1 The ability to detect or tell the differences ...
48 Auditory Attention 1.A.4.b.2 The ability to focus on a single source of sou...
49 Sound Localization 1.A.4.b.3 The ability to tell the direction from which a...
50 Speech Recognition 1.A.4.b.4 The ability to identify and understand the spe...
51 Speech Clarity 1.A.4.b.5 The ability to speak clearly so others can und...

52 rows × 3 columns


In [17]:
#how many abilities features are there?
feature(abilities)


Out[17]:
104
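
The 104 above is consistent with the 52 ability elements each being rated on two scales (IM and LV). A sketch of how long-format rows like these could be turned into one row per occupation with one column per element/scale pair, assuming each combination occurs exactly once; the toy values are made up.

```python
import pandas as pd

# Made-up long-format data: 2 occupations x 2 elements x 2 scales
toy = pd.DataFrame({
    "onet_soc_code": ["11-1011.00"] * 4 + ["11-1021.00"] * 4,
    "element_id": ["1.A.1.a.1", "1.A.1.a.1", "1.A.1.a.2", "1.A.1.a.2"] * 2,
    "scale_id": ["IM", "LV"] * 4,
    "data_value": [4.50, 4.75, 4.38, 4.63, 3.00, 3.25, 2.88, 3.12],
})

# Index by occupation/element/scale, then move element and scale into the columns
wide = (toy.set_index(["onet_soc_code", "element_id", "scale_id"])["data_value"]
           .unstack(["element_id", "scale_id"]))

# One row per occupation, one column per (element_id, scale_id) pair
print(wide.shape)
```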

In [18]:
#percentage of relevant ability rows?
getRelevance(abilities)


Out[18]:
90.33461121760146

In [19]:
#percentage of rows to be excluded
getExclusions(abilities)


Out[19]:
0.5042086840570048

In [20]:
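#job/element pairs with at least one row recommended for suppression
job_elements = abilities[['onet_soc_code','element_id']][abilities.recommend_suppress == 'Y']

#all rows for those pairs, restricted to the importance (IM) scale
frame = pd.merge(job_elements, abilities, on=['onet_soc_code','element_id'])
frame = frame[frame.scale_id == 'IM']
#show the row(s) with the highest importance rating among them (4.25 in this run)
frame[frame.data_value == frame.data_value.max()]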


Out[20]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
268 27-2031.00 1.A.3.a.3 Dynamic Strength IM 4.25 8 0.31 3.64 4.86 N n/a 12/2005 Analyst

1 rows × 13 columns

Education, training, and experience


In [21]:
#domain data set #2- what does the education, training, and experience data look like
#it has onet_soc_code-element_id/_name-scale_id-category
# education_training_and_experience.groupby(['onet_soc_code','element_id','element_name','scale_id',"category"]).apply(sum)

In [22]:
#what are the unique element_names and what do they mean?
getDescriptions(education_training_and_experience,content_model_reference,"element_name")


Out[22]:
element_name element_id description
0 Required Level of Education 2.D.1 The level of education required to perform a job.
1 Related Work Experience 3.A.1 Amount of related work experience required to ...
2 On-Site or In-Plant Training 3.A.2 Amount of on-site or in-plant training (e.g., ...
3 On-the-Job Training 3.A.3 Amount of on the job training required to perf...
4 On-the-Job Training 3.D.2.c Obtaining the licenses, certificates, or regis...

5 rows × 3 columns


In [23]:
#what are the unique scales in the education training data and what do they mean?
getDescriptions(education_training_and_experience, scales_reference, "scale_id")
#looks like there is a one-to-one relationship between the element names and the scale ids


Out[23]:
   scale_id  scale_name                                     minimum  maximum
0  RL        Required Level Of Education (Categories 1-12)        0      100
1  RW        Related Work Experience (Categories 1-11)            0      100
2  PT        On-Site Or In-Plant Training (Categories 1-9)        0      100
3  OJ        On-The-Job Training (Categories 1-9)                 0      100

4 rows × 4 columns


In [24]:
#what are the different categories in the education training data and what do they mean?
getDescriptions(education_training_and_experience, education_training_and_experience_categories, "category")
#meaning of category is dependent on the scale_id/element-name/element-id


Out[24]:
category element_id element_name scale_id category_description
0 1 2.D.1 Required Level of Education RL Less than a High School Diploma
1 1 3.A.1 Related Work Experience RW None
2 1 3.A.2 On-Site or In-Plant Training PT None
3 1 3.A.3 On-the-Job Training OJ None or short demonstration
4 2 2.D.1 Required Level of Education RL High School Diploma (or GED or High School Equ...
5 2 3.A.1 Related Work Experience RW Up to and including 1 month
6 2 3.A.2 On-Site or In-Plant Training PT Up to and including 1 month
7 2 3.A.3 On-the-Job Training OJ Anything beyond short demonstration, up to and...
8 3 2.D.1 Required Level of Education RL Post-Secondary Certificate - awarded for train...
9 3 3.A.1 Related Work Experience RW Over 1 month, up to and including 3 months
10 3 3.A.2 On-Site or In-Plant Training PT Over 1 month, up to and including 3 months
11 3 3.A.3 On-the-Job Training OJ Over 1 month, up to and including 3 months
12 4 2.D.1 Required Level of Education RL Some College Courses
13 4 3.A.1 Related Work Experience RW Over 3 months, up to and including 6 months
14 4 3.A.2 On-Site or In-Plant Training PT Over 3 months, up to and including 6 months
15 4 3.A.3 On-the-Job Training OJ Over 3 months, up to and including 6 months
16 5 2.D.1 Required Level of Education RL Associate's Degree (or other 2-year degree)
17 5 3.A.1 Related Work Experience RW Over 6 months, up to and including 1 year
18 5 3.A.2 On-Site or In-Plant Training PT Over 6 months, up to and including 1 year
19 5 3.A.3 On-the-Job Training OJ Over 6 months, up to and including 1 year
20 6 2.D.1 Required Level of Education RL Bachelor's Degree
21 6 3.A.1 Related Work Experience RW Over 1 year, up to and including 2 years
22 6 3.A.2 On-Site or In-Plant Training PT Over 1 year, up to and including 2 years
23 6 3.A.3 On-the-Job Training OJ Over 1 year, up to and including 2 years
24 7 2.D.1 Required Level of Education RL Post-Baccalaureate Certificate - awarded for c...
25 7 3.A.1 Related Work Experience RW Over 2 years, up to and including 4 years
26 7 3.A.2 On-Site or In-Plant Training PT Over 2 years, up to and including 4 years
27 7 3.A.3 On-the-Job Training OJ Over 2 years, up to and including 4 years
28 8 2.D.1 Required Level of Education RL Master's Degree
29 8 3.A.1 Related Work Experience RW Over 4 years, up to and including 6 years
30 8 3.A.2 On-Site or In-Plant Training PT Over 4 years, up to and including 10 years
31 8 3.A.3 On-the-Job Training OJ Over 4 years, up to and including 10 years
32 9 2.D.1 Required Level of Education RL Post-Master's Certificate - awarded for comple...
33 9 3.A.1 Related Work Experience RW Over 6 years, up to and including 8 years
34 9 3.A.2 On-Site or In-Plant Training PT Over 10 years
35 9 3.A.3 On-the-Job Training OJ Over 10 years
36 10 2.D.1 Required Level of Education RL First Professional Degree - awarded for comple...
37 10 3.A.1 Related Work Experience RW Over 8 years, up to and including 10 years
38 11 2.D.1 Required Level of Education RL Doctoral Degree
39 11 3.A.1 Related Work Experience RW Over 10 years
40 12 2.D.1 Required Level of Education RL Post-Doctoral Training

41 rows × 5 columns
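
As the comment in the cell above notes, a category number only has meaning together with its scale, so looking categories up by number alone returns one row per scale that uses that number. A sketch, on made-up data rows, of joining on element_id, scale_id and category together so that each data row gets exactly one description:

```python
import pandas as pd

# Made-up data rows in the layout of education_training_and_experience
data = pd.DataFrame({
    "onet_soc_code": ["11-1011.00", "11-1011.00"],
    "element_id": ["2.D.1", "3.A.1"],
    "scale_id": ["RL", "RW"],
    "category": [1, 1],
    "data_value": [0.0, 10.5],
})

# Two of the category descriptions shown in the output above
categories = pd.DataFrame({
    "element_id": ["2.D.1", "3.A.1"],
    "scale_id": ["RL", "RW"],
    "category": [1, 1],
    "category_description": ["Less than a High School Diploma", "None"],
})

# Joining on all three keys keeps one description per data row
labelled = pd.merge(data, categories,
                    on=["element_id", "scale_id", "category"], how="left")
print(labelled[["scale_id", "category", "category_description"]])
```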


In [25]:
#how many education and training features are there
feature(education_training_and_experience)
#len(education_training_and_experience)/len(education_training_and_experience.onet_soc_code.unique())


Out[25]:
41

In [26]:
#what percentage of rows are relevant to the job?
#getRelevance(education_training_and_experience)
#this throws an error because there is no not_relevant column - everything is relevant

In [27]:
#percentage recommended suppressed
getExclusions(education_training_and_experience)


Out[27]:
1.9639587037872557

Knowledge


In [28]:
#what does it look like?
knowledge.head()


Out[28]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
0 11-1011.00 2.C.1.a Administration and Management IM 4.45 30 0.20 4.04 4.86 N n/a 06/2006 Incumbent
1 11-1011.00 2.C.1.a Administration and Management LV 6.25 30 0.24 5.75 6.75 N N 06/2006 Incumbent
2 11-1011.00 2.C.1.b Clerical IM 2.46 30 0.28 1.89 3.04 N n/a 06/2006 Incumbent
3 11-1011.00 2.C.1.b Clerical LV 3.50 30 0.42 2.65 4.35 N N 06/2006 Incumbent
4 11-1011.00 2.C.1.c Economics and Accounting IM 4.00 30 0.24 3.51 4.49 N n/a 06/2006 Incumbent

5 rows × 13 columns


In [29]:
knowledge.element_name.unique()


Out[29]:
array(['Administration and Management', 'Clerical',
       'Economics and Accounting', 'Sales and Marketing',
       'Customer and Personal Service', 'Personnel and Human Resources',
       'Production and Processing', 'Food Production',
       'Computers and Electronics', 'Engineering and Technology', 'Design',
       'Building and Construction', 'Mechanical', 'Mathematics', 'Physics',
       'Chemistry', 'Biology', 'Psychology', 'Sociology and Anthropology',
       'Geography', 'Medicine and Dentistry', 'Therapy and Counseling',
       'Education and Training', 'English Language', 'Foreign Language',
       'Fine Arts', 'History and Archeology', 'Philosophy and Theology',
       'Public Safety and Security', 'Law and Government',
       'Telecommunications', 'Communications and Media', 'Transportation'], dtype=object)

In [30]:
#what does it look like grouped by the factors?
# knowledge.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)
#like abilities, it's grouped by onet_soc_code-element_id/name-scale_id

In [31]:
#what are the unique element_names and what do they mean?
getDescriptions(knowledge, content_model_reference, "element_name")


Out[31]:
element_name element_id description
0 Administration and Management 2.C.1.a Knowledge of business and management principle...
1 Clerical 2.C.1.b Knowledge of administrative and clerical proce...
2 Economics and Accounting 2.C.1.c Knowledge of economic and accounting principle...
3 Sales and Marketing 2.C.1.d Knowledge of principles and methods for showin...
4 Customer and Personal Service 2.C.1.e Knowledge of principles and processes for prov...
5 Personnel and Human Resources 2.C.1.f Knowledge of principles and procedures for per...
6 Production and Processing 2.C.2.a Knowledge of raw materials, production process...
7 Food Production 2.C.2.b Knowledge of techniques and equipment for plan...
8 Computers and Electronics 2.C.3.a Knowledge of circuit boards, processors, chips...
9 Engineering and Technology 2.C.3 Knowledge of the design, development, and appl...
10 Engineering and Technology 2.C.3.b Knowledge of the practical application of engi...
11 Design 2.C.3.c Knowledge of design techniques, tools, and pri...
12 Building and Construction 2.C.3.d Knowledge of materials, methods, and the tools...
13 Mechanical 2.C.3.e Knowledge of machines and tools, including the...
14 Mathematics 2.A.1.e Using mathematics to solve problems.
15 Mathematics 2.C.4.a Knowledge of arithmetic, algebra, geometry, ca...
16 Physics 2.C.4.b Knowledge and prediction of physical principle...
17 Chemistry 2.C.4.c Knowledge of the chemical composition, structu...
18 Biology 2.C.4.d Knowledge of plant and animal organisms, their...
19 Psychology 2.C.4.e Knowledge of human behavior and performance; i...
20 Sociology and Anthropology 2.C.4.f Knowledge of group behavior and dynamics, soci...
21 Geography 2.C.4.g Knowledge of principles and methods for descri...
22 Medicine and Dentistry 2.C.5.a Knowledge of the information and techniques ne...
23 Therapy and Counseling 2.C.5.b Knowledge of principles, methods, and procedur...
24 Education and Training 2.C.6 Knowledge of principles and methods for curric...
25 English Language 2.C.7.a Knowledge of the structure and content of the ...
26 Foreign Language 2.C.7.b Knowledge of the structure and content of a fo...
27 Fine Arts 2.C.7.c Knowledge of the theory and techniques require...
28 History and Archeology 2.C.7.d Knowledge of historical events and their cause...
29 Philosophy and Theology 2.C.7.e Knowledge of different philosophical systems a...
30 Public Safety and Security 2.C.8.a Knowledge of relevant equipment, policies, pro...
31 Law and Government 2.C.8.b Knowledge of laws, legal codes, court procedur...
32 Telecommunications 2.C.9.a Knowledge of transmission, broadcasting, switc...
33 Communications and Media 2.C.9.b Knowledge of media production, communication, ...
34 Transportation 2.C.10 Knowledge of principles and methods for moving...

35 rows × 3 columns
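
The output above has 35 rows although knowledge contains only 33 distinct element names: "Engineering and Technology" and "Mathematics" each match two entries in the content model reference, because the same name appears under different element ids. A sketch, on a made-up subset, of merging on element_id instead, under the assumption that element_id is unique in the reference table:

```python
import pandas as pd

# Made-up subset of the knowledge data
knowledge_toy = pd.DataFrame({
    "element_id": ["2.C.4.a", "2.C.4.a", "2.C.4.b"],
    "element_name": ["Mathematics", "Mathematics", "Physics"],
})

# Subset of the content model reference; "Mathematics" appears under two ids
reference_toy = pd.DataFrame({
    "element_id": ["2.A.1.e", "2.C.4.a", "2.C.4.b"],
    "element_name": ["Mathematics", "Mathematics", "Physics"],
    "description": ["Using mathematics to solve problems.",
                    "Knowledge of arithmetic, algebra, geometry, ca...",
                    "Knowledge and prediction of physical principle..."],
})

# Merging on the name picks up both "Mathematics" entries
names = knowledge_toy[["element_name"]].drop_duplicates()
by_name = pd.merge(names, reference_toy, on="element_name")

# Merging on the id returns exactly one reference row per element
ids = knowledge_toy[["element_id"]].drop_duplicates()
by_id = pd.merge(ids, reference_toy, on="element_id")

print(len(by_name), len(by_id))
```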


In [32]:
#what are the different knowledge scales and what do they mean?
getDescriptions(knowledge, scales_reference, "scale_id")
#these are the same as for ability


Out[32]:
   scale_id  scale_name  minimum  maximum
0  IM        Importance        1        5
1  LV        Level             0        7

2 rows × 4 columns


In [33]:
#how many different knowledge features are there?
feature(knowledge)


Out[33]:
66

In [34]:
#percentage relevant
getRelevance(knowledge)


Out[34]:
89.09189402147149

In [35]:
#percent to be excluded
getExclusions(knowledge)


Out[35]:
8.582028300338159

In [36]:
#which importance (IM) rows in knowledge are recommended for suppression?
test = knowledge[knowledge.recommend_suppress == 'Y']
test[test.scale_id == "IM"]


Out[36]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
38528 43-4021.00 2.C.7.c Fine Arts IM 1 14 0.00 n/a n/a Y n/a 12/2006 Incumbent
38530 43-4021.00 2.C.7.d History and Archeology IM 1 13 0.00 n/a n/a Y n/a 12/2006 Incumbent

2 rows × 13 columns

Interests


In [37]:
#what does it look like?
interests.head()


Out[37]:
   onet_soc_code  element_id  element_name   scale_id  data_value  date     domain_source
0  11-1011.00     1.B.1.a     Realistic      OI              1.33  06/2008  Analyst
1  11-1011.00     1.B.1.b     Investigative  OI              2.00  06/2008  Analyst
2  11-1011.00     1.B.1.c     Artistic       OI              2.67  06/2008  Analyst
3  11-1011.00     1.B.1.d     Social         OI              3.67  06/2008  Analyst
4  11-1011.00     1.B.1.e     Enterprising   OI              7.00  06/2008  Analyst

5 rows × 7 columns


In [38]:
#what does it look like grouped by the factors?
# interests.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)
#looks like one-to-one matching between element_name and scale_id

In [39]:
#what do these element names mean?
getDescriptions(interests, content_model_reference, "element_name")


Out[39]:
element_name element_id description
0 Realistic 1.B.1.a Realistic occupations frequently involve work ...
1 Investigative 1.B.1.b Investigative occupations frequently involve w...
2 Artistic 1.B.1.c Artistic occupations frequently involve workin...
3 Social 1.B.1.d Social occupations frequently involve working ...
4 Enterprising 1.B.1.e Enterprising occupations frequently involve st...
5 Conventional 1.B.1.f Conventional occupations frequently involve fo...
6 First Interest High-Point 1.B.1.g Primary-Rank Descriptiveness
7 Second Interest High-Point 1.B.1.h Secondary-Cutoff/Rank Descriptiveness
8 Third Interest High-Point 1.B.1.i Tertiary-Cutoff/Rank Descriptiveness

9 rows × 3 columns


In [40]:
#what do the scale_ids mean?
getDescriptions(interests, scales_reference, "scale_id")


Out[40]:
   scale_id  scale_name                        minimum  maximum
0  OI        Occupational Interests                  1        7
1  IH        Occupational Interest High-Point        0        6

2 rows × 4 columns


In [41]:
#how many total interests features are there?
feature(interests)


Out[41]:
9

Job Zones


In [42]:
#What do the job zones look like? 
job_zones.head()
#there's a one-to-one relationship between jobs and job_zone, so we don't need to group_by


Out[42]:
   onet_soc_code  job_zone  date     domain_source
0  11-1011.00            5  06/2006  Analyst
1  11-1011.03            5  07/2013  Analyst
2  11-1021.00            3  06/2008  Analyst
3  11-1031.00            4  06/2008  Analyst
4  11-2011.00            4  06/2010  Analyst

5 rows × 4 columns
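
The claim that each job has exactly one job zone can be checked directly rather than read off head(). A sketch on a toy frame whose rows are copied from the output above:

```python
import pandas as pd

# First three rows of the job_zones output above
toy = pd.DataFrame({
    "onet_soc_code": ["11-1011.00", "11-1011.03", "11-1021.00"],
    "job_zone": [5, 5, 3],
})

# Each occupation code should appear exactly once
print(toy["onet_soc_code"].is_unique)

# Alternative check that also tolerates repeated identical rows:
# every occupation code maps to a single distinct job zone
zones_per_job = toy.groupby("onet_soc_code")["job_zone"].nunique()
print((zones_per_job == 1).all())
```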


In [43]:
#what is a job zone?
getDescriptions(job_zones, job_zone_reference, "job_zone")
#these seem to be a closely related, simplified version of the education training information


Out[43]:
job_zone name experience education job_training examples svp_range
0 5 Job Zone Five: Extensive Preparation Needed Extensive skill, knowledge, and experience are... Most of these occupations require graduate sch... Employees may need some on-the-job training, b... These occupations often involve coordinating, ... (8.0 and above)
1 3 Job Zone Three: Medium Preparation Needed Previous work-related skill, knowledge, or exp... Most occupations in this zone require training... Employees in these occupations usually need on... These occupations usually involve using commun... (6.0 to < 7.0)
2 4 Job Zone Four: Considerable Preparation Needed A considerable amount of work-related skill, k... Most of these occupations require a four-year ... Employees in these occupations usually need se... Many of these occupations involve coordinating... (7.0 to < 8.0)
3 2 Job Zone Two: Some Preparation Needed Some previous work-related skill, knowledge, o... These occupations usually require a high schoo... Employees in these occupations need anywhere f... These occupations often involve using your kno... (4.0 to < 6.0)
4 1 Job Zone One: Little or No Preparation Needed Little or no previous work-related skill, know... Some of these occupations may require a high s... Employees in these occupations need anywhere f... These occupations involve following instructio... (Below 4.0)

5 rows × 7 columns


In [44]:
#how many features are in the job zone data
feature(job_zones)


Out[44]:
1

Conclusions of job_zones:

* we can consider using it in lieu of the education training dataframe, which is much more detailed

Skills


In [45]:
#what do the skills look like?
skills.head()


Out[45]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
0 11-1011.00 2.A.1.a Reading Comprehension IM 4.38 8 0.18 4.02 4.73 N n/a 06/2010 Analyst
1 11-1011.00 2.A.1.a Reading Comprehension LV 4.75 8 0.25 4.26 5.24 N N 06/2010 Analyst
2 11-1011.00 2.A.1.b Active Listening IM 4.38 8 0.18 4.02 4.73 N n/a 06/2010 Analyst
3 11-1011.00 2.A.1.b Active Listening LV 4.88 8 0.35 4.19 5.56 N N 06/2010 Analyst
4 11-1011.00 2.A.1.c Writing IM 4.12 8 0.23 3.68 4.57 N n/a 06/2010 Analyst

5 rows × 13 columns


In [46]:
#what do the skills look like grouped by factor
# skills.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)

In [47]:
#what are the different element names?
getDescriptions(skills, content_model_reference, "element_name")


Out[47]:
element_name element_id description
0 Reading Comprehension 2.A.1.a Understanding written sentences and paragraphs...
1 Active Listening 2.A.1.b Giving full attention to what other people are...
2 Writing 2.A.1.c Communicating effectively in writing as approp...
3 Speaking 2.A.1.d Talking to others to convey information effect...
4 Mathematics 2.A.1.e Using mathematics to solve problems.
5 Mathematics 2.C.4.a Knowledge of arithmetic, algebra, geometry, ca...
6 Science 2.A.1.f Using scientific rules and methods to solve pr...
7 Critical Thinking 2.A.2.a Using logic and reasoning to identify the stre...
8 Active Learning 2.A.2.b Understanding the implications of new informat...
9 Learning Strategies 2.A.2.c Selecting and using training/instructional met...
10 Monitoring 2.A.2.d Monitoring/Assessing performance of yourself, ...
11 Social Perceptiveness 2.B.1.a Being aware of others' reactions and understan...
12 Coordination 2.B.1.b Adjusting actions in relation to others' actions.
13 Persuasion 2.B.1.c Persuading others to change their minds or beh...
14 Negotiation 2.B.1.d Bringing others together and trying to reconci...
15 Instructing 2.B.1.e Teaching others how to do something.
16 Service Orientation 2.B.1.f Actively looking for ways to help people.
17 Complex Problem Solving 2.B.2.i Identifying complex problems and reviewing rel...
18 Operations Analysis 2.B.3.a Analyzing needs and product requirements to cr...
19 Technology Design 2.B.3.b Generating or adapting equipment and technolog...
20 Equipment Selection 2.B.3.c Determining the kind of tools and equipment ne...
21 Installation 2.B.3.d Installing equipment, machines, wiring, or pro...
22 Programming 2.B.3.e Writing computer programs for various purposes.
23 Operation Monitoring 2.B.3.g Watching gauges, dials, or other indicators to...
24 Operation and Control 2.B.3.h Controlling operations of equipment or systems.
25 Equipment Maintenance 2.B.3.j Performing routine maintenance on equipment an...
26 Troubleshooting 2.B.3.k Determining causes of operating errors and dec...
27 Repairing 2.B.3.l Repairing machines or systems using the needed...
28 Quality Control Analysis 2.B.3.m Conducting tests and inspections of products, ...
29 Judgment and Decision Making 2.B.4.e Considering the relative costs and benefits of...
30 Systems Analysis 2.B.4.g Determining how a system should work and how c...
31 Systems Evaluation 2.B.4.h Identifying measures or indicators of system p...
32 Time Management 2.B.5.a Managing one's own time and the time of others.
33 Management of Financial Resources 2.B.5.b Determining how money will be spent to get the...
34 Management of Material Resources 2.B.5.c Obtaining and seeing to the appropriate use of...
35 Management of Personnel Resources 2.B.5.d Motivating, developing, and directing people a...

36 rows × 3 columns


In [48]:
#What do the different scales mean?
getDescriptions(skills, scales_reference, "scale_id")
#these are the same scales (Importance and Level) as in abilities


Out[48]:
scale_id scale_name minimum maximum
0 IM Importance 1 5
1 LV Level 0 7

2 rows × 4 columns


In [49]:
#how many skills features are there?
feature(skills)


Out[49]:
70
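The count of 70 equals 35 × 2, which would be consistent with each skill element being rated on both the IM and LV scales. The definition of `feature()` is not shown in this section, so the following is only a minimal sketch of one way such a count could be computed. The helper name `count_features` and the toy data are hypothetical.

```python
import pandas as pd

# Toy stand-in for a ratings table (values are made up for illustration)
toy = pd.DataFrame({
    "onet_soc_code": ["11-1011.00"] * 4 + ["11-1021.00"] * 4,
    "element_id": ["2.A.1.a", "2.A.1.a", "2.A.1.b", "2.A.1.b"] * 2,
    "scale_id": ["IM", "LV"] * 4,
    "data_value": [4.1, 4.5, 3.9, 4.2, 3.0, 3.3, 2.8, 3.1],
})

def count_features(df, keys=("element_id", "scale_id")):
    """Hypothetical helper: number of distinct key combinations."""
    return len(df[list(keys)].drop_duplicates())

n_features = count_features(toy)
print(n_features)  # 2 elements x 2 scales
```

Counting distinct `(element_id, scale_id)` pairs rather than element names avoids double counting when two different elements share a name.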

In [50]:
#what percentage of the skill combinations are relevant to the job?
getRelevance(skills)


Out[50]:
93.56136820925553

In [51]:
#what percentage of skill combinations are recommended to be excluded?
getExclusions(skills)


Out[51]:
1.3449930351338801
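`getRelevance()` and `getExclusions()` are defined outside this section, so their exact logic is not visible here. As a rough sketch of how percentages like these could be derived from the `not_relevant` and `recommend_suppress` columns, assuming that rows flagged `n/a` are left out of the relevance denominator (an assumption, not something the notebook states), with hypothetical helper names and toy data:

```python
import pandas as pd

# Toy rows using the flag values seen in the real tables ("Y", "N", "n/a")
toy = pd.DataFrame({
    "scale_id": ["IM", "LV", "IM", "LV"],
    "not_relevant": ["n/a", "N", "n/a", "Y"],
    "recommend_suppress": ["N", "N", "Y", "N"],
})

def percent_relevant(df):
    """Hypothetical: share of rows flagged Y/N that are marked relevant (N)."""
    flagged = df[df["not_relevant"].isin(["Y", "N"])]
    return 100.0 * (flagged["not_relevant"] == "N").mean()

def percent_suppressed(df):
    """Hypothetical: share of all rows with recommend_suppress == 'Y'."""
    return 100.0 * (df["recommend_suppress"] == "Y").mean()

pct_relevant = percent_relevant(toy)
pct_suppressed = percent_suppressed(toy)
```

The choice of denominator matters: including the `n/a` rows in either calculation would give different percentages.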

In [52]:
#which of the suppressed rows are on the Importance scale?
test = skills[skills.recommend_suppress == 'Y']
test[test.scale_id == "IM"]


Out[52]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
13774 19-1032.00 2.B.3.m Quality Control Analysis IM 2.62 8 0.53 1.58 3.67 Y n/a 06/2010 Analyst
40654 43-3061.00 2.B.3.m Quality Control Analysis IM 1.88 8 0.52 1.00 2.89 Y n/a 06/2010 Analyst

2 rows × 13 columns
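The two-step filter above can also be written as a single boolean mask. A small sketch on toy data (the rows are illustrative, not taken from the real table):

```python
import pandas as pd

toy = pd.DataFrame({
    "element_name": ["Quality Control Analysis", "Quality Control Analysis", "Programming"],
    "scale_id": ["IM", "LV", "IM"],
    "recommend_suppress": ["Y", "Y", "N"],
})

# Each comparison needs its own parentheses because & binds tighter than ==
mask = (toy["recommend_suppress"] == "Y") & (toy["scale_id"] == "IM")
suppressed_im = toy[mask]
```

This avoids the intermediate `test` variable and makes both conditions visible in one place.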

Work Activities


In [53]:
#what does it look like?
work_activities.head()


Out[53]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
0 11-1011.00 4.A.1.a.1 Getting Information IM 4.75 24 0.15 4.44 5.00 N n/a 06/2006 Incumbent
1 11-1011.00 4.A.1.a.1 Getting Information LV 5.03 24 0.15 4.73 5.33 N N 06/2006 Incumbent
2 11-1011.00 4.A.1.a.2 Monitor Processes, Materials, or Surroundings IM 3.18 24 0.57 2.01 4.36 N n/a 06/2006 Incumbent
3 11-1011.00 4.A.1.a.2 Monitor Processes, Materials, or Surroundings LV 3.57 24 0.95 1.61 5.52 N N 06/2006 Incumbent
4 11-1011.00 4.A.1.b.1 Identifying Objects, Actions, and Events IM 3.64 24 0.40 2.81 4.48 N n/a 06/2006 Incumbent

5 rows × 13 columns


In [54]:
#group by the factors (commented out because it takes too long to run)
# work_activities.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)
#grouped the same way as most of the other data frames, with the same scale_id values
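If the goal of the commented-out groupby is only to confirm that each key combination identifies a single row, there are cheaper checks than `apply(sum)`. A minimal sketch, using the four rows for occupation 11-1011.00 shown in the `head()` output above as toy input:

```python
import pandas as pd

toy = pd.DataFrame({
    "onet_soc_code": ["11-1011.00"] * 4,
    "element_id": ["4.A.1.a.1", "4.A.1.a.1", "4.A.1.a.2", "4.A.1.a.2"],
    "element_name": ["Getting Information"] * 2
                    + ["Monitor Processes, Materials, or Surroundings"] * 2,
    "scale_id": ["IM", "LV", "IM", "LV"],
    "data_value": [4.75, 5.03, 3.18, 3.57],
})

keys = ["onet_soc_code", "element_id", "element_name", "scale_id"]

# Number of rows per key combination; uses a built-in aggregation
group_sizes = toy.groupby(keys).size()
one_row_per_group = bool((group_sizes == 1).all())

# Equivalent check without grouping at all
has_duplicate_keys = bool(toy.duplicated(subset=keys).any())
```

Both `size()` and `duplicated()` avoid calling a Python function once per group, which is usually what makes `apply` slow on large tables.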

In [55]:
#what does each of the elements mean?
getDescriptions(work_activities, content_model_reference, "element_name")


Out[55]:
element_name element_id description
0 Getting Information 4.A.1.a.1 Observing, receiving, and otherwise obtaining ...
1 Monitor Processes, Materials, or Surroundings 4.A.1.a.2 Monitoring and reviewing information from mate...
2 Identifying Objects, Actions, and Events 4.A.1.b.1 Identifying information by categorizing, estim...
3 Inspecting Equipment, Structures, or Material 4.A.1.b.2 Inspecting equipment, structures, or materials...
4 Estimating the Quantifiable Characteristics of... 4.A.1.b.3 Estimating sizes, distances, and quantities; o...
5 Judging the Qualities of Things, Services, or ... 4.A.2.a.1 Assessing the value, importance, or quality of...
6 Processing Information 4.A.2.a.2 Compiling, coding, categorizing, calculating, ...
7 Evaluating Information to Determine Compliance... 4.A.2.a.3 Using relevant information and individual judg...
8 Analyzing Data or Information 4.A.2.a.4 Identifying the underlying principles, reasons...
9 Making Decisions and Solving Problems 4.A.2.b.1 Analyzing information and evaluating results t...
10 Thinking Creatively 4.A.2.b.2 Developing, designing, or creating new applica...
11 Updating and Using Relevant Knowledge 4.A.2.b.3 Keeping up-to-date technically and applying ne...
12 Developing Objectives and Strategies 4.A.2.b.4 Establishing long-range objectives and specify...
13 Scheduling Work and Activities 4.A.2.b.5 Scheduling events, programs, and activities, a...
14 Organizing, Planning, and Prioritizing Work 4.A.2.b.6 Developing specific goals and plans to priorit...
15 Performing General Physical Activities 4.A.3.a.1 Performing physical activities that require co...
16 Handling and Moving Objects 4.A.3.a.2 Using hands and arms in handling, installing, ...
17 Controlling Machines and Processes 4.A.3.a.3 Using either control mechanisms or direct phys...
18 Operating Vehicles, Mechanized Devices, or Equ... 4.A.3.a.4 Running, maneuvering, navigating, or driving v...
19 Interacting With Computers 4.A.3.b.1 Using computers and computer systems (includin...
20 Drafting, Laying Out, and Specifying Technical... 4.A.3.b.2 Providing documentation, detailed instructions...
21 Repairing and Maintaining Mechanical Equipment 4.A.3.b.4 Servicing, repairing, adjusting, and testing m...
22 Repairing and Maintaining Electronic Equipment 4.A.3.b.5 Servicing, repairing, calibrating, regulating,...
23 Documenting/Recording Information 4.A.3.b.6 Entering, transcribing, recording, storing, or...
24 Interpreting the Meaning of Information for Ot... 4.A.4.a.1 Translating or explaining what information mea...
25 Communicating with Supervisors, Peers, or Subo... 4.A.4.a.2 Providing information to supervisors, co-worke...
26 Communicating with Persons Outside Organization 4.A.4.a.3 Communicating with people outside the organiza...
27 Establishing and Maintaining Interpersonal Rel... 4.A.4.a.4 Developing constructive and cooperative workin...
28 Assisting and Caring for Others 4.A.4.a.5 Providing personal assistance, medical attenti...
29 Selling or Influencing Others 4.A.4.a.6 Convincing others to buy merchandise/goods or ...
30 Resolving Conflicts and Negotiating with Others 4.A.4.a.7 Handling complaints, settling disputes, and re...
31 Performing for or Working Directly with the Pu... 4.A.4.a.8 Performing for people or dealing directly with...
32 Coordinating the Work and Activities of Others 4.A.4.b.1 Getting members of a group to work together to...
33 Developing and Building Teams 4.A.4.b.2 Encouraging and building mutual trust, respect...
34 Training and Teaching Others 4.A.4.b.3 Identifying the educational needs of others, d...
35 Guiding, Directing, and Motivating Subordinates 4.A.4.b.4 Providing guidance and direction to subordinat...
36 Coaching and Developing Others 4.A.4.b.5 Identifying the developmental needs of others ...
37 Provide Consultation and Advice to Others 4.A.4.b.6 Providing guidance and expert advice to manage...
38 Performing Administrative Activities 4.A.4.c.1 Performing day-to-day administrative tasks suc...
39 Staffing Organizational Units 4.A.4.c.2 Recruiting, interviewing, selecting, hiring, a...
40 Monitoring and Controlling Resources 4.A.4.c.3 Monitoring and controlling resources and overs...

41 rows × 3 columns


In [56]:
#no need to look up the scales: they are the same as in abilities and several other data frames (importance and level)

In [57]:
#how many total features are there?
feature(work_activities)


Out[57]:
82

In [58]:
#percentage of rows that are relevant
getRelevance(work_activities)


Out[58]:
98.18856855957509

In [59]:
#percentage of rows that should be excluded
getExclusions(work_activities)


Out[59]:
1.77311523927807

In [60]:
#are any of the suppressed rows on the Importance scale?
test = work_activities[work_activities.recommend_suppress == 'Y']
test[test.scale_id == 'IM']


Out[60]:
Int64Index([], dtype='int64') Empty DataFrame

0 rows × 13 columns

Work Context


In [61]:
work_context.head()


Out[61]:
onet_soc_code element_id element_name scale_id category data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
0 11-1011.00 4.C.1.a.2.c Public Speaking CX n/a 3.47 27 0.55 2.33 4.61 N n/a 06/2006 Incumbent
1 11-1011.00 4.C.1.a.2.c Public Speaking CXP 1 14.55 27 13.42 1.82 61.04 N n/a 06/2006 Incumbent
2 11-1011.00 4.C.1.a.2.c Public Speaking CXP 2 2.39 27 1.82 0.49 10.86 N n/a 06/2006 Incumbent
3 11-1011.00 4.C.1.a.2.c Public Speaking CXP 3 31.56 27 15.59 9.47 67.03 N n/a 06/2006 Incumbent
4 11-1011.00 4.C.1.a.2.c Public Speaking CXP 4 24.71 27 15.87 5.38 65.47 N n/a 06/2006 Incumbent

5 rows × 14 columns


In [62]:
#group by the factors
# work_context.groupby(['onet_soc_code','element_id','element_name','scale_id','category']).apply(sum)

In [63]:
len(work_context.element_name.unique())


Out[63]:
57

In [64]:
work_context[work_context.scale_id == "CT"]


Out[64]:
onet_soc_code element_id element_name scale_id category data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress not_relevant date domain_source
330 11-1011.00 4.C.3.d.4 Work Schedules CT n/a 1.00 27 0.00 1.00 1.01 N n/a 06/2006 Incumbent
334 11-1011.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.91 27 0.09 2.73 3.00 N n/a 06/2006 Incumbent
668 11-1011.03 4.C.3.d.4 Work Schedules CT n/a 1.35 26 n/a n/a n/a n/a n/a 07/2013 Occupational Expert
672 11-1011.03 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.77 26 n/a n/a n/a n/a n/a 07/2013 Occupational Expert
1006 11-1021.00 4.C.3.d.4 Work Schedules CT n/a 1.37 43 0.15 1.06 1.67 N n/a 06/2008 Incumbent
1010 11-1021.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.67 44 0.11 2.46 2.89 N n/a 06/2008 Incumbent
1344 11-2011.00 4.C.3.d.4 Work Schedules CT n/a 1.04 19 0.03 1.00 1.09 N n/a 06/2010 Incumbent
1348 11-2011.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.51 18 0.14 2.20 2.81 N n/a 06/2010 Incumbent
1682 11-2021.00 4.C.3.d.4 Work Schedules CT n/a 1.28 25 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
1686 11-2021.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.68 25 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
2020 11-2022.00 4.C.3.d.4 Work Schedules CT n/a 1.19 21 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
2024 11-2022.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.81 21 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
2358 11-2031.00 4.C.3.d.4 Work Schedules CT n/a 1.14 30 0.07 1.00 1.28 N n/a 06/2009 Incumbent
2362 11-2031.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.82 30 0.09 2.64 3.00 N n/a 06/2009 Incumbent
2696 11-3011.00 4.C.3.d.4 Work Schedules CT n/a 1.11 44 0.06 1.00 1.23 N n/a 06/2009 Incumbent
2700 11-3011.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.37 44 0.12 2.13 2.62 N n/a 06/2009 Incumbent
3034 11-3021.00 4.C.3.d.4 Work Schedules CT n/a 1.27 41 0.12 1.01 1.52 N n/a 06/2008 Incumbent
3038 11-3021.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.57 41 0.10 2.37 2.77 N n/a 06/2008 Incumbent
3372 11-3031.01 4.C.3.d.4 Work Schedules CT n/a 1.00 29 n/a n/a n/a n/a n/a 07/2012 Occupational Expert
3376 11-3031.01 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.77 30 n/a n/a n/a n/a n/a 07/2012 Occupational Expert
3710 11-3031.02 4.C.3.d.4 Work Schedules CT n/a 1.00 15 0.00 1.00 1.00 N n/a 06/2006 Incumbent
3714 11-3031.02 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.56 15 0.20 2.13 2.99 N n/a 06/2006 Incumbent
4048 11-3051.00 4.C.3.d.4 Work Schedules CT n/a 1.35 42 0.13 1.09 1.60 N n/a 07/2013 Incumbent
4052 11-3051.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.85 41 0.08 2.70 3.00 N n/a 07/2013 Incumbent
4386 11-3051.01 4.C.3.d.4 Work Schedules CT n/a 1.26 22 0.21 1.00 1.68 N n/a 07/2012 Incumbent
4390 11-3051.01 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.77 22 0.19 2.37 3.00 N n/a 07/2012 Incumbent
4724 11-3051.02 4.C.3.d.4 Work Schedules CT n/a 1.38 17 0.13 1.10 1.66 N n/a 07/2011 Incumbent
4728 11-3051.02 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.90 17 0.06 2.77 3.00 N n/a 07/2011 Incumbent
5062 11-3051.04 4.C.3.d.4 Work Schedules CT n/a 1.11 46 0.07 1.00 1.26 N n/a 07/2012 Incumbent
5066 11-3051.04 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.87 47 0.10 2.66 3.00 N n/a 07/2012 Incumbent
5400 11-3061.00 4.C.3.d.4 Work Schedules CT n/a 1.33 24 n/a n/a n/a n/a n/a 06/2009 Occupational Expert
5404 11-3061.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.83 24 n/a n/a n/a n/a n/a 06/2009 Occupational Expert
5738 11-3071.01 4.C.3.d.4 Work Schedules CT n/a 1.17 23 n/a n/a n/a n/a n/a 06/2009 Occupational Expert
5742 11-3071.01 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.70 23 n/a n/a n/a n/a n/a 06/2009 Occupational Expert
6076 11-3071.02 4.C.3.d.4 Work Schedules CT n/a 1.25 24 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
6080 11-3071.02 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.46 24 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
6414 11-3071.03 4.C.3.d.4 Work Schedules CT n/a 1.13 23 n/a n/a n/a n/a n/a 07/2011 Occupational Expert
6418 11-3071.03 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.57 23 n/a n/a n/a n/a n/a 07/2011 Occupational Expert
6752 11-3111.00 4.C.3.d.4 Work Schedules CT n/a 1.33 21 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
6756 11-3111.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.81 21 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
7090 11-3121.00 4.C.3.d.4 Work Schedules CT n/a 1.14 21 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
7094 11-3121.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.82 22 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
7428 11-3131.00 4.C.3.d.4 Work Schedules CT n/a 1.25 24 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
7432 11-3131.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.92 24 n/a n/a n/a n/a n/a 06/2008 Occupational Expert
7766 11-9013.01 4.C.3.d.4 Work Schedules CT n/a 1.92 24 0.08 1.76 2.08 N n/a 06/2007 Incumbent
7770 11-9013.01 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.32 24 0.16 1.99 2.65 N n/a 06/2007 Incumbent
8104 11-9013.03 4.C.3.d.4 Work Schedules CT n/a 1.65 31 n/a n/a n/a n/a n/a 12/2006 Occupational Expert
8108 11-9013.03 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.65 31 n/a n/a n/a n/a n/a 12/2006 Occupational Expert
8442 11-9021.00 4.C.3.d.4 Work Schedules CT n/a 1.52 25 n/a n/a n/a n/a n/a 07/2013 Occupational Expert
8446 11-9021.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.84 25 n/a n/a n/a n/a n/a 07/2013 Occupational Expert
8780 11-9031.00 4.C.3.d.4 Work Schedules CT n/a 1.02 21 0.02 1.00 1.07 N n/a 06/2010 Incumbent
8784 11-9031.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.56 21 0.22 2.10 3.00 N n/a 06/2010 Incumbent
9118 11-9032.00 4.C.3.d.4 Work Schedules CT n/a 1.13 33 0.07 1.00 1.27 N n/a 06/2010 Incumbent
9122 11-9032.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.76 33 0.13 2.50 3.00 N n/a 06/2010 Incumbent
9456 11-9033.00 4.C.3.d.4 Work Schedules CT n/a 1.27 32 0.12 1.03 1.52 N n/a 06/2010 Incumbent
9460 11-9033.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.80 33 0.10 2.60 2.99 N n/a 06/2010 Incumbent
9794 11-9039.01 4.C.3.d.4 Work Schedules CT n/a 1.41 22 n/a n/a n/a n/a n/a 07/2013 Occupational Expert
9798 11-9039.01 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.55 22 n/a n/a n/a n/a n/a 07/2013 Occupational Expert
10132 11-9041.00 4.C.3.d.4 Work Schedules CT n/a 1.33 44 0.13 1.06 1.60 N n/a 07/2012 Incumbent
10136 11-9041.00 4.C.3.d.8 Duration of Typical Work Week CT n/a 2.72 45 0.13 2.45 2.99 N n/a 07/2012 Incumbent
... ... ... ... ... ... ... ... ... ... ... ... ... ...

1844 rows × 14 columns


In [65]:
#what are the scales?
getDescriptions(work_context, scales_reference, "scale_id")


Out[65]:
scale_id scale_name minimum maximum
0 CX Context 1 5
1 CXP Context (Categories 1-5) 0 100
2 CT Context 1 3
3 CTP Context (Categories 1-3) 0 100

4 rows × 4 columns
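One possible reading of these scales, which the notebook does not confirm, is that the CXP rows give the percentage of respondents choosing each category and the CX row is the mean category. The sketch below checks that reading against the Public Speaking rows from the `head()` output. It rests on two labelled assumptions: the five percentages sum to 100, and CX is their weighted mean.

```python
# CXP percentages for Public Speaking, occupation 11-1011.00, categories 1-4,
# copied from the work_context.head() output
cxp = {1: 14.55, 2: 2.39, 3: 31.56, 4: 24.71}

# ASSUMPTION: the five category percentages sum to 100,
# so category 5 (not shown in head()) is the remainder
cxp[5] = 100.0 - sum(cxp.values())

# ASSUMPTION: CX is the percentage-weighted mean of the category numbers
cx_estimate = sum(cat * pct for cat, pct in cxp.items()) / 100.0
print(round(cx_estimate, 2))  # 3.47, matching the CX value in head()
```

Agreement on a single element is suggestive rather than conclusive; the same check would need to be repeated on other elements before relying on it.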


In [66]:
#what are the categories? These definitions are in the work_context_categories data frame
#first, change work_context.category from dtype object to dtype int ("n/a" becomes 0)
work_context["category"] = work_context["category"].replace("n/a", "0")

In [67]:
work_context.category = work_context.category.astype(int)
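Replacing "n/a" with 0 works, but 0 then looks like a real category. An alternative sketch, not what the notebook does, keeps missing categories as missing by using pandas' nullable integer dtype. The toy column below is illustrative:

```python
import pandas as pd

toy = pd.DataFrame({"category": ["n/a", "1", "2", "3"]})

# errors="coerce" turns unparseable strings such as "n/a" into NaN;
# the nullable "Int64" dtype then stores them as <NA> next to real integers
as_nullable = pd.to_numeric(toy["category"], errors="coerce").astype("Int64")
```

Whether a later merge against `work_context_categories` behaves the same with this dtype would need to be checked before swapping it in.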

In [68]:
categories_desc = getDescriptions(work_context, work_context_categories, "category")
categories_desc[categories_desc.scale_id == "CTP"]
#looks like they are dependent on the element name/id and the scale, let's groupby


Out[68]:
category element_id element_name scale_id category_description
55 1 4.C.3.d.4 Work Schedules CTP Regular (established routine, set schedule)
56 1 4.C.3.d.8 Duration of Typical Work Week CTP Less than 40 hours
112 2 4.C.3.d.4 Work Schedules CTP Irregular (changes with weather conditions, pr...
113 2 4.C.3.d.8 Duration of Typical Work Week CTP 40 hours
169 3 4.C.3.d.4 Work Schedules CTP Seasonal (only during certain times of the year)
170 3 4.C.3.d.8 Duration of Typical Work Week CTP More than 40 hours

6 rows × 5 columns


In [69]:
#let's group the category description to figure out what's going on
categories_desc.groupby(['element_name','scale_id','category']).apply(sum)


Out[69]:
category element_id element_name scale_id category_description
element_name scale_id category
Consequence of Error CXP 1 1 4.C.3.a.1 Consequence of Error CXP Not serious at all
2 2 4.C.3.a.1 Consequence of Error CXP Fairly serious
3 3 4.C.3.a.1 Consequence of Error CXP Serious
4 4 4.C.3.a.1 Consequence of Error CXP Very serious
5 5 4.C.3.a.1 Consequence of Error CXP Extremely serious
Contact With Others CXP 1 1 4.C.1.a.4 Contact With Others CXP No contact with others
2 2 4.C.1.a.4 Contact With Others CXP Occasional contact with others
3 3 4.C.1.a.4 Contact With Others CXP Contact with others about half the time
4 4 4.C.1.a.4 Contact With Others CXP Contact with others most of the time
5 5 4.C.1.a.4 Contact With Others CXP Constant contact with others
Coordinate or Lead Others CXP 1 1 4.C.1.b.1.g Coordinate or Lead Others CXP Not important at all
2 2 4.C.1.b.1.g Coordinate or Lead Others CXP Fairly important
3 3 4.C.1.b.1.g Coordinate or Lead Others CXP Important
4 4 4.C.1.b.1.g Coordinate or Lead Others CXP Very important
5 5 4.C.1.b.1.g Coordinate or Lead Others CXP Extremely important
Cramped Work Space, Awkward Positions CXP 1 1 4.C.2.b.1.e Cramped Work Space, Awkward Positions CXP Never
2 2 4.C.2.b.1.e Cramped Work Space, Awkward Positions CXP Once a year or more but not every month
3 3 4.C.2.b.1.e Cramped Work Space, Awkward Positions CXP Once a month or more but not every week
4 4 4.C.2.b.1.e Cramped Work Space, Awkward Positions CXP Once a week or more but not every day
5 5 4.C.2.b.1.e Cramped Work Space, Awkward Positions CXP Every day
Deal With External Customers CXP 1 1 4.C.1.b.1.f Deal With External Customers CXP Not important at all
2 2 4.C.1.b.1.f Deal With External Customers CXP Fairly important
3 3 4.C.1.b.1.f Deal With External Customers CXP Important
4 4 4.C.1.b.1.f Deal With External Customers CXP Very important
5 5 4.C.1.b.1.f Deal With External Customers CXP Extremely important
Deal With Physically Aggressive People CXP 1 1 4.C.1.d.3 Deal With Physically Aggressive People CXP Never
2 2 4.C.1.d.3 Deal With Physically Aggressive People CXP Once a year or more but not every month
3 3 4.C.1.d.3 Deal With Physically Aggressive People CXP Once a month or more but not every week
4 4 4.C.1.d.3 Deal With Physically Aggressive People CXP Once a week or more but not every day
5 5 4.C.1.d.3 Deal With Physically Aggressive People CXP Every day
Deal With Unpleasant or Angry People CXP 1 1 4.C.1.d.2 Deal With Unpleasant or Angry People CXP Never
2 2 4.C.1.d.2 Deal With Unpleasant or Angry People CXP Once a year or more but not every month
3 3 4.C.1.d.2 Deal With Unpleasant or Angry People CXP Once a month or more but not every week
4 4 4.C.1.d.2 Deal With Unpleasant or Angry People CXP Once a week or more but not every day
5 5 4.C.1.d.2 Deal With Unpleasant or Angry People CXP Every day
Degree of Automation CXP 1 1 4.C.3.b.2 Degree of Automation CXP Not at all automated
2 2 4.C.3.b.2 Degree of Automation CXP Slightly automated
3 3 4.C.3.b.2 Degree of Automation CXP Moderately automated
4 4 4.C.3.b.2 Degree of Automation CXP Highly automated
5 5 4.C.3.b.2 Degree of Automation CXP Completely automated
Duration of Typical Work Week CTP 1 1 4.C.3.d.8 Duration of Typical Work Week CTP Less than 40 hours
2 2 4.C.3.d.8 Duration of Typical Work Week CTP 40 hours
3 3 4.C.3.d.8 Duration of Typical Work Week CTP More than 40 hours
Electronic Mail CXP 1 1 4.C.1.a.2.h Electronic Mail CXP Never
2 2 4.C.1.a.2.h Electronic Mail CXP Once a year or more but not every month
3 3 4.C.1.a.2.h Electronic Mail CXP Once a month or more but not every week
4 4 4.C.1.a.2.h Electronic Mail CXP Once a week or more but not every day
5 5 4.C.1.a.2.h Electronic Mail CXP Every day
Exposed to Contaminants CXP 1 1 4.C.2.b.1.d Exposed to Contaminants CXP Never
2 2 4.C.2.b.1.d Exposed to Contaminants CXP Once a year or more but not every month
3 3 4.C.2.b.1.d Exposed to Contaminants CXP Once a month or more but not every week
4 4 4.C.2.b.1.d Exposed to Contaminants CXP Once a week or more but not every day
5 5 4.C.2.b.1.d Exposed to Contaminants CXP Every day
Exposed to Disease or Infections CXP 1 1 4.C.2.c.1.b Exposed to Disease or Infections CXP Never
2 2 4.C.2.c.1.b Exposed to Disease or Infections CXP Once a year or more but not every month
3 3 4.C.2.c.1.b Exposed to Disease or Infections CXP Once a month or more but not every week
4 4 4.C.2.c.1.b Exposed to Disease or Infections CXP Once a week or more but not every day
5 5 4.C.2.c.1.b Exposed to Disease or Infections CXP Every day
Exposed to Hazardous Conditions CXP 1 1 4.C.2.c.1.d Exposed to Hazardous Conditions CXP Never
2 2 4.C.2.c.1.d Exposed to Hazardous Conditions CXP Once a year or more but not every month
... ... ... ... ...

281 rows × 5 columns
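Because the meaning of a category number depends on the element and scale, a wide layout with one row per element and one column per category can be easier to scan than the long grouped output. A minimal sketch using the six CTP rows shown earlier (the truncated description is kept as it appears in the output):

```python
import pandas as pd

cats = pd.DataFrame({
    "element_name": ["Work Schedules"] * 3 + ["Duration of Typical Work Week"] * 3,
    "scale_id": ["CTP"] * 6,
    "category": [1, 2, 3, 1, 2, 3],
    "category_description": [
        "Regular (established routine, set schedule)",
        "Irregular (changes with weather conditions, pr...",  # truncated in the source output
        "Seasonal (only during certain times of the year)",
        "Less than 40 hours",
        "40 hours",
        "More than 40 hours",
    ],
})

# One row per (element_name, scale_id), one column per category number
wide = (
    cats.set_index(["element_name", "scale_id", "category"])["category_description"]
        .unstack("category")
)
```

For elements with five categories, the same reshaping would produce columns 1 through 5, with missing values where an element has fewer categories.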


In [70]:
#how many features are in work context?
feature(work_context)


Out[70]:
336

In [71]:
#percent that are relevant
getRelevance(work_context)


Out[71]:
100.0

In [72]:
#percent to be excluded
getExclusions(work_context)


Out[72]:
2.5611988508224726

Work Styles


In [73]:
work_styles.head()


Out[73]:
onet_soc_code element_id element_name scale_id data_value n standard_error lower_ci_bound upper_ci_bound recommend_suppress date domain_source
0 11-1011.00 1.C.1.a Achievement/Effort IM 4.66 30 0.18 4.30 5.00 N 06/2006 Incumbent
1 11-1011.00 1.C.1.b Persistence IM 4.61 30 0.19 4.23 4.99 N 06/2006 Incumbent
2 11-1011.00 1.C.1.c Initiative IM 4.79 30 0.14 4.51 5.00 N 06/2006 Incumbent
3 11-1011.00 1.C.2.b Leadership IM 4.84 30 0.13 4.57 5.00 N 06/2006 Incumbent
4 11-1011.00 1.C.3.a Cooperation IM 4.42 30 0.19 4.02 4.81 N 06/2006 Incumbent

5 rows × 12 columns


In [74]:
# work_styles.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)

In [75]:
#what are the elements
getDescriptions(work_styles, content_model_reference, "element_name")


Out[75]:
element_name element_id description
0 Achievement/Effort 1.C.1.a Job requires establishing and maintaining pers...
1 Persistence 1.C.1.b Job requires persistence in the face of obstac...
2 Initiative 1.C.1.c Job requires a willingness to take on responsi...
3 Leadership 1.C.2.b Job requires a willingness to lead, take charg...
4 Cooperation 1.C.3.a Job requires being pleasant with others on the...
5 Concern for Others 1.C.3.b Job requires being sensitive to others' needs ...
6 Social Orientation 1.C.3.c Job requires preferring to work with others ra...
7 Self Control 1.C.4.a Job requires maintaining composure, keeping em...
8 Stress Tolerance 1.C.4.b Job requires accepting criticism and dealing c...
9 Adaptability/Flexibility 1.C.4.c Job requires being open to change (positive or...
10 Dependability 1.C.5.a Job requires being reliable, responsible, and ...
11 Attention to Detail 1.C.5.b Job requires being careful about detail and th...
12 Integrity 1.C.5.c Job requires being honest and ethical.
13 Independence 1.B.2.b.2 Workers on this job do their work alone.
14 Independence 1.B.2.f Occupations that satisfy this work value allow...
15 Independence 1.C.6 Job requires developing one's own ways of doin...
16 Innovation 1.C.7.a Job requires creativity and alternative thinki...
17 Innovation 4.B.2.c.1.a.7 Innovation; finding new and better ways of doi...
18 Analytical Thinking 1.C.7.b Job requires analyzing information and using l...

19 rows × 3 columns
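The table above lists Independence three times and Innovation twice, with element IDs that fall outside the 1.C work styles branch. That pattern is what a join on `element_name` produces when the reference table reuses a name for different elements. The sketch below reproduces the effect on toy data and shows that joining on `element_id` avoids it. It assumes the work styles rows use IDs 1.C.6 and 1.C.7.a, and it is not the notebook's `getDescriptions()`.

```python
import pandas as pd

# ASSUMPTION: work styles rows carry the 1.C.* IDs for these two elements
ratings = pd.DataFrame({
    "element_id": ["1.C.6", "1.C.7.a"],
    "element_name": ["Independence", "Innovation"],
})

# IDs and names copied from the table above
reference = pd.DataFrame({
    "element_id": ["1.B.2.b.2", "1.B.2.f", "1.C.6", "1.C.7.a", "4.B.2.c.1.a.7"],
    "element_name": ["Independence", "Independence", "Independence",
                     "Innovation", "Innovation"],
})

# Joining on the name picks up every reference row that shares it
by_name = ratings[["element_name"]].drop_duplicates().merge(reference, on="element_name")

# Joining on the ID returns exactly one reference row per element
by_id = ratings[["element_id"]].drop_duplicates().merge(reference, on="element_id")

print(len(by_name), len(by_id))  # 5 2
```

This would also explain why the descriptions table has 19 rows while the data frame appears to contain fewer distinct elements.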


In [76]:
#what are the different scales?
getDescriptions(work_styles, scales_reference, "scale_id")


Out[76]:
scale_id scale_name minimum maximum
0 IM Importance 1 5

1 rows × 4 columns


In [77]:
#how many features are there?
feature(work_styles)


Out[77]:
16

In [78]:
#what percentage should be excluded?
getExclusions(work_styles)


Out[78]:
0.0

Work Values


In [79]:
work_values.head()


Out[79]:
onet_soc_code element_id element_name scale_id data_value date domain_source
0 11-1011.00 1.B.2.a Achievement EX 6.33 06/2008 Analyst
1 11-1011.00 1.B.2.b Working Conditions EX 6.33 06/2008 Analyst
2 11-1011.00 1.B.2.c Recognition EX 7.00 06/2008 Analyst
3 11-1011.00 1.B.2.d Relationships EX 5.00 06/2008 Analyst
4 11-1011.00 1.B.2.e Support EX 5.33 06/2008 Analyst

5 rows × 7 columns


In [80]:
# work_values.groupby(['onet_soc_code','element_id','element_name','scale_id']).apply(sum)

In [81]:
#what are the different element names?
getDescriptions(work_values, content_model_reference, "element_name")


Out[81]:
element_name element_id description
0 Achievement 1.B.2.a Occupations that satisfy this work value are r...
1 Achievement 1.B.2.a.2 Workers on this job get a feeling of accomplis...
2 Working Conditions 1.B.2.b Occupations that satisfy this work value offer...
3 Working Conditions 1.B.2.b.6 Workers on this job have good working conditions.
4 Recognition 1.B.2.c Occupations that satisfy this work value offer...
5 Recognition 1.B.2.c.2 Workers on this job receive recognition for th...
6 Relationships 1.B.2.d Occupations that satisfy this work value allow...
7 Support 1.B.2.e Occupations that satisfy this work value offer...
8 Independence 1.B.2.b.2 Workers on this job do their work alone.
9 Independence 1.B.2.f Occupations that satisfy this work value allow...
10 Independence 1.C.6 Job requires developing one's own ways of doin...
11 First Work Value High-Point 1.B.2.g Primary-Rank Descriptiveness
12 Second Work Value High-Point 1.B.2.h Secondary-Cutoff/Rank Descriptiveness
13 Third Work Value High-Point 1.B.2.i Tertiary-Cutoff/Rank Descriptiveness

14 rows × 3 columns


In [82]:
#get scales
getDescriptions(work_values, scales_reference, "scale_id")


Out[82]:
scale_id scale_name minimum maximum
0 EX Extent 1 7
1 VH Work Value High-Point 1 6

2 rows × 4 columns


In [83]:
#what is the number of features?
feature(work_values)


Out[83]:
9