Occupational Histories

I would like to revisit the construction of the occupational histories. In the end, we need to classify each individual as working in either a blue-collar or a white-collar occupation in each period.

There are two main questions:

(1) How can we ensure comparability between the different CPS classifications in the NLSY dataset? As the NLSY documentation shows, there is no single classification that can be used throughout the full NLSY life cycle.

(2) There are some oddities in the data that are actually reported in the NLSY.

As usual, please do some research on these questions yourself first. But in the end, reach out to the NLSY staff if you require their assistance.

Comparability of CPS Classifications over time

Is there any guidance from the NLSY or anyone else on how to ensure comparability? Or do we simply have to assign occupations to white- and blue-collar ourselves? Is there a well-published paper that thoroughly documents what its authors have done?

Currently, I am simply relying on footnote 18 of Keane & Wolpin (1997) for the classification:

Occupational categories are based on one-digit census codes. Blue-collar occupations are (i) craftsmen, foremen, and kindred; (ii) operatives and kindred; (iii) laborers, except farm; (iv) farm laborers and foremen; and (v) service workers. White-collar occupations are (i) professional, technical, and kindred; (ii) managers, officials, and proprietors; (iii) sales workers; (iv) farmers and farm managers; and (v) clerical and kindred.
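The footnote amounts to a lookup from one-digit major groups to a collar type. A minimal sketch of that lookup follows; the short group keys are hypothetical labels for the ten groups quoted above, and the step that assigns a detailed 1970 census code to its major group is assumed to happen elsewhere.

```python
from typing import Optional

# Keane & Wolpin (1997), footnote 18: one-digit major group -> collar type.
# The keys are hypothetical short labels for the groups quoted above.
KW_COLLAR = {
    "craftsmen": "blue",         # craftsmen, foremen, and kindred
    "operatives": "blue",        # operatives and kindred
    "laborers_nonfarm": "blue",  # laborers, except farm
    "farm_laborers": "blue",     # farm laborers and foremen
    "service": "blue",           # service workers
    "professional": "white",     # professional, technical, and kindred
    "managers": "white",         # managers, officials, and proprietors
    "sales": "white",            # sales workers
    "farmers": "white",          # farmers and farm managers
    "clerical": "white",         # clerical and kindred
}


def classify_collar(major_group: str) -> Optional[str]:
    """Return 'blue' or 'white', or None for groups the footnote does not cover."""
    return KW_COLLAR.get(major_group)


print(classify_collar("clerical"))
print(classify_collar("operatives"))
```

Returning None instead of raising keeps uncovered groups (for example, military codes) visible as missing values rather than silently forcing them into one of the two categories.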

This works well until the CPS 1970 codes are no longer provided in the NLSY and are replaced by the 2000 codes.

Solution 1

The Integrated Public Use Microdata Series (IPUMS-USA) has put a lot of effort into creating crosswalks between the decennial occupation classification systems of the CPS. To bridge the two major classification changes in 1980 and 2000, they created a variable (OCC1990) that researchers can use to analyse occupational data over longer periods.

Literature:

A crosswalk over all decennial systems from 1950 to 2000 can be found on this page and downloaded here.

From here, there are two possible ways to construct consistent occupation codes:

  1. The crosswalk can be used to map as many categories as possible from the 2000 system to the 1970 system. Then, the classification in footnote 18 of Keane & Wolpin (1997) can be applied to the codes.

    The benefit is that the definition of blue- and white-collar workers stays identical and does not have to be adjusted.

    The disadvantage is that 143 categories of the 2000_1 system and 124 categories of the 2000_5 system cannot be mapped. (For completeness, 116 and 130 categories of the 1970 codes remain unmapped in the 2000_1 and 2000_5 systems, respectively.)

  2. One could use the IPUMS OCC1990 variable to map all categories between all systems, but then the definition of blue- and white-collar workers has to be adjusted.

    Some numbers:

    • maps every category from 1970, 2000 1%, and 2000 5%
    • comprises 396 categories: not blown up like the 2000 system with ca. 500 items, and closer to the 1970 system with 440 items
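The trade-off between the two approaches can be illustrated on a toy crosswalk. The sketch below uses made-up codes (not real census codes) and the column names from the notebook cell further down; it assumes one code per cell and unique 2000 codes. In the real file, cells listing several codes or 2000 codes appearing on several rows would need to be split and deduplicated before mapping.

```python
import pandas as pd

# Toy crosswalk with made-up codes (NOT real census codes).
# One OCC1990 row per line; None marks a category with no counterpart.
crosswalk = pd.DataFrame({
    "OCC1990":    ["010", "020", "030", "040"],
    "CPS_1970":   ["101", "102", None, "104"],
    "CPS_2000_1": ["201", "202", "203", None],
})

# Approach 1: 2000 (1%) code -> 1970 code; only rows with both codes are usable.
to_1970 = (crosswalk.dropna(subset=["CPS_2000_1", "CPS_1970"])
           .set_index("CPS_2000_1")["CPS_1970"])

# Approach 2: 2000 (1%) code -> OCC1990, which has a row for every category.
to_1990 = (crosswalk.dropna(subset=["CPS_2000_1"])
           .set_index("CPS_2000_1")["OCC1990"])

# Hypothetical 2000 codes as they might appear for respondents in the NLSY
nlsy_2000_codes = pd.Series(["201", "203", "202", "203"])

mapped_1970 = nlsy_2000_codes.map(to_1970)  # unmapped codes become NaN
mapped_1990 = nlsy_2000_codes.map(to_1990)

print(mapped_1970.isna().sum())
print(mapped_1990.isna().sum())
```

Under approach 1, the respondents holding code "203" are lost; under approach 2, everyone is mapped, but a blue/white-collar definition for the OCC1990 categories still has to be written.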

There is also the question of which 2000 category system (1% or 5%) from the crosswalk should be used. The IPUMS website explains the difference (source):

The 2000 5% sample contains less detail than the 2000 1% sample. In the 5% sample, any category representing fewer than 10,000 people was combined with a larger, more generalized category.

Since the NLSY page does not offer more information on the categories, I compared the categories that differ between the two systems and checked whether they are listed in detail in the NLSY documentation of the 2000 system. The result: the NLSY 2000 codes correspond to the 2000 1% codes.
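One way to make this comparison reproducible is to compute, for each crosswalk column, the share of codes from the NLSY 2000 codebook that it contains. The sketch below assumes the codes have been collected into sets of strings; the example sets are made up and only show the mechanics.

```python
from typing import Set


def coverage(nlsy_codes: Set[str], crosswalk_codes: Set[str]) -> float:
    """Share of NLSY codes that also appear in a crosswalk column."""
    if not nlsy_codes:
        return float("nan")
    return len(nlsy_codes & crosswalk_codes) / len(nlsy_codes)


# Made-up example sets (not actual codes)
nlsy_codes = {"0010", "0020", "0030", "0040"}
xwalk_1pct = {"0010", "0020", "0030", "0040", "0050"}
xwalk_5pct = {"0010", "0020", "0035"}  # detailed codes merged into a broader one

print(coverage(nlsy_codes, xwalk_1pct))
print(coverage(nlsy_codes, xwalk_5pct))
print(sorted(nlsy_codes - xwalk_5pct))  # NLSY codes missing from the 5% system
```

A coverage of 1.0 for one column and clearly less for the other would point to the system the NLSY follows; the set difference lists the detailed codes to look up by hand.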


In [1]:
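import pandas as pd

# Reading legacy .xls files requires the xlrd package
crosswalk = pd.read_excel('../data/external/occ_crosswalks/occ1990_xwalk.xls')
# Keep OCC1990, its description, and the 1970, 2000 1%, and 2000 5% code columns
# (column positions as in the downloaded file)
crosswalk = crosswalk.iloc[:, [0, 1, 4, 7, 8]].copy()
crosswalk.columns = ['OCC1990', 'DESCRIPTION', 'CPS_1970', 'CPS_2000_1', 'CPS_2000_5']
# Drop the section-heading rows, which are marked with '#' in the OCC1990 column
crosswalk = crosswalk.loc[crosswalk.OCC1990 != '#']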

In [2]:
# Look for missing connections between the systems
# crosswalk.loc[crosswalk.CPS_2000_1.notnull()].isnull().sum()

In [3]:
# View 2000 1% codes that have no 2000 5% counterpart
# crosswalk[crosswalk.CPS_2000_1.notnull() & crosswalk.CPS_2000_5.isnull()]

Oddities in Data

This section starts with a longer explanation of the occupation variables themselves; after that, the questions are easy to answer.

First, there are two kinds of variables that report occupation categories for jobs held by an individual: CPSOCC70 and OCCALL-EMP.#. (All information on this point can be found in the NLSY topical guide on employment, section on jobs & employers.)

CPSOCC70 reports the occupation of the respondent's CPS job, i.e., the job with the current or most recent employer. This is one job selected from all jobs the respondent held between interviews. The question used to identify the CPS job varied by interview mode. From 1979 to 1992, it was posed as follows:

For whom did you work last (week)? IF MORE THAN ONE EMPLOYER, PROBE: for whom did you work the most hours during the last week (you worked)?

If answers were ambiguous, for example because the respondent did not work last week or had several employers, the CPS job was determined according to Table 2.
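The core of the 1979-92 question can be written as a selection rule: the employer worked for last week, and among several, the one with the most hours. The sketch below is hypothetical in its data structure (`Job` and its fields are illustrative, not NLSY variables) and treats ties as ambiguous, which is an assumption; all ambiguous cases return None because the Table 2 rules are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    job_id: int             # hypothetical job identifier
    worked_last_week: bool
    hours_last_week: float


def select_cps_job(jobs: List[Job]) -> Optional[int]:
    """Pick the employer worked for last week with the most hours.

    Returns None if no job was held last week or if hours are tied;
    such cases were resolved by the NLSY using Table 2.
    """
    last_week = [j for j in jobs if j.worked_last_week]
    if not last_week:
        return None
    max_hours = max(j.hours_last_week for j in last_week)
    top = [j for j in last_week if j.hours_last_week == max_hours]
    return top[0].job_id if len(top) == 1 else None


print(select_cps_job([Job(1, True, 20), Job(2, True, 25), Job(3, False, 0)]))
```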

OCCALL-EMP.# records answers to the Employer Supplement (ES, from 1980) for each job held by the respondent since the last interview. Although information on all jobs is collected, only up to five jobs are reported. Note that variables such as hours worked refer to all jobs held.

Here are some notes on how both variables relate to each other:

  • 1979: CPS job is Job #1
  • 1980-92: Jobs were collected in reverse chronological order, meaning Job #1 is the CPS job most of the time, but not always
  • 1993 onward: Due to the mechanics of CAPI interviews, Job #1 is always the CPS job
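The three cases above can be encoded as a small lookup by survey year, which is handy when deciding whether Job #1 may be treated as the CPS job in a given year. This is a sketch of the notes only; the function name is hypothetical.

```python
def job1_is_cps_job(survey_year: int) -> str:
    """How Job #1 in OCCALL-EMP.# relates to the CPS job, following the notes above."""
    if survey_year < 1979:
        raise ValueError("survey years before 1979 are not covered")
    if survey_year == 1979:
        return "always"
    if survey_year <= 1992:
        return "usually"  # reverse chronological order; exceptions exist
    return "always"       # 1993 onward: CAPI mechanics


for year in (1979, 1985, 1993):
    print(year, job1_is_cps_job(year))
```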

Question: There is a set of variables grouped under OCCALL-EMP for all survey years. Usually it has five entries, one for each of jobs 1-5. However, in 1979 and 1993 it has only four entries, and the occupation for Job #1 is missing. Can we find that information somewhere else?

Answer: Yes, in these years the occupation of Job #1 is reported as the CPS job occupation in CPSOCC70. Most of the explanation follows from the above: in 1979, the CPS job always coincides with Job #1. Because of the mechanics of the CAPI interview, Job #1 has also been the same as the CPS job since 1993, but the naming convention changed only in 1994 (source):

Until 1994, the current or most recent employer, called the "CPS employer," is differentiated in the data set from other employers for whom the respondent reported working since the last interview by title (that is, start date for CPS job, start date for Job #2, start date for Job #3, and so forth). Beginning in 1994, CPS job information is simply labeled as "job #1" because job specific information is all collected in the Employer Supplement.
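In practice, this means the occupation of Job #1 can be filled from CPSOCC70 in 1979 and 1993. A minimal pandas sketch follows, assuming a long panel with one row per respondent and survey year and assuming both variables use the same coding scheme in those years; the column names and codes are hypothetical and are not actual NLSY variable names or reference numbers.

```python
import pandas as pd

# Hypothetical long-format panel (illustrative names and made-up codes)
panel = pd.DataFrame({
    "person_id":    [1, 1, 1],
    "year":         [1979, 1990, 1993],
    "OCCALL_EMP_1": [None, "310", None],  # Job #1 missing in 1979 and 1993
    "CPSOCC70":     ["401", "310", "220"],
})

# In 1979 and 1993, Job #1 is the CPS job, so take its occupation from CPSOCC70
fill = panel["year"].isin([1979, 1993]) & panel["OCCALL_EMP_1"].isna()
panel["occ_job1"] = panel["OCCALL_EMP_1"].where(~fill, panel["CPSOCC70"])

print(panel[["year", "occ_job1"]])
```

Restricting the fill to these two years avoids overwriting Job #1 in 1980-92, when Job #1 is usually, but not always, the CPS job.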

Question: What exactly is the role of CPSOCC70, which is available for the years 1979-1993 in addition to OCCALL-EMP? Could the most recent job be one of jobs 1-5, or could it also be, for example, a job #6?

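Answer: This is answered by the explanation at the beginning of this section: CPSOCC70 records the occupation of the CPS job, i.e., the job with the current or most recent employer, selected from all jobs held since the last interview.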