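MONTH/YEAR_OF_BIRTH variable
There are two pairs of month and year of birth variables: MONTH_OF_BIRTH (R0000300) and YEAR_OF_BIRTH (R0000500) from the 1979 survey, and MONTH_OF_BIRTH (R0410100) and YEAR_OF_BIRTH (R0410300) from the 1981 survey.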
The user guide for age says, under "Important information about using age data":
The 1981 birth dates should be used to determine age with the 1979 dates used only as a backup. Differences between 1979 and 1981 birth dates remained for approximately 200-250 respondents after the 1981 fielding; editing on a case-by-case basis was performed by CHRR staff on only the 1981 variable.
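The rule quoted above amounts to coalescing the two surveys: take the 1981 value and fall back to the 1979 value only where the 1981 value is missing. Below is a minimal sketch on a made-up three-respondent frame. It assumes, as the notebook code does, that -5 is the only missing-value code to handle in the 1981 variables, and it uses `Series.combine_first` in place of the `.loc` assignments used in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up example data; real values come from the survey extract
df = pd.DataFrame({
    'YEAR_OF_BIRTH_1979': [60, 62, 58],
    'YEAR_OF_BIRTH_1981': [60, -5, 59],  # -5 marks a 1981 non-interview
})

# Treat -5 as missing, then prefer the 1981 value and fall back to 1979
year_1981 = df['YEAR_OF_BIRTH_1981'].replace(-5, np.nan)
df['YEAR_OF_BIRTH'] = year_1981.combine_first(df['YEAR_OF_BIRTH_1979']).astype(int)

print(df['YEAR_OF_BIRTH'].tolist())  # [60, 62, 59]
```

The third respondent shows the 1981 value overriding a conflicting 1979 value; the second shows the 1979 value used as the backup.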
In [1]:
import os
import numpy as np
import pandas as pd
In [2]:
afqt = pd.read_csv(os.path.join('..', 'data', 'external', 'afqt', 'afqt.csv'),
                   index_col=False, header=0)
# Map the survey reference numbers to descriptive column names
column_labels = {
    'R0000100': 'IDENTIFIER',
    'R0000300': 'MONTH_OF_BIRTH_1979',
    'R0000500': 'YEAR_OF_BIRTH_1979',
    'R0410100': 'MONTH_OF_BIRTH_1981',
    'R0410300': 'YEAR_OF_BIRTH_1981',
}
afqt.rename(columns=column_labels, inplace=True)
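With the columns renamed, it can be useful to check how many respondents report different birth dates in the two surveys before the 1979 variables are dropped; the user guide puts this at roughly 200-250 respondents after the 1981 fielding. The sketch below uses a hypothetical helper, `count_birth_date_disagreements`, on a made-up frame with the column names assigned above. It assumes that a 1981 value of -5 (non-interview) should not count as a disagreement.

```python
import pandas as pd

def count_birth_date_disagreements(df: pd.DataFrame) -> int:
    """Count respondents whose 1979 and 1981 month or year of birth differ.

    Hypothetical helper: rows where either 1981 value is -5 (non-interview)
    are excluded rather than counted as disagreements.
    """
    interviewed = (df['MONTH_OF_BIRTH_1981'] != -5) & (df['YEAR_OF_BIRTH_1981'] != -5)
    differs = ((df['MONTH_OF_BIRTH_1979'] != df['MONTH_OF_BIRTH_1981'])
               | (df['YEAR_OF_BIRTH_1979'] != df['YEAR_OF_BIRTH_1981']))
    return int((interviewed & differs).sum())

# Made-up example: respondent 2 has a different 1981 month,
# respondent 3 was not interviewed in 1981
example = pd.DataFrame({
    'MONTH_OF_BIRTH_1979': [3, 7, 11],
    'YEAR_OF_BIRTH_1979': [60, 62, 58],
    'MONTH_OF_BIRTH_1981': [3, 8, -5],
    'YEAR_OF_BIRTH_1981': [60, 62, -5],
})
print(count_birth_date_disagreements(example))  # 1
```

In the notebook, the same call would be `count_birth_date_disagreements(afqt)`, run after this cell and before the next one drops the survey-specific columns.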
In [3]:
### Construct MONTH_OF_BIRTH and YEAR_OF_BIRTH
# Replace -5 (non-interview) values in the 1981 variables with np.nan
afqt.loc[
    afqt.YEAR_OF_BIRTH_1981 == -5, 'YEAR_OF_BIRTH_1981'] = np.nan
afqt.loc[
    afqt.MONTH_OF_BIRTH_1981 == -5, 'MONTH_OF_BIRTH_1981'] = np.nan
# Fill missing 1981 values with the values from the 1979 survey
afqt.loc[afqt.YEAR_OF_BIRTH_1981.isnull(),
         'YEAR_OF_BIRTH_1981'] = afqt.YEAR_OF_BIRTH_1979
afqt.loc[afqt.MONTH_OF_BIRTH_1981.isnull(),
         'MONTH_OF_BIRTH_1981'] = afqt.MONTH_OF_BIRTH_1979
# Cast to integers; astype(int) raises a ValueError if any NaN remains,
# so this also checks that no birth dates are missing
afqt['MONTH_OF_BIRTH'] = afqt.MONTH_OF_BIRTH_1981.astype(int)
afqt['YEAR_OF_BIRTH'] = afqt.YEAR_OF_BIRTH_1981.astype(int)
# Drop the survey-specific source variables
afqt.drop(columns=[
    col for col in afqt.columns
    if col.startswith('MONTH_OF_BIRTH_') or col.startswith('YEAR_OF_BIRTH_')],
    inplace=True)
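The cell above can also be written as one reusable function that applies the same steps to both the month and the year, and that reports missing birth dates with a more descriptive error than the one `astype(int)` raises. This is a sketch under the same assumptions as the cell (only -5 is treated as missing, and the 1979 values are used as-is); the function name `construct_birth_date` is hypothetical.

```python
import numpy as np
import pandas as pd

def construct_birth_date(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with MONTH_OF_BIRTH and YEAR_OF_BIRTH built from the
    1981 variables, falling back to the 1979 ones, and the survey-specific
    columns dropped. Hypothetical helper mirroring the notebook cell."""
    out = df.copy()
    for var in ('MONTH_OF_BIRTH', 'YEAR_OF_BIRTH'):
        # Treat -5 (non-interview) in 1981 as missing, then fall back to 1979
        value = (out[f'{var}_1981'].replace(-5, np.nan)
                 .combine_first(out[f'{var}_1979']))
        n_missing = int(value.isnull().sum())
        if n_missing:
            raise ValueError(f'{var}: {n_missing} respondents have no value')
        out[var] = value.astype(int)
    # Drop the survey-specific source columns (str.startswith accepts a tuple)
    return out.drop(columns=[c for c in out.columns
                             if c.startswith(('MONTH_OF_BIRTH_', 'YEAR_OF_BIRTH_'))])

# Made-up example: respondent 2 was not interviewed in 1981
example = pd.DataFrame({
    'IDENTIFIER': [1, 2],
    'MONTH_OF_BIRTH_1979': [3, 7],
    'YEAR_OF_BIRTH_1979': [60, 62],
    'MONTH_OF_BIRTH_1981': [3, -5],
    'YEAR_OF_BIRTH_1981': [61, -5],
})
result = construct_birth_date(example)
print(list(result.columns))  # ['IDENTIFIER', 'MONTH_OF_BIRTH', 'YEAR_OF_BIRTH']
```

In the notebook this would replace the body of the cell with `afqt = construct_birth_date(afqt)`. Returning a copy instead of modifying in place makes the step easy to rerun without re-reading the CSV.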