In [46]:
# helper functions
import getEPH
import categorize
import createVariables
import schoolYears
import make_dummy
import functionsForModels
# libraries
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.sandbox.regression.predstd import wls_prediction_std
from statsmodels.iolib.table import (SimpleTable, default_txt_fmt)
np.random.seed(1024)
%matplotlib inline
In [47]:
getEPH.getEPHdbf('t310')
In [48]:
data1 = pd.read_csv('data/cleanDatat310.csv')
In [49]:
data2 = categorize.categorize(data1)
data3 = schoolYears.schoolYears(data2)
data4 = createVariables.createVariables(data3)
In [50]:
jobsAndIncome = (data4.activity == 1) & (data4.P21 > 1) # we only consider people who are working and have income
headAndSpouse = (data4.familyRelation == 1) | (data4.familyRelation == 2) # household heads and their spouses
dataParaModelo = data4.copy().loc[jobsAndIncome,:]
In [51]:
variablesOfInterest = ['age',
'age2',
'female',
'education',
'education2']
In [52]:
model = functionsForModels.runModel(dataset = dataParaModelo, income = 'lnIncome', variables = variablesOfInterest)
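functionsForModels.runModel is a project-specific helper whose source is not shown in this notebook. As a rough, hypothetical sketch only (an assumption about what such a wrapper might do, not the actual implementation), it presumably fits an OLS of log income on the selected regressors plus a constant and reports the fit:
# hypothetical sketch of an OLS wrapper like runModel (assumed, not the project's code)
def runModelSketch(dataset, income, variables):
    y = dataset[income].values                      # dependent variable: log income
    X = sm.add_constant(dataset[variables].values)  # regressors plus an intercept
    fitted = sm.OLS(y, X).fit()                     # ordinary least squares fit
    print fitted.summary()                          # coefficients, standard errors, R-squared
    return fitted
The constant added here mirrors the sm.add_constant call in the next cell, which rebuilds the same design matrix to generate predictions.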
In [53]:
X = sm.add_constant(dataParaModelo.copy().loc[:,variablesOfInterest].values)
In [54]:
dataParaModelo['predictedLnIncome'] = model.predict(X) # in-sample prediction of log income for the working subsample
We create a merged dataset that combines the subset of individuals with a job and income with the full set of individuals. People outside that subset (children, the unemployed, and those reporting no income) are assigned a predicted log income of 0, so the resulting dataset matches every individual to their household. A quick sanity check of the result follows the merge cells below.
In [55]:
paraMerge = dataParaModelo.loc[:,['CODUSU', 'NRO_HOGAR', 'COMPONENTE','predictedLnIncome']]
paraMerge.head()
Out[55]:
In [56]:
data = pd.merge(left = data4 , right = paraMerge, on = ['CODUSU', 'NRO_HOGAR', 'COMPONENTE'], how = 'left')
In [57]:
data['predictedLnIncome'] = data['predictedLnIncome'].fillna(0) # individuals outside the working subsample get a predicted log income of 0
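A quick sanity check (a sketch using the columns defined above; the exact counts depend on the survey wave) that every individual kept a row after the merge and that only people outside the working subsample received the imputed 0:
# rows in the merged individual dataset vs. rows that carry a model prediction
print 'total individuals:', data.shape[0]
print 'with a predicted log income:', (data.predictedLnIncome != 0).sum()
print 'imputed zero (no job or no income):', (data.predictedLnIncome == 0).sum()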
In [58]:
# Variables related to Occupation
data['job'] = (data.activity==1).astype(int)
data['noJob'] = (data.activity!=1).astype(int)
data['schoolAndJob'] = data.job * data.education
In [59]:
cantidadActivos = data.job.groupby(by=data['id']).sum() # Number of people working in the household
cantidadInactivos = data.noJob.groupby(by=data['id']).sum() # Number of people not working in the household
schoolAndJob = data.schoolAndJob.groupby(by=data['id']).sum() # Sum of total schooling years in the household
In [60]:
dfJobsAndEduc = pd.merge(left = schoolAndJob.to_frame() ,
right = cantidadInactivos.to_frame(),
left_index = True,
right_index = True)
dfJobsAndEduc = pd.merge(left = dfJobsAndEduc ,
right = cantidadActivos.to_frame(),
left_index = True,
right_index = True)
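Equivalently, the two pairwise merges above can be collapsed into a single groupby/agg over the household id; a minimal alternative sketch (dfJobsAndEduc_alt is a hypothetical name) producing the same three columns:
# one-step alternative: sum the three per-person indicators by household id
dfJobsAndEduc_alt = data.groupby('id')[['schoolAndJob', 'noJob', 'job']].sum()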
In [61]:
cleanData = data.copy().loc[(headAndSpouse),
['id',
'AGLOMERADO',
'familyRelation',
'age',
'age2',
'female',
'education',
'education2',
'primary',
'secondary',
'university',
'P21',
'P47T',
'lnIncome',
u'lnIncomeT',
'predictedLnIncome',
'job',
'DECCFR',
'DECIFR',
'maritalStatus',
'reading',
'placeOfBirth',
]]
In [62]:
cleanData.head()
Out[62]:
In [63]:
pivot = cleanData.pivot(index='id', columns='familyRelation')
pivot.head()
Out[63]:
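The pivot reshapes the data from long form (one row per person) to wide form (one row per household, with separate columns for each family relation). A toy illustration, assuming the coding 1 = head and 2 = spouse used above:
# toy example: two people in household 'H1' become a single wide row
toy = pd.DataFrame({'id': ['H1', 'H1'],
                    'familyRelation': [1, 2],  # 1 = head, 2 = spouse (assumed coding)
                    'age': [40, 38]})
print toy.pivot(index='id', columns='familyRelation')
# the columns become a MultiIndex: ('age', 1) for the head and ('age', 2) for the spouse,
# which is why the CSV is re-read below with explicit column names and skiprows=3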
In [64]:
pivot.to_csv('data/pivotInd.csv')
In [65]:
dataN = pd.read_csv('data/pivotInd.csv', names = ['id','AGLO1','AGLO2','headAge','spouseAge','headAge2','spouseAge2',
'headFemale','spouseFemale','headEduc','spouseEduc',
'headEduc2','spouseEduc2','headPrimary','spousePrimary',
'headSecondary','spouseSecondary','headUniversity','spouseUniversity',
'headP21','spouseP21','headP47T','spouseP47T',
'headLnIncome','spouseLnIncome','headLnIncomeT','spouseLnIncomeT',
'headPredictedLnIncome','spousePredictedLnIncome','headJob','spouseJob',
'headDECCFR','spouseDECCFR','headDECIFR','spouseDECIFR',
'headMaritalStatus','spouseMaritalStatus',
'headReading','spouseReading','headPlaceOfBirth','spouseplaceOfBirth',
],skiprows = 3)
In [66]:
dfJobsAndEduc['id'] = dfJobsAndEduc.index
dfJobsAndEduc['id'] = dfJobsAndEduc['id'].astype(int)
dataFinalCSV = pd.merge(left = dfJobsAndEduc ,
right = dataN,
left_on = 'id',
right_on = 'id')
In [67]:
dataFinalCSV.to_csv('data/pivotInd.csv',index=False)
dataFinalCSV.head()
Out[67]:
In [68]:
getEPH.getEPHdbf('t310')
hog = pd.read_csv('data/cleanDataHouseholdt310.csv')
print hog.shape
hog.head()
Out[68]:
In [69]:
def remove9(df, variables):
    # recode the missing-data code 9 as NaN
    for var in variables:
        df[var] = df[var].replace(9, np.nan)

def remove0(df, variables):
    # recode the missing-data code 0 as NaN
    for var in variables:
        df[var] = df[var].replace(0, np.nan)

def remove99(df, variables):
    # recode the missing-data code 99 as NaN
    for var in variables:
        df[var] = df[var].replace(99, np.nan)
In [70]:
hog2 = hog.copy()
remove9(df = hog2, variables = ['FloorMaterial','RoofMaterial','RoofCoat','Water','WaterType','Toilet','ToiletLocation',
'ToiletType','Sewer','DumpSites','Flooding','EmergencyLoc','CookingCombustible',
'BathroomUse'])
remove0(df = hog2, variables = ['FloorMaterial','RoofMaterial','RoofCoat','Water','WaterType','Toilet','ToiletLocation',
'ToiletType','Sewer','DumpSites','Flooding','EmergencyLoc','Ownership','CookingCombustible',
'BathroomUse'])
remove99(df = hog2, variables = ['Ownership'])
In [71]:
variables = ['CookingCombustible']
for var in variables:
    print hog2[var].value_counts()
    plt.scatter(hog2[var], hog2.TotalHouseHoldIncome)
    plt.show()
In [72]:
hog2['CookingRec'] = np.nan
# collapse CookingCombustible into three ordered categories (codes 3 and 4 are grouped together)
hog2.loc[hog2.CookingCombustible == 4, 'CookingRec'] = 1
hog2.loc[hog2.CookingCombustible == 3, 'CookingRec'] = 1
hog2.loc[hog2.CookingCombustible == 2, 'CookingRec'] = 2
hog2.loc[hog2.CookingCombustible == 1, 'CookingRec'] = 3
In [73]:
variables = ['CookingRec']
for var in variables:
    print hog2[var].value_counts()
    plt.scatter(hog2[var], hog2.TotalHouseHoldIncome)
    plt.show()
In [74]:
hog2['WaterRec'] = (hog2.Water == 1).astype(int)
hog2['OwnershipRec'] = ((hog2.Ownership == 1) | (hog2.Ownership == 3)).astype(int)
hog2['Hacinamiento'] = hog2.HouseMembers * 1.0 / hog2.SleepingRooms  # members per sleeping room (overcrowding)
hog2['id'] = (hog2.CODUSU.astype(str) + hog2.NRO_HOGAR.astype(str))
hog2['TotalHouseHoldIncome'] = hog2['TotalHouseHoldIncome'].replace(0, 1)  # so the log below yields 0 instead of -inf
hog2['lnHouseIncome'] = np.log(hog2['TotalHouseHoldIncome'])
sinCuartosParaDormir = (hog2.SleepingRooms == 0)  # households reporting no sleeping rooms (dropped later)
In [75]:
hogReducida = hog2.copy().drop(['CODUSU','NRO_HOGAR','REGION','HomeTypeesp','FloorMaterialesp',
'WaterTypeesp','Ownershipesp','CookingCombustibleesp','DomesticService1',
'DomesticService2', 'DomesticService3','DomesticService4', 'DomesticService5',
'DomesticService6'],axis = 1)
In [76]:
# Load data previously exported
ind = pd.read_csv('data/pivotInd.csv')
ind['id'] = ind['id'].astype(str)
ind.drop(['AGLO2'],axis =1,inplace=True)
ind['sumPredicted'] = ind.headPredictedLnIncome + ind.spousePredictedLnIncome
In [77]:
hogReducida = hogReducida.copy().loc[~sinCuartosParaDormir, :]  # drop households with no sleeping rooms (Hacinamiento is undefined there)
In [78]:
# check before merge
print 'household rows (hog):', hogReducida.shape[0]
print 'individual-level rows (ind):', ind.shape[0]
print 'ind ids found in hog:', ind['id'].isin(hogReducida['id']).sum()
In [79]:
dataUnida = pd.merge(left=hogReducida, right=ind, on='id',how='left')
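After the left join, a quick check (a sketch; headAge is used only as a proxy for "no individual-level match") of how many households did not find a matching wide individual record:
# households in the merged table and how many lack individual-level information
print 'households total:', dataUnida.shape[0]
print 'households without an individual-level match:', dataUnida['headAge'].isnull().sum()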
In [80]:
dataUnida.to_csv('data/dataFinalParaModelo.csv',index=False)
In [ ]: