We'll be working with a US Census income dataset (data dictionary).
Many businesses would like to personalize their offers based on a customer’s income. High-income customers could, for instance, be shown premium products. Because a customer’s income is not always known explicitly, a predictive model could estimate it from other information about the person.
Our goal is to create a predictive model that outputs an estimate of a person’s income.
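To make the idea of "estimating income from other information" concrete, here is a minimal sketch of one possible approach: an ordinary least squares fit using only NumPy. It assumes the target is a numeric income value and that the features are already numeric. Neither assumption is confirmed for this dataset. The function names `fit_linear_model` and `predict_income` are hypothetical helpers, and the demo arrays are synthetic values made up for illustration, not Census data.

```python
import numpy as np

def fit_linear_model(X, y):
    """Fit ordinary least squares with an intercept (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_income(coef, X):
    """Apply the fitted coefficients to new feature rows (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

# Synthetic, noise-free data for illustration only (NOT the Census data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 30000 + 5000 * X_demo[:, 0] - 2000 * X_demo[:, 1]

coef_demo = fit_linear_model(X_demo, y_demo)
pred_demo = predict_income(coef_demo, X_demo[:5])
```

With the real data, the numeric feature columns of the loaded table could be passed in via `.to_numpy()`. If the income column turns out to be categorical (for example, an income bracket), a classifier would be a better fit than this regression sketch.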
In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import zipfile
# Read income.csv from the zip archive, using the first column as the index
with zipfile.ZipFile('../datasets/income.csv.zip', 'r') as z:
    with z.open('income.csv') as f:
        income = pd.read_csv(f, index_col=0)
income.head()
Out[1]: