Estimate a regression using the Income data
We'll be working with a US Census income dataset (data dictionary).
Many businesses would like to personalize their offers based on a customer’s income. High-income customers could, for instance, be shown premium products. Since a customer’s income is not always explicitly known, a predictive model could estimate a person’s income from other information.
Our goal is to create a predictive model that outputs an estimate of a person’s income.
In [2]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
# read the data and use the first column as the index
income = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/income.csv.zip', index_col=0)
income.head()
Out[2]:
In [3]:
income.shape
Out[3]:
In [4]:
income.plot(x='Age', y='Income', kind='scatter')
Out[4]:
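As a first step toward the goal stated above, a one-variable linear regression of income on age can be sketched as follows. This is only a minimal illustration: it uses synthetic data so that it runs on its own, and the coefficients, noise level, and the names `age`, `income_values`, and `predict_income` are assumptions, not values taken from the Census data. In the notebook, the two arrays would presumably be replaced by `income['Age']` and `income['Income']`.

```python
import numpy as np

# Synthetic stand-in for the Age and Income columns (assumption: in the
# notebook these would come from income['Age'] and income['Income']).
rng = np.random.default_rng(0)
age = rng.uniform(18, 70, size=200)
income_values = 20000 + 800 * age + rng.normal(0, 5000, size=200)

# Fit income = intercept + slope * age by ordinary least squares.
slope, intercept = np.polyfit(age, income_values, deg=1)

def predict_income(a):
    """Return the fitted income estimate for one or more ages."""
    return intercept + slope * np.asarray(a, dtype=float)

# Root mean squared error of the fit on the same data it was trained on.
rmse = np.sqrt(np.mean((predict_income(age) - income_values) ** 2))

print(f"slope={slope:.1f}, intercept={intercept:.1f}, rmse={rmse:.1f}")
```

The RMSE here is measured on the training data, so it is optimistic; a held-out test split would give a fairer estimate of how well the model predicts income for new people.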
Customer churn is the loss, or attrition, of customers from a company. In industries where customer acquisition is costly, it is crucial for a company to reduce churn, ideally to zero, in order to sustain its recurring revenue. Because retaining a customer is usually cheaper than acquiring a new one, and because churn generally depends on user data (usage of the service or product), it poses an exciting and hard problem for machine learning.
The dataset comes from a telecom service provider and contains service usage (international plan, voicemail plan, daytime usage, evening and night usage, and so on) and basic demographic information (state and area code) for each user. The label is a single value per customer indicating whether the customer has churned.
In [6]:
# Download the dataset
data = pd.read_csv('https://github.com/ghuiber/churn/raw/master/data/churn.csv')
In [7]:
data.head()
Out[7]:
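Before modeling, it usually helps to check how often the label is positive. The sketch below computes an overall churn rate and a churn rate per plan on a tiny hand-made table. The column names `intl_plan` and `churned` are hypothetical and chosen for illustration only; the actual column names and label encoding in `data` should be checked with `data.columns` before adapting this.

```python
import pandas as pd

# Tiny hand-made example (assumption: hypothetical column names, not the
# real columns of the telecom dataset).
example = pd.DataFrame({
    'intl_plan': ['yes', 'no', 'no', 'yes', 'no', 'no'],
    'churned':   [True, False, False, True, True, False],
})

# Overall churn rate: the mean of a boolean column is the share of True values.
churn_rate = example['churned'].mean()

# Churn rate within each international-plan group.
by_plan = example.groupby('intl_plan')['churned'].mean()

print(churn_rate)
print(by_plan)
```

If the churn rate turns out to be far from 50%, plain accuracy becomes a misleading metric, which is worth keeping in mind when evaluating a classifier later.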