Exercise - Logistic Regression

The client bank XYZ is running a direct marketing (phone calls) campaign. The classification goal is to predict if the client will subscribe a term deposit or not.

The data is obtained from UCI Machine Learning repository

Attribute Information:

bank client data:

  • age: (numeric)
  • job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
  • marital : marital status (categorical: 'divorced', 'married', 'single', 'unknown'; note: 'divorced' means divorced or widowed)
  • education (categorical: 'basic.4y', 'basic.6y', 'basic.9y','high.school', 'illiterate' ,'professional.course', 'university.degree', 'unknown')
  • default: has credit in default? (categorical: 'no','yes','unknown')
  • **housing: has housing loan? (categorical: 'no','yes','unknown')
  • loan: has personal loan? (categorical: 'no','yes','unknown')
  • contact: contact communication type (categorical: 'cellular', 'telephone')
  • month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
  • day: last contact day of the month (numerical: 1, 2, 3, 4, ...)
  • duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

other attributes:

  • campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
  • pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
  • previous: number of contacts performed before this campaign and for this client (numeric)
  • poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

Output variable (desired target):

  • deposit - has the client subscribed a term deposit? (binary: 'yes','no')

In [1]:
#1. import libraries

In [2]:
#2. import data

In [4]:
#3. Display first few records

In [3]:
#4. check column names

In [5]:
#5. do label encoding

In [6]:
#6. do some exploratory data analysis

In [7]:
#7. Build logistic regression model

In [8]:
#8. Build L2 logistic regression model

In [9]:
#9. Build L1 logistic regression model

In [10]:
#10. Find generalization error. Use 80/20 split

In [11]:
#11. report mis-classification rate

In [ ]: