default of credit card clients Data Set

The assignment was as follows:

In the workshop for this week, you are to select a data set from the UCI Machine Learning Repository and based on the recommended analysis type, wrangle the data into a fitted model, showing some model evaluation. In particular:

Layout the data into a dataset X and targets y.
Choose regression, classification, or clustering and build the best model you can from it.
Report an evaluation of the model built
Visualize aspects of your model (optional)
Compare and contrast different model families

Credit Card Example

Downloaded from the UCI Machine Learning Repository on February 26, 2015. The first thing is to fully describe your data in a README file. The dataset description is as follows:

Data Set: Multivariate
Attribute: Integer, Real
Tasks: Classification,
Instances: 30000
Attributes: 24

Data Set Information:

This research aimed at the case of customersâ€™ default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel â€œSorting Smoothing Methodâ€ to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.

Attribute Information:

This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6 - X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .;X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005. X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .;X23 = amount paid in April, 2005.

Relevant Papers:

Data Exploration

Source:

Name: I-Cheng Yeh email addresses: (1) icyeh '@' chu.edu.tw (2) 140910 '@' mail.tku.edu.tw institutions: (1) Department of Information Management, Chung Hua University, Taiwan. (2) Department of Civil Engineering, Tamkang University, Taiwan. other contact information: 886-2-26215656 ext. 3181



In [1]:

    
%matplotlib inline

import os
import json
import time
import pickle
import requests


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split as tts
from sklearn import metrics
from sklearn import cross_validation
from sklearn.cross_validation import KFold

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier









    



C:\Users\eobrien\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)



In [2]:

    
data = pd.read_csv('data/dataset.csv')



In [3]:

    
data.head()









    Out[3]:






  
    
      
      Unnamed: 0
      X1
      X2
      X3
      X4
      X5
      X6
      X7
      X8
      X9
      ...
      X15
      X16
      X17
      X18
      X19
      X20
      X21
      X22
      X23
      Y
    
  
  
    
      0
      ID
      LIMIT_BAL
      SEX
      EDUCATION
      MARRIAGE
      AGE
      PAY_0
      PAY_2
      PAY_3
      PAY_4
      ...
      BILL_AMT4
      BILL_AMT5
      BILL_AMT6
      PAY_AMT1
      PAY_AMT2
      PAY_AMT3
      PAY_AMT4
      PAY_AMT5
      PAY_AMT6
      default payment next month
    
    
      1
      1
      20000
      2
      2
      1
      24
      2
      2
      -1
      -1
      ...
      0
      0
      0
      0
      689
      0
      0
      0
      0
      1
    
    
      2
      2
      120000
      2
      2
      2
      26
      -1
      2
      0
      0
      ...
      3272
      3455
      3261
      0
      1000
      1000
      1000
      0
      2000
      1
    
    
      3
      3
      90000
      2
      2
      2
      34
      0
      0
      0
      0
      ...
      14331
      14948
      15549
      1518
      1500
      1000
      1000
      1000
      5000
      0
    
    
      4
      4
      50000
      2
      2
      1
      37
      0
      0
      0
      0
      ...
      28314
      28959
      29547
      2000
      2019
      1200
      1100
      1069
      1000
      0
    
  

5 rows × 25 columns



In [4]:

    
data.iloc[0]
df = data

new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df =df.rename(columns = new_header) #set the header row as the df header



In [5]:

    
#this forces all of the columns to become numeric

for c in df.columns:
        df[c] = pd.to_numeric(df[c], errors='coerce')



In [6]:

    
# Describe the dataset
print(df.describe())
df.info()









    



                 ID       LIMIT_BAL           SEX     EDUCATION      MARRIAGE  \
count  30000.000000    30000.000000  30000.000000  30000.000000  30000.000000   
mean   15000.500000   167484.322667      1.603733      1.853133      1.551867   
std     8660.398374   129747.661567      0.489129      0.790349      0.521970   
min        1.000000    10000.000000      1.000000      0.000000      0.000000   
25%     7500.750000    50000.000000      1.000000      1.000000      1.000000   
50%    15000.500000   140000.000000      2.000000      2.000000      2.000000   
75%    22500.250000   240000.000000      2.000000      2.000000      2.000000   
max    30000.000000  1000000.000000      2.000000      6.000000      3.000000   

                AGE         PAY_0         PAY_2         PAY_3         PAY_4  \
count  30000.000000  30000.000000  30000.000000  30000.000000  30000.000000   
mean      35.485500     -0.016700     -0.133767     -0.166200     -0.220667   
std        9.217904      1.123802      1.197186      1.196868      1.169139   
min       21.000000     -2.000000     -2.000000     -2.000000     -2.000000   
25%       28.000000     -1.000000     -1.000000     -1.000000     -1.000000   
50%       34.000000      0.000000      0.000000      0.000000      0.000000   
75%       41.000000      0.000000      0.000000      0.000000      0.000000   
max       79.000000      8.000000      8.000000      8.000000      8.000000   

                  ...                  BILL_AMT4      BILL_AMT5  \
count             ...               30000.000000   30000.000000   
mean              ...               43262.948967   40311.400967   
std               ...               64332.856134   60797.155770   
min               ...             -170000.000000  -81334.000000   
25%               ...                2326.750000    1763.000000   
50%               ...               19052.000000   18104.500000   
75%               ...               54506.000000   50190.500000   
max               ...              891586.000000  927171.000000   

           BILL_AMT6       PAY_AMT1      PAY_AMT2      PAY_AMT3  \
count   30000.000000   30000.000000  3.000000e+04   30000.00000   
mean    38871.760400    5663.580500  5.921163e+03    5225.68150   
std     59554.107537   16563.280354  2.304087e+04   17606.96147   
min   -339603.000000       0.000000  0.000000e+00       0.00000   
25%      1256.000000    1000.000000  8.330000e+02     390.00000   
50%     17071.000000    2100.000000  2.009000e+03    1800.00000   
75%     49198.250000    5006.000000  5.000000e+03    4505.00000   
max    961664.000000  873552.000000  1.684259e+06  896040.00000   

            PAY_AMT4       PAY_AMT5       PAY_AMT6  default payment next month  
count   30000.000000   30000.000000   30000.000000                30000.000000  
mean     4826.076867    4799.387633    5215.502567                    0.221200  
std     15666.159744   15278.305679   17777.465775                    0.415062  
min         0.000000       0.000000       0.000000                    0.000000  
25%       296.000000     252.500000     117.750000                    0.000000  
50%      1500.000000    1500.000000    1500.000000                    0.000000  
75%      4013.250000    4031.500000    4000.000000                    0.000000  
max    621000.000000  426529.000000  528666.000000                    1.000000  

[8 rows x 25 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 1 to 30000
Data columns (total 25 columns):
ID                            30000 non-null int64
LIMIT_BAL                     30000 non-null int64
SEX                           30000 non-null int64
EDUCATION                     30000 non-null int64
MARRIAGE                      30000 non-null int64
AGE                           30000 non-null int64
PAY_0                         30000 non-null int64
PAY_2                         30000 non-null int64
PAY_3                         30000 non-null int64
PAY_4                         30000 non-null int64
PAY_5                         30000 non-null int64
PAY_6                         30000 non-null int64
BILL_AMT1                     30000 non-null int64
BILL_AMT2                     30000 non-null int64
BILL_AMT3                     30000 non-null int64
BILL_AMT4                     30000 non-null int64
BILL_AMT5                     30000 non-null int64
BILL_AMT6                     30000 non-null int64
PAY_AMT1                      30000 non-null int64
PAY_AMT2                      30000 non-null int64
PAY_AMT3                      30000 non-null int64
PAY_AMT4                      30000 non-null int64
PAY_AMT5                      30000 non-null int64
PAY_AMT6                      30000 non-null int64
default payment next month    30000 non-null int64
dtypes: int64(25)
memory usage: 5.7 MB



In [7]:

    
#changing this so it is easier to work with

df.columns =['ID', 'LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_0',
       'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2',
       'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1',
       'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6',
       'Default']



In [9]:

    
feature_cols = ['ID', 'LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_0',
       'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2',
       'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1',
       'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']



In [11]:

    
X = df[feature_cols]



In [12]:

    
y = df.Default



In [ ]:



In [ ]:



In [13]:

    
estimator = RandomForestClassifier()



In [17]:

    
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, random_state=0)



In [18]:

    
for train, test in KFold(len(X), n_folds=12, shuffle=True):
        X_train, X_test = X.iloc[train], X.iloc[test]
        y_train, y_test = y.iloc[train], y.iloc[test]   
        estimator



In [19]:

    
df.iloc[test]









    Out[19]:






  
    
      
      ID
      LIMIT_BAL
      SEX
      EDUCATION
      MARRIAGE
      AGE
      PAY_0
      PAY_2
      PAY_3
      PAY_4
      ...
      BILL_AMT4
      BILL_AMT5
      BILL_AMT6
      PAY_AMT1
      PAY_AMT2
      PAY_AMT3
      PAY_AMT4
      PAY_AMT5
      PAY_AMT6
      Default
    
  
  
    
      30
      30
      50000
      1
      1
      2
      26
      0
      0
      0
      0
      ...
      17907
      18375
      11400
      1500
      1500
      1000
      1000
      1600
      0
      0
    
    
      38
      38
      60000
      2
      2
      2
      22
      0
      0
      0
      0
      ...
      6026
      -28335
      18660
      1500
      1518
      2043
      0
      47671
      617
      0
    
    
      41
      41
      360000
      1
      1
      2
      33
      0
      0
      0
      0
      ...
      628699
      195969
      179224
      10000
      7000
      6000
      188840
      28000
      4000
      0
    
    
      43
      43
      10000
      1
      2
      2
      22
      0
      0
      0
      0
      ...
      3576
      3670
      4451
      1500
      2927
      1000
      300
      1000
      500
      0
    
    
      45
      45
      40000
      2
      1
      2
      30
      0
      0
      0
      2
      ...
      25209
      26636
      29197
      3000
      5000
      0
      2000
      3000
      0
      0
    
    
      101
      101
      140000
      1
      1
      2
      32
      -2
      -2
      -2
      -2
      ...
      415
      100
      1430
      10212
      850
      415
      100
      1430
      0
      0
    
    
      113
      113
      280000
      1
      2
      1
      41
      2
      2
      2
      2
      ...
      144401
      152174
      149415
      6500
      0
      14254
      14850
      0
      5000
      0
    
    
      123
      123
      110000
      2
      1
      1
      48
      1
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      126
      126
      20000
      1
      2
      2
      23
      1
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      130
      130
      60000
      1
      3
      1
      55
      3
      2
      2
      0
      ...
      28853
      29510
      26547
      2504
      7
      1200
      1200
      1100
      1500
      0
    
    
      133
      133
      420000
      1
      2
      1
      34
      0
      0
      0
      0
      ...
      220951
      210606
      188108
      9744
      9553
      7603
      7830
      7253
      11326
      0
    
    
      134
      134
      330000
      1
      3
      1
      46
      0
      0
      0
      0
      ...
      227587
      227775
      228203
      8210
      8095
      8025
      8175
      8391
      8200
      0
    
    
      141
      141
      240000
      1
      1
      2
      47
      1
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      143
      143
      50000
      1
      2
      2
      23
      1
      2
      2
      2
      ...
      19996
      19879
      18065
      1000
      10000
      400
      700
      800
      600
      0
    
    
      150
      150
      260000
      2
      1
      1
      60
      1
      -2
      -1
      -1
      ...
      0
      969
      869
      0
      22500
      0
      969
      1000
      0
      0
    
    
      152
      152
      80000
      1
      1
      2
      25
      0
      0
      0
      0
      ...
      41087
      41951
      31826
      30000
      3000
      6000
      8000
      2000
      14000
      0
    
    
      154
      154
      280000
      2
      2
      1
      56
      0
      0
      0
      0
      ...
      101783
      177145
      169311
      8042
      6700
      5137
      100000
      7000
      6321
      0
    
    
      175
      175
      360000
      1
      1
      2
      29
      1
      -2
      -1
      -1
      ...
      0
      0
      0
      0
      77
      0
      0
      0
      0
      0
    
    
      192
      192
      60000
      2
      1
      2
      27
      2
      0
      0
      0
      ...
      23005
      22499
      22873
      1342
      1664
      2000
      0
      900
      846
      1
    
    
      194
      194
      180000
      2
      1
      2
      24
      -1
      -1
      2
      0
      ...
      10200
      0
      0
      37867
      0
      200
      0
      0
      0
      0
    
    
      197
      197
      150000
      2
      2
      1
      34
      -2
      -2
      -2
      -2
      ...
      116
      0
      1500
      0
      0
      116
      0
      1500
      0
      0
    
    
      237
      237
      150000
      2
      2
      2
      27
      0
      0
      0
      0
      ...
      44384
      36900
      29497
      4500
      1745
      1566
      1208
      1077
      2529
      0
    
    
      248
      248
      100000
      2
      2
      2
      27
      0
      0
      0
      0
      ...
      30337
      30997
      32904
      1788
      1799
      1100
      1150
      2423
      0
      0
    
    
      260
      260
      220000
      1
      1
      1
      48
      2
      0
      0
      0
      ...
      169115
      172169
      162402
      10000
      9020
      6000
      5500
      6000
      5500
      1
    
    
      279
      279
      50000
      1
      3
      1
      33
      0
      0
      0
      0
      ...
      20046
      20067
      19703
      2007
      2199
      691
      707
      703
      697
      0
    
    
      292
      292
      50000
      2
      2
      2
      22
      1
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      304
      304
      20000
      2
      1
      2
      25
      0
      0
      2
      0
      ...
      15824
      15761
      12510
      2800
      0
      4000
      1700
      1000
      2000
      1
    
    
      307
      307
      500000
      2
      2
      1
      36
      -1
      -1
      -1
      -1
      ...
      8500
      4590
      0
      23962
      0
      8500
      4590
      0
      0
      0
    
    
      349
      349
      140000
      2
      2
      2
      31
      0
      0
      0
      0
      ...
      139679
      141748
      142174
      6600
      6500
      5100
      5300
      6000
      5000
      0
    
    
      353
      353
      380000
      1
      1
      2
      30
      1
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      29690
      29690
      200000
      1
      1
      2
      36
      -1
      -1
      -1
      -1
      ...
      396
      396
      396
      396
      396
      396
      396
      396
      396
      0
    
    
      29707
      29707
      420000
      1
      2
      1
      43
      0
      0
      0
      0
      ...
      30099
      66049
      61043
      42063
      30253
      20016
      40015
      30004
      3000
      0
    
    
      29722
      29722
      230000
      1
      2
      1
      35
      0
      0
      0
      0
      ...
      180117
      179717
      124117
      7004
      10000
      5000
      0
      5000
      189600
      0
    
    
      29729
      29729
      160000
      1
      1
      1
      46
      -2
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      29798
      29798
      90000
      1
      2
      1
      31
      0
      0
      0
      0
      ...
      72021
      73912
      58301
      2471
      2600
      2633
      3108
      2052
      1858
      0
    
    
      29800
      29800
      20000
      1
      2
      2
      33
      3
      2
      8
      7
      ...
      21631
      21026
      20130
      8000
      0
      0
      0
      0
      0
      1
    
    
      29806
      29806
      60000
      1
      2
      1
      40
      0
      0
      0
      0
      ...
      52642
      52806
      53530
      2500
      2500
      2500
      2000
      2100
      2200
      0
    
    
      29810
      29810
      20000
      1
      3
      1
      48
      0
      0
      3
      2
      ...
      18188
      17630
      16780
      5000
      0
      4000
      0
      0
      0
      1
    
    
      29817
      29817
      80000
      1
      2
      2
      42
      1
      4
      3
      2
      ...
      81545
      51338
      50826
      0
      639
      0
      50918
      2000
      2000
      0
    
    
      29818
      29818
      50000
      1
      2
      2
      40
      2
      0
      0
      2
      ...
      13444
      13367
      13282
      2000
      2000
      1000
      1000
      1000
      2000
      1
    
    
      29847
      29847
      300000
      1
      2
      2
      47
      0
      0
      0
      0
      ...
      118522
      118586
      118179
      3000
      5000
      5013
      4514
      4220
      4008
      0
    
    
      29858
      29858
      50000
      1
      2
      1
      47
      0
      0
      0
      0
      ...
      14439
      14602
      14918
      1504
      1600
      1537
      700
      700
      600
      0
    
    
      29866
      29866
      150000
      1
      1
      1
      43
      -1
      3
      2
      -1
      ...
      416
      416
      416
      0
      416
      416
      416
      416
      0
      1
    
    
      29872
      29872
      420000
      1
      2
      2
      31
      0
      0
      0
      0
      ...
      293951
      305011
      312087
      14302
      10500
      11000
      16000
      12000
      16000
      0
    
    
      29881
      29881
      140000
      1
      2
      1
      38
      0
      0
      0
      0
      ...
      75815
      65099
      66445
      4494
      5303
      2806
      1825
      1880
      1901
      1
    
    
      29890
      29890
      340000
      1
      2
      1
      37
      0
      0
      0
      0
      ...
      81589
      83546
      85362
      8000
      4000
      3000
      3300
      3300
      4000
      0
    
    
      29904
      29904
      260000
      1
      1
      1
      30
      -1
      0
      -1
      -1
      ...
      99
      99
      172104
      10018
      13333
      99
      99
      172104
      30013
      0
    
    
      29905
      29905
      60000
      1
      3
      2
      30
      0
      0
      0
      0
      ...
      58732
      59306
      59728
      2600
      4553
      5800
      2000
      1000
      1462
      1
    
    
      29909
      29909
      140000
      1
      1
      2
      29
      -2
      -2
      -2
      -2
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      1
    
    
      29923
      29923
      150000
      1
      2
      1
      35
      1
      -2
      -2
      -2
      ...
      -18
      -18
      -18
      0
      0
      0
      0
      0
      0
      0
    
    
      29931
      29931
      20000
      1
      2
      1
      43
      1
      2
      2
      2
      ...
      8102
      7136
      5243
      1307
      0
      1400
      0
      182
      400
      0
    
    
      29939
      29939
      30000
      1
      2
      2
      38
      2
      2
      2
      2
      ...
      21471
      21598
      23241
      1700
      1700
      0
      782
      2000
      1100
      0
    
    
      29941
      29941
      20000
      1
      2
      1
      48
      1
      2
      0
      0
      ...
      18928
      18761
      18787
      2
      2019
      11309
      2000
      700
      1002
      0
    
    
      29945
      29945
      80000
      1
      2
      2
      39
      3
      2
      0
      0
      ...
      23867
      23130
      0
      0
      2968
      1867
      463
      0
      0
      1
    
    
      29953
      29953
      210000
      1
      2
      2
      30
      -1
      -1
      -1
      -1
      ...
      1252
      626
      626
      3980
      7509
      1252
      0
      626
      626
      0
    
    
      29962
      29962
      260000
      1
      1
      2
      33
      -2
      -2
      -2
      -2
      ...
      1368
      101
      955
      263
      0
      1368
      101
      955
      0
      0
    
    
      29969
      29969
      20000
      1
      2
      2
      34
      0
      0
      0
      0
      ...
      13478
      16978
      12914
      2000
      2000
      1000
      5000
      12914
      600
      0
    
    
      29978
      29978
      420000
      1
      1
      2
      34
      0
      0
      0
      0
      ...
      141695
      144839
      147954
      7000
      7000
      5500
      5500
      5600
      5000
      0
    
    
      29983
      29983
      90000
      1
      2
      1
      36
      0
      0
      0
      0
      ...
      11328
      12036
      14329
      1500
      1500
      1500
      1200
      2500
      0
      1
    
    
      29996
      29996
      220000
      1
      3
      1
      39
      0
      0
      0
      0
      ...
      88004
      31237
      15980
      8500
      20000
      5003
      3047
      5000
      1000
      0
    
  

2500 rows × 25 columns



In [20]:

    
label = "default data credit card random forest"
start  = time.time() # Start the clock! 
scores = {'precision':[], 'recall':[], 'accuracy':[], 'f1':[]}
    
for train, test in KFold(len(X), n_folds=12, shuffle=True):
    X_train, X_test = X.iloc[train], X.iloc[test]
    y_train, y_test = y.iloc[train], y.iloc[test] 
        
estimator = RandomForestClassifier()
estimator.fit(X_train, y_train)
        
expected  = y_test
predicted = estimator.predict(X_test)
        
# Append our scores to the tracker
scores['precision'].append(metrics.precision_score(expected, predicted, average="weighted"))
scores['recall'].append(metrics.recall_score(expected, predicted, average="weighted"))
scores['accuracy'].append(metrics.accuracy_score(expected, predicted))
scores['f1'].append(metrics.f1_score(expected, predicted, average="weighted"))

# Report
print("Build and Validation of {} took {:0.3f} seconds".format(label, time.time()-start))
print("Validation scores are as follows:\n")
print(pd.DataFrame(scores).mean())
    
# Write official estimator to disk
estimator = RandomForestClassifier()
estimator.fit(X, y)
    
outpath = label.lower().replace(" ", "-") + ".pickle"
with open(outpath, 'wb') as f:
    pickle.dump(estimator, f)

#print("\nFitted model written to:\n{}".format(os.path.abspath(outpath)))









    



Build and Validation of default data credit card random forest took 1.272 seconds
Validation scores are as follows:

accuracy     0.801200
f1           0.772397
precision    0.777112
recall       0.801200
dtype: float64



In [ ]:

    
label = "default data credit card SVC"
start  = time.time() # Start the clock! 
scores = {'precision':[], 'recall':[], 'accuracy':[], 'f1':[]}
    
for train, test in KFold(len(X), n_folds=12, shuffle=True):
    X_train, X_test = X.iloc[train], X.iloc[test]
    y_train, y_test = y.iloc[train], y.iloc[test] 
        
estimator = SVC()
estimator.fit(X_train, y_train)
        
expected  = y_test
predicted = estimator.predict(X_test)
        
# Append our scores to the tracker
scores['precision'].append(metrics.precision_score(expected, predicted, average="weighted"))
scores['recall'].append(metrics.recall_score(expected, predicted, average="weighted"))
scores['accuracy'].append(metrics.accuracy_score(expected, predicted))
scores['f1'].append(metrics.f1_score(expected, predicted, average="weighted"))

# Report
print("Build and Validation of {} took {:0.3f} seconds".format(label, time.time()-start))
print("Validation scores are as follows:\n")
print(pd.DataFrame(scores).mean())
    
    
    
#outpath = label.lower().replace(" ", "-") + ".pickle"
#with open(outpath, 'wb') as f:
#    pickle.dump(estimator, f)

#print("\nFitted model written to:\n{}".format(os.path.abspath(outpath)))



In [21]:

    
label = "default data credit card KNNclassifer "
start  = time.time() # Start the clock! 
scores = {'precision':[], 'recall':[], 'accuracy':[], 'f1':[]}
    
for train, test in KFold(len(X), n_folds=12, shuffle=True):
    X_train, X_test = X.iloc[train], X.iloc[test]
    y_train, y_test = y.iloc[train], y.iloc[test] 
        
estimator = KNeighborsClassifier(n_neighbors=12)
estimator.fit(X_train, y_train)
        
expected  = y_test
predicted = estimator.predict(X_test)
        
# Append our scores to the tracker
scores['precision'].append(metrics.precision_score(expected, predicted, average="weighted"))
scores['recall'].append(metrics.recall_score(expected, predicted, average="weighted"))
scores['accuracy'].append(metrics.accuracy_score(expected, predicted))
scores['f1'].append(metrics.f1_score(expected, predicted, average="weighted"))

# Report
print("Build and Validation of {} took {:0.3f} seconds".format(label, time.time()-start))
print("Validation scores are as follows:\n")
print(pd.DataFrame(scores).mean())
    
    
    
#outpath = label.lower().replace(" ", "-") + ".pickle"
#with open(outpath, 'wb') as f:
#    pickle.dump(estimator, f)

#print("\nFitted model written to:\n{}".format(os.path.abspath(outpath)))









    



Build and Validation of default data credit card KNNclassifer  took 1.197 seconds
Validation scores are as follows:

accuracy     0.776800
f1           0.714357
precision    0.710574
recall       0.776800
dtype: float64



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

	Unnamed: 0	X1	X2	X3	X4	X5	X6	X7	X8	X9	...	X15	X16	X17	X18	X19	X20	X21	X22	X23	Y
0	ID	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_0	PAY_2	PAY_3	PAY_4	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	default payment next month
1	1	20000	2	2	1	24	2	2	-1	-1	...	0	0	0	0	689	0	0	0	0	1
2	2	120000	2	2	2	26	-1	2	0	0	...	3272	3455	3261	0	1000	1000	1000	0	2000	1
3	3	90000	2	2	2	34	0	0	0	0	...	14331	14948	15549	1518	1500	1000	1000	1000	5000	0
4	4	50000	2	2	1	37	0	0	0	0	...	28314	28959	29547	2000	2019	1200	1100	1069	1000	0

	ID	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_0	PAY_2	PAY_3	PAY_4	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	Default
30	30	50000	1	1	2	26	0	0	0	0	...	17907	18375	11400	1500	1500	1000	1000	1600	0	0
38	38	60000	2	2	2	22	0	0	0	0	...	6026	-28335	18660	1500	1518	2043	0	47671	617	0
41	41	360000	1	1	2	33	0	0	0	0	...	628699	195969	179224	10000	7000	6000	188840	28000	4000	0
43	43	10000	1	2	2	22	0	0	0	0	...	3576	3670	4451	1500	2927	1000	300	1000	500	0
45	45	40000	2	1	2	30	0	0	0	2	...	25209	26636	29197	3000	5000	0	2000	3000	0	0
101	101	140000	1	1	2	32	-2	-2	-2	-2	...	415	100	1430	10212	850	415	100	1430	0	0
113	113	280000	1	2	1	41	2	2	2	2	...	144401	152174	149415	6500	0	14254	14850	0	5000	0
123	123	110000	2	1	1	48	1	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	0
126	126	20000	1	2	2	23	1	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	0
130	130	60000	1	3	1	55	3	2	2	0	...	28853	29510	26547	2504	7	1200	1200	1100	1500	0
133	133	420000	1	2	1	34	0	0	0	0	...	220951	210606	188108	9744	9553	7603	7830	7253	11326	0
134	134	330000	1	3	1	46	0	0	0	0	...	227587	227775	228203	8210	8095	8025	8175	8391	8200	0
141	141	240000	1	1	2	47	1	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	1
143	143	50000	1	2	2	23	1	2	2	2	...	19996	19879	18065	1000	10000	400	700	800	600	0
150	150	260000	2	1	1	60	1	-2	-1	-1	...	0	969	869	0	22500	0	969	1000	0	0
152	152	80000	1	1	2	25	0	0	0	0	...	41087	41951	31826	30000	3000	6000	8000	2000	14000	0
154	154	280000	2	2	1	56	0	0	0	0	...	101783	177145	169311	8042	6700	5137	100000	7000	6321	0
175	175	360000	1	1	2	29	1	-2	-1	-1	...	0	0	0	0	77	0	0	0	0	0
192	192	60000	2	1	2	27	2	0	0	0	...	23005	22499	22873	1342	1664	2000	0	900	846	1
194	194	180000	2	1	2	24	-1	-1	2	0	...	10200	0	0	37867	0	200	0	0	0	0
197	197	150000	2	2	1	34	-2	-2	-2	-2	...	116	0	1500	0	0	116	0	1500	0	0
237	237	150000	2	2	2	27	0	0	0	0	...	44384	36900	29497	4500	1745	1566	1208	1077	2529	0
248	248	100000	2	2	2	27	0	0	0	0	...	30337	30997	32904	1788	1799	1100	1150	2423	0	0
260	260	220000	1	1	1	48	2	0	0	0	...	169115	172169	162402	10000	9020	6000	5500	6000	5500	1
279	279	50000	1	3	1	33	0	0	0	0	...	20046	20067	19703	2007	2199	691	707	703	697	0
292	292	50000	2	2	2	22	1	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	1
304	304	20000	2	1	2	25	0	0	2	0	...	15824	15761	12510	2800	0	4000	1700	1000	2000	1
307	307	500000	2	2	1	36	-1	-1	-1	-1	...	8500	4590	0	23962	0	8500	4590	0	0	0
349	349	140000	2	2	2	31	0	0	0	0	...	139679	141748	142174	6600	6500	5100	5300	6000	5000	0
353	353	380000	1	1	2	30	1	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	1
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
29690	29690	200000	1	1	2	36	-1	-1	-1	-1	...	396	396	396	396	396	396	396	396	396	0
29707	29707	420000	1	2	1	43	0	0	0	0	...	30099	66049	61043	42063	30253	20016	40015	30004	3000	0
29722	29722	230000	1	2	1	35	0	0	0	0	...	180117	179717	124117	7004	10000	5000	0	5000	189600	0
29729	29729	160000	1	1	1	46	-2	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	1
29798	29798	90000	1	2	1	31	0	0	0	0	...	72021	73912	58301	2471	2600	2633	3108	2052	1858	0
29800	29800	20000	1	2	2	33	3	2	8	7	...	21631	21026	20130	8000	0	0	0	0	0	1
29806	29806	60000	1	2	1	40	0	0	0	0	...	52642	52806	53530	2500	2500	2500	2000	2100	2200	0
29810	29810	20000	1	3	1	48	0	0	3	2	...	18188	17630	16780	5000	0	4000	0	0	0	1
29817	29817	80000	1	2	2	42	1	4	3	2	...	81545	51338	50826	0	639	0	50918	2000	2000	0
29818	29818	50000	1	2	2	40	2	0	0	2	...	13444	13367	13282	2000	2000	1000	1000	1000	2000	1
29847	29847	300000	1	2	2	47	0	0	0	0	...	118522	118586	118179	3000	5000	5013	4514	4220	4008	0
29858	29858	50000	1	2	1	47	0	0	0	0	...	14439	14602	14918	1504	1600	1537	700	700	600	0
29866	29866	150000	1	1	1	43	-1	3	2	-1	...	416	416	416	0	416	416	416	416	0	1
29872	29872	420000	1	2	2	31	0	0	0	0	...	293951	305011	312087	14302	10500	11000	16000	12000	16000	0
29881	29881	140000	1	2	1	38	0	0	0	0	...	75815	65099	66445	4494	5303	2806	1825	1880	1901	1
29890	29890	340000	1	2	1	37	0	0	0	0	...	81589	83546	85362	8000	4000	3000	3300	3300	4000	0
29904	29904	260000	1	1	1	30	-1	0	-1	-1	...	99	99	172104	10018	13333	99	99	172104	30013	0
29905	29905	60000	1	3	2	30	0	0	0	0	...	58732	59306	59728	2600	4553	5800	2000	1000	1462	1
29909	29909	140000	1	1	2	29	-2	-2	-2	-2	...	0	0	0	0	0	0	0	0	0	1
29923	29923	150000	1	2	1	35	1	-2	-2	-2	...	-18	-18	-18	0	0	0	0	0	0	0
29931	29931	20000	1	2	1	43	1	2	2	2	...	8102	7136	5243	1307	0	1400	0	182	400	0
29939	29939	30000	1	2	2	38	2	2	2	2	...	21471	21598	23241	1700	1700	0	782	2000	1100	0
29941	29941	20000	1	2	1	48	1	2	0	0	...	18928	18761	18787	2	2019	11309	2000	700	1002	0
29945	29945	80000	1	2	2	39	3	2	0	0	...	23867	23130	0	0	2968	1867	463	0	0	1
29953	29953	210000	1	2	2	30	-1	-1	-1	-1	...	1252	626	626	3980	7509	1252	0	626	626	0
29962	29962	260000	1	1	2	33	-2	-2	-2	-2	...	1368	101	955	263	0	1368	101	955	0	0
29969	29969	20000	1	2	2	34	0	0	0	0	...	13478	16978	12914	2000	2000	1000	5000	12914	600	0
29978	29978	420000	1	1	2	34	0	0	0	0	...	141695	144839	147954	7000	7000	5500	5500	5600	5000	0
29983	29983	90000	1	2	1	36	0	0	0	0	...	11328	12036	14329	1500	1500	1500	1200	2500	0	1
29996	29996	220000	1	3	1	39	0	0	0	0	...	88004	31237	15980	8500	20000	5003	3047	5000	1000	0