https://www.kaggle.com/c/prudential-life-insurance-assessment
Variable | Description |
---|---|
Id | A unique identifier associated with an application. |
Product_Info_1-7 | A set of normalized variables relating to the product applied for. |
Ins_Age | Normalized age of applicant. |
Ht | Normalized height of applicant. |
Wt | Normalized weight of applicant. |
BMI | Normalized BMI of applicant. |
Employment_Info_1-6 | A set of normalized variables relating to the employment history of the applicant. |
InsuredInfo_1-6 | A set of normalized variables providing information about the applicant. |
Insurance_History_1-9 | A set of normalized variables relating to the insurance history of the applicant. |
Family_Hist_1-5 | A set of normalized variables relating to the family history of the applicant. |
Medical_History_1-41 | A set of normalized variables relating to the medical history of the applicant. |
Medical_Keyword_1-48 | A set of dummy variables relating to the presence of/absence of a medical keyword being associated with the application. |
Response | This is the target variable: an ordinal variable (1-8) relating to the final decision associated with an application. |
The following variables are all categorical (nominal):
Product_Info_1, Product_Info_2, Product_Info_3, Product_Info_5, Product_Info_6, Product_Info_7, Employment_Info_2, Employment_Info_3, Employment_Info_5, InsuredInfo_1, InsuredInfo_2, InsuredInfo_3, InsuredInfo_4, InsuredInfo_5, InsuredInfo_6, InsuredInfo_7, Insurance_History_1, Insurance_History_2, Insurance_History_3, Insurance_History_4, Insurance_History_7, Insurance_History_8, Insurance_History_9, Family_Hist_1, Medical_History_2, Medical_History_3, Medical_History_4, Medical_History_5, Medical_History_6, Medical_History_7, Medical_History_8, Medical_History_9, Medical_History_11, Medical_History_12, Medical_History_13, Medical_History_14, Medical_History_16, Medical_History_17, Medical_History_18, Medical_History_19, Medical_History_20, Medical_History_21, Medical_History_22, Medical_History_23, Medical_History_25, Medical_History_26, Medical_History_27, Medical_History_28, Medical_History_29, Medical_History_30, Medical_History_31, Medical_History_33, Medical_History_34, Medical_History_35, Medical_History_36, Medical_History_37, Medical_History_38, Medical_History_39, Medical_History_40, Medical_History_41
The following variables are continuous:
Product_Info_4, Ins_Age, Ht, Wt, BMI, Employment_Info_1, Employment_Info_4, Employment_Info_6, Insurance_History_5, Family_Hist_2, Family_Hist_3, Family_Hist_4, Family_Hist_5
The following variables are discrete:
Medical_History_1, Medical_History_10, Medical_History_15, Medical_History_24, Medical_History_32
Medical_Keyword_1-48 are dummy variables.
My thoughts are as follows:
The main dependent variable is the risk Response, an ordinal value from 1 to 8. Which variables are correlated with the risk response, and how do I perform correlation analysis between variables?
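Spearman rank correlation is a natural first tool here, since Response is ordinal rather than interval-scaled. A minimal sketch, assuming the training data has already been loaded into the DataFrame `df` built in the cells below:
In [ ]:
# Sketch: rank-correlate every numeric column with the ordinal Response
# (DataFrame.corr silently skips non-numeric columns like Product_Info_2)
corr = df.corr(method='spearman')['Response']
print corr.drop('Response').abs().sort_values(ascending=False).head(10)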
In [2]:
# Importing libraries
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import preprocessing
import numpy as np
In [3]:
# Organize the variable names into categorical, continuous,
# discrete, and dummy lists, stored in a dictionary
In [4]:
s = ["Product_Info_1, Product_Info_2, Product_Info_3, Product_Info_5, Product_Info_6, Product_Info_7, Employment_Info_2, Employment_Info_3, Employment_Info_5, InsuredInfo_1, InsuredInfo_2, InsuredInfo_3, InsuredInfo_4, InsuredInfo_5, InsuredInfo_6, InsuredInfo_7, Insurance_History_1, Insurance_History_2, Insurance_History_3, Insurance_History_4, Insurance_History_7, Insurance_History_8, Insurance_History_9, Family_Hist_1, Medical_History_2, Medical_History_3, Medical_History_4, Medical_History_5, Medical_History_6, Medical_History_7, Medical_History_8, Medical_History_9, Medical_History_11, Medical_History_12, Medical_History_13, Medical_History_14, Medical_History_16, Medical_History_17, Medical_History_18, Medical_History_19, Medical_History_20, Medical_History_21, Medical_History_22, Medical_History_23, Medical_History_25, Medical_History_26, Medical_History_27, Medical_History_28, Medical_History_29, Medical_History_30, Medical_History_31, Medical_History_33, Medical_History_34, Medical_History_35, Medical_History_36, Medical_History_37, Medical_History_38, Medical_History_39, Medical_History_40, Medical_History_41",
"Product_Info_4, Ins_Age, Ht, Wt, BMI, Employment_Info_1, Employment_Info_4, Employment_Info_6, Insurance_History_5, Family_Hist_2, Family_Hist_3, Family_Hist_4, Family_Hist_5",
"Medical_History_1, Medical_History_10, Medical_History_15, Medical_History_24, Medical_History_32"]
varTypes = dict()
varTypes['categorical'] = s[0].split(', ')
varTypes['continuous'] = s[1].split(', ')
varTypes['discrete'] = s[2].split(', ')
varTypes['dummy'] = ["Medical_Keyword_"+str(i) for i in range(1,49)]
In [5]:
# Print out each of the variable types as a sanity check
#for i in iter(varTypes['dummy']):
#    print i
In [6]:
#Import training data
d_raw = pd.read_csv('prud_files/train.csv')
d = d_raw.copy()
In [9]:
len(d.columns)
In [181]:
# Get all the columns that have NaNs
d = d_raw.copy()
a = pd.isnull(d).sum()
nullColumns = a[a>0].index.values
#for c in nullColumns:
#    d[c].fillna(-1)
# Determine the min and max values for the NaN columns
a = pd.DataFrame(d, columns=nullColumns).describe()
a_min = a.loc[['min']]
a_max = a.loc[['max']]
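Since the normalized variables are non-negative, a sentinel like -1 falls outside every observed range. To judge whether filling is sensible at all, a quick sketch of how much of each column is actually missing:
In [ ]:
# Sketch: fraction of missing values per affected column
print pd.isnull(d[nullColumns]).mean().sort_values(ascending=False)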
In [175]:
nullList = ['Family_Hist_4',
'Medical_History_1',
'Medical_History_10',
'Medical_History_15',
'Medical_History_24',
'Medical_History_32']
pd.DataFrame(a_max, columns=nullList)
In [303]:
# Convert all NaNs to -1 and sum up all medical keywords across columns
df = d.fillna(-1)
b = pd.DataFrame(df[varTypes["dummy"]].sum(axis=1), columns=["Medical_Keyword_Sum"])
df= pd.concat([df,b], axis=1, join='outer')
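A quick look at the new keyword-count feature, to confirm the row-wise sum behaves as expected:
In [ ]:
# Sketch: distribution of the keyword-count feature just created
print df["Medical_Keyword_Sum"].describe()
print df["Medical_Keyword_Sum"].value_counts().head()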
In [328]:
# Turn split-train-to-test on or off.
# If on, 10% of the training set is held out for feature testing.
# If off, the test set is loaded from file.
splitTrainToTest = 1
if(splitTrainToTest):
    d_gb = df.groupby("Response")
    df_test = pd.DataFrame()
    for name, group in d_gb:
        df_test = pd.concat([df_test, group[:len(group)//10]], axis=0, join='outer')
    print "test data is 10% training data"
else:
    d_test = pd.read_csv('prud_files/test.csv')
    df_test = d_test.fillna(-1)
    # sum the keyword dummies of the *test* frame, not the training frame
    b = pd.DataFrame(df_test[varTypes["dummy"]].sum(axis=1), columns=["Medical_Keyword_Sum"])
    df_test = pd.concat([df_test, b], axis=1, join='outer')
    print "test data is prud_files/test.csv"
In [275]:
# Note: prud_files/test.csv has no Response column, so the *_test frames
# below assume the 10% holdout path was taken above
df_cat = df[["Id","Response"]+varTypes["categorical"]]
df_disc = df[["Id","Response"]+varTypes["discrete"]]
df_cont = df[["Id","Response"]+varTypes["continuous"]]
df_dummy = df[["Id","Response"]+varTypes["dummy"]]
df_cat_test = df_test[["Id","Response"]+varTypes["categorical"]]
df_disc_test = df_test[["Id","Response"]+varTypes["discrete"]]
df_cont_test = df_test[["Id","Response"]+varTypes["continuous"]]
df_dummy_test = df_test[["Id","Response"]+varTypes["dummy"]]
In [355]:
# Assemble the combined frames of predictive columns for train and test
df_n = df[["Response", "Medical_Keyword_Sum"]+varTypes["categorical"]+varTypes["discrete"]+varTypes["continuous"]].copy()
df_test_n = df_test[["Response","Medical_Keyword_Sum"]+varTypes["categorical"]+varTypes["discrete"]+varTypes["continuous"]].copy()
In [356]:
# Get all the Product_Info_2 categories
a = pd.get_dummies(df["Product_Info_2"]).columns.tolist()
norm_PI2_dict = dict()
# Create an enumerated dictionary of Product_Info_2 categories
for i, c in enumerate(a, start=1):
    norm_PI2_dict[c] = i
print norm_PI2_dict
df_n = df_n.replace(to_replace={'Product_Info_2':norm_PI2_dict})
df_test_n = df_test_n.replace(to_replace={'Product_Info_2':norm_PI2_dict})
df_n
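pandas can produce the same enumeration in one call; a sketch using pd.factorize, whose codes start at 0 rather than 1:
In [ ]:
# Sketch: equivalent label encoding of Product_Info_2 via pd.factorize
codes, categories = pd.factorize(df["Product_Info_2"], sort=True)
print dict(zip(categories, range(len(categories))))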
In [359]:
# Min-max normalizes a single dataframe column and returns the result
def normalize_df(d):
    min_max_scaler = preprocessing.MinMaxScaler()
    # MinMaxScaler expects a 2D array, so reshape the 1D column first;
    # keep the original index so later concats align row-wise
    x = d.values.astype(np.float).reshape(-1, 1)
    return pd.DataFrame(min_max_scaler.fit_transform(x), index=d.index)

# Appends a normalized ("n"-prefixed) copy of each categorical column
def normalize_cat(d_cat):
    for x in varTypes["categorical"]:
        try:
            a = pd.DataFrame(normalize_df(d_cat[x]))
            a.columns = [str("n"+x)]
            d_cat = pd.concat([d_cat, a], axis=1, join='outer')
        except Exception as e:
            print e.args
            print "Error on "+str(x)+" w error: "+str(e)
    return d_cat

# Appends a normalized ("n"-prefixed) copy of each discrete column
def normalize_disc(d_disc):
    for x in varTypes["discrete"]:
        try:
            a = pd.DataFrame(normalize_df(d_disc[x]))
            a.columns = [str("n"+x)]
            d_disc = pd.concat([d_disc, a], axis=1, join='outer')
        except Exception as e:
            print e.args
            print "Error on "+str(x)+" w error: "+str(e)
    return d_disc

# Generic version: t = "categorical", "discrete", or "continuous"
def normalize_cols(d, t = "categorical"):
    for x in varTypes[t]:
        try:
            a = pd.DataFrame(normalize_df(d[x]))
            a.columns = [str("n"+x)]
            # accumulate the normalized columns onto d instead of
            # overwriting a single frame on every iteration
            d = pd.concat([d, a], axis=1, join='outer')
        except Exception as e:
            print e.args
            print "Error on "+str(x)+" w error: "+str(e)
    return d

def normalize_response(d):
    a = pd.DataFrame(normalize_df(d["Response"]))
    a.columns = ["nResponse"]
    return a
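For reference, a short usage sketch of the corrected normalize_cols, which returns the input frame with "n"-prefixed normalized copies appended:
In [ ]:
# Usage sketch: append min-max-normalized copies of the discrete columns
demo = normalize_cols(df_disc.copy(), t="discrete")
print [c for c in demo.columns if c.startswith("n")]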
In [15]:
df_n_2 = df_n.copy()
df_n_test_2 = df_test_n.copy()
df_n_2 = df_n_2[["Response"]+varTypes["categorical"]+varTypes["discrete"]]
df_n_test_2 = df_n_test_2[["Response"]+varTypes["categorical"]+varTypes["discrete"]]
# normalize per column (axis=0), not per row; each column ends up in [0,1]
df_n_2 = df_n_2.apply(lambda col: normalize_df(col)[0], axis=0)
df_n_test_2 = df_n_test_2.apply(lambda col: normalize_df(col)[0], axis=0)
In [382]:
df_n_3 = pd.concat([df["Id"],df_n["Medical_Keyword_Sum"],df_n_2, df_n[varTypes["continuous"]]],axis=1,join='outer')
df_n_test_3 = pd.concat([df_test["Id"],df_test_n["Medical_Keyword_Sum"],df_n_test_2, df_test_n[varTypes["continuous"]]],axis=1,join='outer')
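Before dropping down to raw numpy arrays it is worth checking that the train and test frames line up; a quick sanity-check sketch:
In [ ]:
# Sanity check (sketch): both frames should share width and column order
print df_n_3.shape, df_n_test_3.shape
print (df_n_3.columns == df_n_test_3.columns).all()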
In [ ]:
train_data = df_n_3.values
test_data = df_n_test_3.values
# features: everything except the identifier and the (normalized) target;
# labels: the original 1-8 Response values
X_train = df_n_3.drop(["Id", "Response"], axis=1).values
Y_train = df["Response"].values
X_test = df_n_test_3.drop(["Id", "Response"], axis=1).values
Y_test = df_test["Response"].values
In [ ]:
from sklearn import linear_model
from sklearn.metrics import accuracy_score
clf = linear_model.Lasso(alpha = 0.1)
clf.fit(X_train, Y_train)
# Lasso is a regressor, so round and clip its continuous output to the
# 1-8 classes before computing a classification accuracy
pred = np.clip(np.rint(clf.predict(X_test)), 1, 8).astype(int)
print accuracy_score(Y_test, pred)
In [ ]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators = 1)  # a single tree; raise n_estimators for a real forest
model = model.fit(X_train, Y_train)
In [409]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
clf = GaussianNB()
# fit on the feature matrix and the Response labels
clf.fit(X_train, Y_train)
pred = clf.predict(X_test)
print accuracy_score(Y_test, pred)
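Accuracy weighs all errors equally, but this competition is scored with quadratic weighted kappa, which penalizes predictions far from the true ordinal class more heavily. A sketch, assuming a scikit-learn version that provides cohen_kappa_score with quadratic weights:
In [ ]:
# Sketch: quadratic weighted kappa, the competition's evaluation metric
from sklearn.metrics import cohen_kappa_score
print cohen_kappa_score(Y_test, pred, weights='quadratic')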
In [340]:
df_n.columns.tolist()
In [32]:
d_cat = df_cat.copy()
d_cat_test = df_cat_test.copy()
d_cont = df_cont.copy()
d_cont_test = df_cont_test.copy()
d_disc = df_disc.copy()
d_disc_test = df_disc_test.copy()
In [ ]:
#df_cont_n = normalize_cols(d_cont, "continuous")
#df_cont_test_n = normalize_cols(d_cont_test, "continuous")
In [31]:
df_cat_n = normalize_cols(d_cat, "categorical")
df_cat_test_n = normalize_cols(d_cat_test, "categorical")
In [33]:
df_disc_n = normalize_cols(d_disc, "discrete")
df_disc_test_n = normalize_cols(d_disc, "discrete")
In [21]:
# TODO: wrap this slicing into a helper function
# the normalized ("n"-prefixed) columns appended by normalize_cols start at index 62
a = df_cat_n.iloc[:,62:]
In [14]:
# Define various groupby views
# (note: this rebinds df to the raw copy d, which still contains NaNs)
df = d
gb_PI1 = df.groupby('Product_Info_1')
gb_PI2 = df.groupby('Product_Info_2')
gb_Ins_Age = df.groupby('Ins_Age')
gb_Ht = df.groupby('Ht')
gb_Wt = df.groupby('Wt')
gb_response = df.groupby('Response')
In [ ]:
# Print the distinct categories of each categorical column
for c in df.columns:
    if (c in varTypes['categorical']) and (c != 'Id'):
        a = [ str(x)+", " for x in df.groupby(c).groups ]
        print c + " : " + str(a)
In [ ]:
df_prod_info = pd.DataFrame(d, columns=["Response"]+ [ "Product_Info_"+str(x) for x in range(1,8)])
df_emp_info = pd.DataFrame(d, columns=["Response"]+ [ "Employment_Info_"+str(x) for x in range(1,7)])
# continuous
df_bio = pd.DataFrame(d, columns=["Response", "Ins_Age", "Ht", "Wt","BMI"])
# all the keyword values are binary (0 or 1)
df_med_kw = pd.DataFrame(d, columns=["Response"]+ [ "Medical_Keyword_"+str(x) for x in range(1,49)])
In [ ]:
plt.figure(0)
plt.subplot(121)
plt.title("Categorical - Histogram for Risk Response")
plt.xlabel("Risk Response (1-7)")
plt.ylabel("Frequency")
plt.hist(df.Response)
plt.savefig('images/hist_Response.png')
print df.Response.describe()
print ""
plt.subplot(122)
plt.title("Normalized - Histogram for Risk Response")
plt.xlabel("Normalized Risk Response (1-7)")
plt.ylabel("Frequency")
plt.hist(df_cat_n.nResponse)
plt.savefig('images/hist_norm_Response.png')
print df_cat_n.nResponse.describe()
print ""
In [ ]:
def plotContinuous(d, t):
    plt.title("Continuous - Histogram for "+ str(t))
    plt.xlabel("Normalized "+str(t)+" [0,1]")
    plt.ylabel("Frequency")
    plt.hist(d)
    plt.savefig("images/hist_"+str(t)+".png")

plt.figure(1)
plotContinuous(df.Ins_Age, "Ins_Age")
plt.show()
In [26]:
df_disc.describe().loc[['max']]
In [ ]:
plt.figure(1)
plt.title("Continuous - Histogram for Ins_Age")
plt.xlabel("Normalized Ins_Age [0,1]")
plt.ylabel("Frequency")
plt.hist(df.Ins_Age)
plt.savefig('images/hist_Ins_Age.png')
print df.Ins_Age.describe()
print ""
plt.figure(2)
plt.title("Continuous - Histogram for BMI")
plt.xlabel("Normalized BMI [0,1]")
plt.ylabel("Frequency")
plt.hist(df.BMI)
plt.savefig('images/hist_BMI.png')
print df.BMI.describe()
print ""
plt.figure(3)
plt.title("Continuous - Histogram for Wt")
plt.xlabel("Normalized Wt [0,1]")
plt.ylabel("Frequency")
plt.hist(df.Wt)
plt.savefig('images/hist_Wt.png')
print df.Wt.describe()
print ""
plt.show()
In [ ]:
for i in range(1,8):
    '''
    print "The iteration is: "+str(i)
    print df['Product_Info_'+str(i)].describe()
    print ""
    '''
    plt.figure(i)
    if(i == 4):
        # Product_Info_4 is the only continuous product variable
        plt.title("Continuous - Histogram for Product_Info_"+str(i))
        plt.xlabel("Normalized value: [0,1]")
        plt.ylabel("Frequency")
        plt.hist(df['Product_Info_'+str(i)])
        plt.savefig('images/hist_Product_Info_'+str(i)+'.png')
    elif(i == 2):
        # Product_Info_2 is string-valued, so plot its value counts as a bar chart
        plt.title("Cat-Hist Product_Info_"+str(i))
        plt.xlabel("Categories")
        plt.ylabel("Frequency")
        df.Product_Info_2.value_counts().plot(kind='bar')
        plt.savefig('images/hist_Product_Info_'+str(i)+'.png')
    else:
        plt.subplot(1,2,1)
        plt.title("Cat-Hist- Product_Info_"+str(i))
        plt.xlabel("Categories")
        plt.ylabel("Frequency")
        plt.hist(df['Product_Info_'+str(i)])
        plt.savefig('images/hist_Product_Info_'+str(i)+'.png')
        plt.subplot(1,2,2)
        plt.title("Normalized - Histogram of Product_Info_"+str(i))
        plt.xlabel("Categories")
        plt.ylabel("Frequency")
        plt.hist(df_cat_n['nProduct_Info_'+str(i)])
        plt.savefig('images/hist_norm_Product_Info_'+str(i)+'.png')
plt.show()
In [ ]:
catD = df.loc[:,varTypes['categorical']]
contD = df.loc[:,varTypes['continuous']]
disD = df.loc[:,varTypes['discrete']]
dummyD = df.loc[:,varTypes['dummy']]
respD = df.loc[:,['Id','Response']]
In [ ]:
prod_info = [ "Product_Info_"+str(i) for i in range(1,8)]
a = catD.loc[:, prod_info[1]]   # prod_info[1] is "Product_Info_2"
stats = catD.groupby(prod_info[1]).describe()
In [ ]:
c = gb_PI2.Response.count()
plt.figure(0)
# response counts are indexed by category label, so plot them as a bar chart
c.plot(kind='bar')
In [ ]:
plt.figure(0)
# relies on `a` and `i` left over from the cells above
plt.title("Histogram of "+"Product_Info_"+str(i))
plt.xlabel("Categories " + str((a.describe())['count']))
plt.ylabel("Frequency")
In [ ]:
for i in range(1,8):
    if(i == 4):
        continue  # Product_Info_4 is continuous and not part of catD
    a = catD.loc[:, "Product_Info_"+str(i)]
    print a.describe()
    print ""
    plt.figure(i)
    plt.title("Histogram of "+"Product_Info_"+str(i))
    plt.xlabel("Categories " + str((a.describe())['count']))
    plt.ylabel("Frequency")
    #fig, axes = plt.subplots(nrows = 1, ncols = 2)
    #catD[key].value_counts(normalize=True).hist(ax=axes[0]); axes[0].set_title("Histogram: "+str(key))
    #catD[key].value_counts(normalize=True).hist(cumulative=True,ax=axes[1]); axes[1].set_title("Cumulative HG: "+str(key))
    # only numeric columns can be histogrammed directly
    if a.dtype in (np.int64, np.float, float, int):
        a.hist()

# Miscellaneous exploration commands
#catD.Product_Info_1.describe()
#catD.loc[:, prod_info].groupby('Product_Info_2').describe()
#df[varTypes['categorical']].hist()
In [ ]:
catD.head(5)
In [ ]:
#Exploration of the discrete data
disD.describe()
In [ ]:
disD.head(5)
In [ ]:
# Iterate through each categorical column of data
# TODO: perform a 2D histogram later
i=0
for key in varTypes['categorical']:
    plt.figure(i)
    plt.title("Histogram of "+str(key))
    plt.xlabel("Categories " + str((df.groupby(key).describe())['count']))
    #fig, axes = plt.subplots(nrows = 1, ncols = 2)
    #catD[key].value_counts(normalize=True).hist(ax=axes[0]); axes[0].set_title("Histogram: "+str(key))
    #catD[key].value_counts(normalize=True).hist(cumulative=True,ax=axes[1]); axes[1].set_title("Cumulative HG: "+str(key))
    if df[key].dtype in (np.int64, np.float, float, int):
        df[key].hist()
    i+=1
In [ ]:
# Iterate through each 'discrete' column of data
# TODO: perform a 2D histogram later
i=0
for key in varTypes['discrete']:
    # plt.subplots creates its own figure, so no separate plt.figure call
    fig, axes = plt.subplots(nrows = 1, ncols = 2)
    # Histogram of the column's value counts
    disD[key].value_counts().hist(ax=axes[0]); axes[0].set_title("Histogram: "+str(key))
    # Cumulative histogram of the column's value counts
    disD[key].value_counts().hist(cumulative=True,ax=axes[1]); axes[1].set_title("Cumulative HG: "+str(key))
    i+=1
In [ ]:
# 2D histogram of each categorical variable against the response
i=0
for key in varTypes['categorical']:
    plt.figure(i)
    # hist2d needs numeric x values, so factorize the category labels first
    x = pd.factorize(df[key])[0]
    y = df['Response']
    plt.hist2d(x, y, bins=40, norm=LogNorm())
    plt.colorbar()
    i+=1
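For category-versus-response counts, a contingency table is often easier to read than a 2D histogram; a sketch using pd.crosstab on one illustrative column:
In [ ]:
# Sketch: contingency table of one categorical variable against Response
ct = pd.crosstab(df['Product_Info_1'], df['Response'])
print ct
# the same table as an image, log-scaled like the hist2d above
plt.imshow(ct.values, norm=LogNorm(), interpolation='nearest')
plt.colorbar()
plt.xlabel('Response'); plt.ylabel('Product_Info_1')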
In [ ]:
# Bar chart of the normalized value counts of each categorical column
i=0
for key in varTypes['categorical']:
    plt.figure(i)
    if df[key].dtype in (np.int64, np.float, float, int):
        #(1.*df[key].value_counts()/len(df[key])).hist()
        df[key].value_counts(normalize=True).plot(kind='bar')
    i+=1
In [1]:
df.loc('Product_Info_1')
In [6]: