In [420]:
%matplotlib inline
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from scipy.stats import ttest_ind
sns.set_style('white')
For investors it is interesting to know which characteristics of a loan predict whether it will end up charged off. LendingClub runs its own algorithms beforehand to estimate which loans are riskier and assigns each loan a grade (A-G). As we saw in the exploration of the dataset, this grade correlates well with the probability of charge-off. The interest rate should reflect the risk (higher interest for more risk) to keep the riskier loans attractive to invest in. Although grade and interest rate correlate well, the correlation is not perfect.
We will use the loans that went to full term to build classifiers that classify loans as charged off or fully paid. The accuracy measure used is sklearn's 'f1_weighted', which can be interpreted as a weighted average of precision and recall. Confusion matrices and ROC curves will also be used for analysis. Grade serves as the baseline prediction of charged off/fully paid; we then look for features that add predictive value on top of grade and see whether this gives us any insight.
In [341]:
loans = pd.read_csv('../data/loan.csv')
closed_loans = loans[loans['loan_status'].isin(['Fully Paid', 'Charged Off'])]
print(closed_loans.shape)
round(sum(closed_loans['loan_status']=='Charged Off')/len(closed_loans['loan_status'])*100)
Out[341]:
We selected the features that can be included in the prediction. We left out features that are not known at the start of a loan, like 'total payment', since those cannot help new investors. Non-predictive features like 'id', and features that take the same value for every loan, are excluded as well, as are all features to do with 'joint' loans, since we keep only individual loans. We did keep 'loan_status' as the target for the prediction. Furthermore, features that were missing in more than 10% of the loans were excluded, leaving 24 features (excluding the target 'loan_status').
In [342]:
include = ['term', 'int_rate', 'installment', 'grade', 'sub_grade', 'emp_length', 'home_ownership',
'annual_inc', 'purpose', 'zip_code', 'addr_state', 'delinq_2yrs', 'earliest_cr_line', 'inq_last_6mths',
'mths_since_last_delinq', 'mths_since_last_record', 'open_acc', 'pub_rec', 'revol_bal', 'revol_util', 'total_acc',
'mths_since_last_major_derog', 'acc_now_delinq', 'loan_amnt', 'open_il_6m', 'open_il_12m',
'open_il_24m', 'mths_since_rcnt_il', 'total_bal_il', 'dti', 'open_acc_6m', 'tot_cur_bal',
'il_util', 'open_rv_12m', 'open_rv_24m', 'max_bal_bc', 'all_util', 'total_rev_hi_lim', 'inq_fi', 'total_cu_tl',
'inq_last_12m', 'issue_d', 'loan_status']
exclude = ['funded_amnt', 'funded_amnt_inv', 'verification_status', 'total_pymnt_inv', 'total_rec_prncp', 'total_rec_int', 'total_rec_late_fee',
'recoveries', 'collection_recovery_fee', 'last_pymnt_d', 'last_credit_pull_d', 'collections_12_mths_ex_med',
'initial_list_status', 'id', 'member_id', 'emp_title', 'pymnt_plan', 'url', 'desc', 'title',
'out_prncp', 'out_prncp_inv', 'total_pymnt', 'last_pymnt_amnt', 'next_pymnt_d', 'policy_code',
'application_type', 'annual_inc_joint', 'dti_joint', 'verification_status_joint', 'tot_coll_amt',
]
# exclude the one joint application
closed_loans = closed_loans[closed_loans['application_type'] == 'INDIVIDUAL']
# make id index
closed_loans.index = closed_loans.id
# include only the features above
closed_loans = closed_loans[include]
# exclude features with more than 10% missing values
columns_not_missing = closed_loans.isnull().mean() < 0.1
closed_loans = closed_loans.loc[:, columns_not_missing[columns_not_missing].index]
# delete rows with NANs
print(1 - closed_loans.dropna().shape[0] / closed_loans.shape[0]) # ratio deleted rows
closed_loans = closed_loans.dropna()
# calculate nr of days between earliest creditline and issue date of the loan
# delete the two original features
closed_loans['earliest_cr_line'] = pd.to_datetime(closed_loans['earliest_cr_line'])
closed_loans['issue_d'] = pd.to_datetime(closed_loans['issue_d'])
closed_loans['days_since_first_credit_line'] = closed_loans['issue_d'] - closed_loans['earliest_cr_line']
closed_loans['days_since_first_credit_line'] = closed_loans['days_since_first_credit_line'] / np.timedelta64(1, 'D')
closed_loans = closed_loans.drop(['earliest_cr_line', 'issue_d'], axis=1)
# delete redundant features
#closed_loans = closed_loans.drop(['grade'], axis=1)
# round annual_inc up to thousands and cap outliers at 200,000
closed_loans['annual_inc'] = np.ceil(closed_loans['annual_inc'] / 1000)
closed_loans.loc[closed_loans['annual_inc'] > 200, 'annual_inc'] = 200
closed_loans.shape
Out[342]:
In [343]:
closed_loans.head()
Out[343]:
In [229]:
closed_loans.columns
Out[229]:
In [230]:
plt.hist(closed_loans['annual_inc'], bins=100)
Out[230]:
We keep 30% of the data separate for now, so that we can later use it to reliably test the performance of the classifier. The split is stratified by 'loan_status' in order to divide older loans equally over the split (older loans have a higher charge-off probability). The labels to predict are in 'loan_status' (Charged Off vs. Fully Paid).
In [237]:
X_train, X_test, y_train, y_test = train_test_split(closed_loans, closed_loans['loan_status'],
                                                    test_size=0.3, random_state=123,
                                                    stratify=closed_loans['loan_status'])
X_train = X_train.drop('loan_status', axis=1)
X_test = X_test.drop('loan_status', axis=1)
We start with the logistic regression classifier. This is a simple classifier that fits a sigmoid curve to the features to predict the probability that a sample belongs to each class. It has one parameter to tune, the C parameter: the inverse of the regularization strength, where smaller values mean stronger regularization. In sklearn the input features have to be numerical, so we need to convert the categorical features to numeric. Ordered categorical features get adjacent numbers, and unordered features are given the most sensible order we can come up with during conversion, for instance geographical for the states. There can also be no NaN/inf/-inf values, so these are set to 0. For this algorithm we also have to scale and normalize the features.
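As a quick illustration of what is being fitted, here is a minimal sketch (with made-up coefficients w, b and feature vector x, purely for illustration, not the model trained below) of the sigmoid that logistic regression uses and of how C is passed to sklearn:
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # squashes any real number into a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# logistic regression predicts P(class) = sigmoid(w . x + b)
w, b = np.array([0.8, -1.2]), 0.5          # hypothetical fitted coefficients
x = np.array([1.0, 2.0])                   # one (scaled) feature vector
print(sigmoid(np.dot(w, x) + b))           # probability of the positive class

# smaller C = stronger l1 (lasso) regularization, shrinking more coefficients to exactly 0
clf_strong_reg = LogisticRegression(penalty='l1', solver='liblinear', C=0.01)
clf_weak_reg = LogisticRegression(penalty='l1', solver='liblinear', C=100)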
Non-numeric features were converted as follows:
In [239]:
# features that are not float or int, so not to be converted:
# ordered:
# sub_grade, emp_length, zip_code, term
# unordered:
# home_ownership, purpose, addr_state (ordered geographically)
# term
X_train['term'] = X_train['term'].apply(lambda x: int(x.split(' ')[1]))
# grade
grade_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7}
X_train['grade'] = X_train['grade'].apply(lambda x: grade_dict[x])
# emp_length
emp_length_dict = {'n/a':0,
'< 1 year':0,
'1 year':1,
'2 years':2,
'3 years':3,
'4 years':4,
'5 years':5,
'6 years':6,
'7 years':7,
'8 years':8,
'9 years':9,
'10+ years':10}
X_train['emp_length'] = X_train['emp_length'].apply(lambda x: emp_length_dict[x])
# zipcode
X_train['zip_code'] = X_train['zip_code'].apply(lambda x: int(x[0:3]))
# subgrade
X_train['sub_grade'] = X_train['grade'] + X_train['sub_grade'].apply(lambda x: float(list(x)[1])/10)
# house
house_dict = {'NONE': 0, 'OTHER': 0, 'ANY': 0, 'RENT': 1, 'MORTGAGE': 2, 'OWN': 3}
X_train['home_ownership'] = X_train['home_ownership'].apply(lambda x: house_dict[x])
# purpose
purpose_dict = {'other': 0, 'small_business': 1, 'renewable_energy': 2, 'home_improvement': 3,
'house': 4, 'educational': 5, 'medical': 6, 'moving': 7, 'car': 8,
'major_purchase': 9, 'wedding': 10, 'vacation': 11, 'credit_card': 12,
'debt_consolidation': 13}
X_train['purpose'] = X_train['purpose'].apply(lambda x: purpose_dict[x])
# states
state_dict = {'AK': 0, 'WA': 1, 'ID': 2, 'MT': 3, 'ND': 4, 'MN': 5,
'OR': 6, 'WY': 7, 'SD': 8, 'WI': 9, 'MI': 10, 'NY': 11,
'VT': 12, 'NH': 13, 'MA': 14, 'CT': 15, 'RI': 16, 'ME': 17,
'CA': 18, 'NV': 19, 'UT': 20, 'CO': 21, 'NE': 22, 'IA': 23,
'KS': 24, 'MO': 25, 'IL': 26, 'IN': 27, 'OH': 28, 'PA': 29,
'NJ': 30, 'KY': 31, 'WV': 32, 'VA': 33, 'DC': 34, 'MD': 35,
'DE': 36, 'AZ': 37, 'NM': 38, 'OK': 39, 'AR': 40, 'TN': 41,
'NC': 42, 'TX': 43, 'LA': 44, 'MS': 45, 'AL': 46, 'GA': 47,
'SC': 48, 'FL': 49, 'HI': 50}
X_train['addr_state'] = X_train['addr_state'].apply(lambda x: state_dict[x])
# make NA's, inf and -inf 0
X_train = X_train.fillna(0)
X_train = X_train.replace([np.inf, -np.inf], 0)
In [240]:
X_train.columns
Out[240]:
In [241]:
# scaling and normalizing the features; keep the fitted scaler to apply
# the identical transformation to the test set later
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns)
After the categorical features are converted to numeric and normalized/scaled, we first check what the accuracy is when only the feature 'grade' (A-G) is used to predict 'charged off' (true/false). This is the classification LendingClub gave the loans: the closer to G, the higher the chance the loan ends in 'charged off'. For the accuracy estimation we use 'F1-weighted', where F1 = 2 * (precision * recall) / (precision + recall), so both precision and recall matter for the accuracy. Precision is the number of correct positive results divided by the number of all positive predictions, and recall is the number of correct positive results divided by the number of positives that should have been returned. The F1 score can be interpreted as a weighted average of precision and recall, reaching its best value at 1 and its worst at 0. Using only 'grade' as feature, the default value for C (inverse of regularization strength) and l1/lasso penalization, we get an F1 accuracy of 0.744.
In [242]:
clf = LogisticRegression(penalty='l1')
scores = cross_val_score(clf, X_train_scaled.loc[:,['grade']], y_train, cv=10, scoring='f1_weighted')
print(scores)
print(np.mean(scores))
A score of 0.744 does not look very high, but still a lot better than random. Nevertheless, if we look at the confusion matrix and the ROC curve we see a whole different picture. It turns out the algorithm predicts almost everything into the not-charged-off group and therefore gets the majority right, simply because there are far more fully paid loans than charged off loans (only 18% are charged off). The area under the curve is only 0.506, where random is 0.5. The prediction with logistic regression and only the grade feature is therefore about as good as random.
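A quick sanity check (a sketch, assuming the ~18%/82% class split mentioned above) shows that a "classifier" that always predicts 'Fully Paid' already gets roughly this F1-weighted score:
import numpy as np
from sklearn.metrics import f1_score

# 82 fully paid vs. 18 charged off, and a classifier that always predicts the majority class
y_true = np.array(['Fully Paid'] * 82 + ['Charged Off'] * 18)
y_pred = np.array(['Fully Paid'] * 100)
print(f1_score(y_true, y_pred, average='weighted'))  # ~0.74, close to the 0.744 found above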
In [243]:
from sklearn.model_selection import cross_val_predict
from pandas_confusion import ConfusionMatrix
prediction = cross_val_predict(clf, X_train_scaled.loc[:,['grade']], y_train, cv=10)
confusion_matrix = ConfusionMatrix(y_train, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[243]:
In [251]:
y_score = cross_val_predict(clf, X_train_scaled.loc[:,['grade']], y_train, cv=10, method='predict_proba')
fpr, tpr, thresholds = roc_curve(y_train, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[251]:
We can now include all the features we selected (24) and see whether the prediction improves. Because we use regularization, the effect of useless features is automatically downgraded. This gives a slightly better F1-score of 0.751. The confusion matrix and the ROC curve/AUC score are also a little better, although still not great with an AUC of 0.515. The top-5 features most used by the algorithm are 'funded_amnt_inv', 'int_rate', 'sub_grade', 'funded_amnt' and 'annual_inc'; remarkably, grade itself is not among them.
In [256]:
clf = LogisticRegression(penalty='l1')
scores = cross_val_score(clf, X_train_scaled, y_train, cv=10, scoring='f1_weighted')
print(scores)
print(np.mean(scores))
In [257]:
prediction = cross_val_predict(clf, X_train_scaled, y_train, cv=10)
confusion_matrix = ConfusionMatrix(y_train, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[257]:
In [258]:
y_score = cross_val_predict(clf, X_train_scaled, y_train, cv=10, method='predict_proba')
fpr, tpr, thresholds = roc_curve(y_train, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[258]:
In [281]:
clf = LogisticRegression(penalty='l1', C=10)
clf.fit(X_train_scaled, y_train)
coefs = clf.coef_
# find index of top 5 highest coefficients, aka most used features for prediction
positions = abs(coefs[0]).argsort()[-5:][::-1]
print(X_train_scaled.columns[positions])
print(coefs[0][positions])
We can also pick just 5 features with SelectKBest and see whether this works better. It turns out to perform exactly the same as grade alone, so SelectKBest with 5 features does not work as well as using all features.
In [260]:
new_X = (SelectKBest(mutual_info_classif, k=5)
.fit_transform(X_train_scaled, y_train))
In [262]:
print(new_X[0]) # term, int_rate, installment, grade, sub_grade
print(X_train_scaled.head())
new_X = pd.DataFrame(new_X, columns=['term', 'int_rate', 'installment', 'grade', 'sub_grade'])
In [263]:
clf = LogisticRegression(penalty='l1')
scores = cross_val_score(clf, new_X, y_train, cv=10, scoring='f1_weighted')
print(scores)
print(np.mean(scores))
In [264]:
prediction = cross_val_predict(clf, new_X, y_train, cv=10)
confusion_matrix = ConfusionMatrix(y_train, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[264]:
In [269]:
y_score = cross_val_predict(clf, new_X, y_train, cv=10, method='predict_proba')
fpr, tpr, thresholds = roc_curve(y_train, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[269]:
To see the statistical relevance of certain features, we can use the statsmodels package. We first use it with the 5 features selected by SelectKBest. There we see that only term, int_rate and installment are relevant. The confidence intervals of all features are small, but the coefficients are also very close to 0, so they do not seem to have a large influence.
Subsequently we do the same for the 5 features with the highest coefficients in the regularized logistic regression that uses all features. Of these, all features seem useful except 'sub_grade'. The coefficients are slightly higher and the confidence intervals small, but the conclusions are contradictory: funded_amnt and funded_amnt_inv have the highest coefficients, and while these two values should be roughly the same, they have opposite relations with the target charged_off. This makes no sense and suggests that the algorithm is still pretty much random.
In [274]:
y_train == 'Charged Off'
Out[274]:
In [275]:
import statsmodels.api as sm
print(new_X.columns)
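# note: sm.Logit does not add an intercept by default; wrap the features in sm.add_constant(...) to fit one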
logit = sm.Logit(y_train == 'Charged Off', np.array(new_X))
result = logit.fit()
print(result.summary())
In [277]:
logit = sm.Logit(y_train == 'Charged Off', np.array(
X_train_scaled.loc[:,['int_rate', 'annual_inc', 'sub_grade', 'term', 'dti']]))
result = logit.fit()
print(result.summary())
Another way to possibly increase performance is to tune the C (regularization) parameter. We do this with sklearn's GridSearchCV function. The best-performing C, although really close to the default, is C=1, giving an accuracy of 0.752. (Note that the grid search takes a long time to run.)
In [278]:
from sklearn.model_selection import GridSearchCV
dict_Cs = {'C': [0.001, 0.1, 1, 10, 100]}
clf = GridSearchCV(LogisticRegression(penalty='l1'), dict_Cs, scoring='f1_weighted', cv=10)
clf.fit(X_train_scaled, y_train)
print(clf.best_params_)
print(clf.best_score_)
In [280]:
clf = LogisticRegression(penalty='l1', C=10)
scores = cross_val_score(clf, X_train_scaled, y_train, cv=10, scoring='f1_weighted')
print(scores)
print(np.mean(scores))
In [282]:
prediction = cross_val_predict(clf, X_train_scaled, y_train, cv=10)
confusion_matrix = ConfusionMatrix(y_train, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[282]:
In [283]:
y_score = cross_val_predict(clf, X_train_scaled, y_train, cv=10, method='predict_proba')
fpr, tpr, thresholds = roc_curve(y_train, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[283]:
To improve accuracy we can use a more complicated algorithm that scores well in many settings: random forest. This algorithm builds many decision trees from subsets of the samples and considers at each split only a fraction of the features, to prevent overfitting. Random forest is known to be relatively insensitive to the values of its parameters: the number of features used at each split and the number of trees in the forest. Nevertheless, sklearn's default number of trees is so low that we raise it to 100. The algorithm has feature selection built in (at each split), and scaling/normalization is not necessary either.
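A minimal sketch of the parameters just discussed (illustrative values, not tuned):
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_features='sqrt',   # features considered at each split ('sqrt' is the usual classification default)
    bootstrap=True)        # each tree is grown on a bootstrap sample of the rows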
We first run the algorithm with only grade. This does not make much sense for random forest, since it builds trees and you cannot build much of a tree from a single feature; nevertheless it is our starting point. The F1 score is 0.739, slightly lower than logistic regression. As expected, the confusion matrix is dramatic: the algorithm turns out to predict everything as fully paid, which is why the AUC score is exactly random.
In [284]:
clf = RandomForestClassifier(n_estimators=100)
scores = cross_val_score(clf, X_train.loc[:,['grade']], y_train, cv=10, scoring='f1_weighted')
print(scores)
print(np.mean(scores))
In [285]:
prediction = cross_val_predict(clf, X_train.loc[:,['grade']], y_train, cv=10)
confusion_matrix = ConfusionMatrix(y_train, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[285]:
In [287]:
y_score = cross_val_predict(clf, X_train.loc[:,['grade']], y_train, cv=10, method='predict_proba')
fpr, tpr, thresholds = roc_curve(y_train, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[287]:
Trying the algorithm with all the features (24) leads to a slightly higher F1-score of 0.750, though logistic regression with all features was a fraction better. The confusion matrix and AUC are also comparable but slightly worse than those of logistic regression with all features. The random forest classifier does select a different top-5 of features: 'dti', 'revol_bal', 'revol_util', 'annual_inc' and 'int_rate'.
In [288]:
clf = RandomForestClassifier(n_estimators=100)
scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='f1_weighted')
print(scores)
print(np.mean(scores))
In [289]:
prediction = cross_val_predict(clf, X_train, y_train, cv=10)
confusion_matrix = ConfusionMatrix(y_train, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[289]:
In [290]:
y_score = cross_val_predict(clf, X_train, y_train, cv=10, method='predict_proba')
fpr, tpr, thresholds = roc_curve(y_train, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[290]:
In [291]:
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
feat_imp = clf.feature_importances_
In [292]:
sns.barplot(x=X_train.columns, y=feat_imp, color='turquoise')
plt.xticks(rotation=90)
Out[292]:
In [293]:
positions = abs(feat_imp).argsort()[-5:][::-1]
print(X_train.columns[positions])
print(feat_imp[positions])
To test the accuracy of our algorithms we first apply the same transformations to the test set as we did to the training set: convert the categorical features to numeric and replace NaN/inf/-inf with 0. For the logistic regression algorithm we also normalized and scaled the training set and saved that scaler, so we can apply the exact same transformation to the test set.
In [294]:
# term
X_test['term'] = X_test['term'].apply(lambda x: int(x.split(' ')[1]))
# grade
X_test['grade'] = X_test['grade'].apply(lambda x: grade_dict[x])
# emp_length
X_test['emp_length'] = X_test['emp_length'].apply(lambda x: emp_length_dict[x])
# zipcode
X_test['zip_code'] = X_test['zip_code'].apply(lambda x: int(x[0:3]))
# subgrade
X_test['sub_grade'] = X_test['grade'] + X_test['sub_grade'].apply(lambda x: float(list(x)[1])/10)
# house
X_test['home_ownership'] = X_test['home_ownership'].apply(lambda x: house_dict[x])
# purpose
X_test['purpose'] = X_test['purpose'].apply(lambda x: purpose_dict[x])
# states
X_test['addr_state'] = X_test['addr_state'].apply(lambda x: state_dict[x])
# make NA's, inf and -inf 0
X_test = X_test.fillna(0)
X_test = X_test.replace([np.inf, -np.inf], 0)
In [295]:
X_test_scaled = scaler.transform(X_test)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns)
For logistic regression we test both the 'only grade' algorithm (baseline) and the best-performing algorithm (all features with l1 regularization). We find practically the same F1-scores, confusion matrices, ROC curves and AUC scores as on the training set, so the cross-validation scheme used on the training set gives reliable accuracy estimates. The predictive value of the algorithm increases slightly with more features, but it basically predicts that all loans get fully paid, so the accuracy scores remain practically random.
In [296]:
from sklearn.metrics import f1_score
clf = LogisticRegression(penalty='l1', C=10)
clf.fit(X_train_scaled.loc[:,['grade']], y_train)
prediction = clf.predict(X_test_scaled.loc[:,['grade']])
print(f1_score(y_test, prediction, average='weighted'))
confusion_matrix = ConfusionMatrix(y_test, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[296]:
In [297]:
y_score = clf.predict_proba(X_test_scaled.loc[:,['grade']])
print(clf.classes_)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[297]:
In [298]:
clf = LogisticRegression(penalty='l1', C=10)
clf.fit(X_train_scaled, y_train)
prediction = clf.predict(X_test_scaled)
print(f1_score(y_test, prediction, average='weighted'))
confusion_matrix = ConfusionMatrix(y_test, prediction)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[298]:
In [299]:
y_score = clf.predict_proba(X_test_scaled)
print(clf.classes_)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[299]:
In [344]:
closed_loans2 = closed_loans.drop(['loan_status'], axis=1)
# term
closed_loans2['term'] = closed_loans2['term'].apply(lambda x: int(x.split(' ')[1]))
# grade
closed_loans2['grade'] = closed_loans2['grade'].apply(lambda x: grade_dict[x])
# emp_length
closed_loans2['emp_length'] = closed_loans2['emp_length'].apply(lambda x: emp_length_dict[x])
# zipcode
closed_loans2['zip_code'] = closed_loans2['zip_code'].apply(lambda x: int(x[0:3]))
# subgrade
closed_loans2['sub_grade'] = closed_loans2['grade'] + closed_loans2['sub_grade'].apply(lambda x: float(list(x)[1])/10)
# house
closed_loans2['home_ownership'] = closed_loans2['home_ownership'].apply(lambda x: house_dict[x])
# purpose
closed_loans2['purpose'] = closed_loans2['purpose'].apply(lambda x: purpose_dict[x])
# states
closed_loans2['addr_state'] = closed_loans2['addr_state'].apply(lambda x: state_dict[x])
# make NA's, inf and -inf 0
closed_loans2 = closed_loans2.fillna(0)
closed_loans2 = closed_loans2.replace([np.inf, -np.inf], 0)
closed_loans_scaled = scaler.transform(closed_loans2)
closed_loans_scaled = pd.DataFrame(closed_loans_scaled, columns=closed_loans2.columns)
closed_loans_scaled.index = closed_loans2.index
In [352]:
closed_loans_scaled
Out[352]:
In [396]:
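# return on investment: total received (principal + interest + late fees + recoveries) relative to the funded amount, minus 1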
loans['roi'] = ((loans['total_rec_int'] + loans['total_rec_prncp']
+ loans['total_rec_late_fee'] + loans['recoveries']) / loans['funded_amnt']) -1
In [407]:
prof_loans = loans[loans['id'].isin(closed_loans['loan_status'][y_score[:,1] > 0.9].index.tolist())]
In [399]:
roi = loans.groupby('grade')['roi'].mean()
In [401]:
prof_loans = loans[loans['id'].isin(closed_loans.index.tolist())]
In [413]:
roi = prof_loans.groupby('grade')['roi'].mean()
print(roi)
print(prof_loans['roi'].mean())
In [424]:
prof_loans['grade'] = prof_loans['grade'].astype(pd.CategoricalDtype(ordered=True))
sns.barplot(data=roi.reset_index(), x='grade', y='roi', color='gray')
plt.show()
roi = prof_loans.groupby('loan_status')['roi'].mean()
sns.barplot(data=roi.reset_index(), x='loan_status', y='roi')
plt.show()
roi = prof_loans.groupby(['grade', 'loan_status'])['roi'].mean()
sns.barplot(data=roi.reset_index(), x='roi', y='grade', hue='loan_status', orient='h')
plt.show()
sns.countplot(data=prof_loans, x='grade', hue='loan_status')
plt.show()
In [409]:
prof_loans
Out[409]:
In [393]:
closed_loans.index.tolist()
Out[393]:
In [388]:
y_score = clf.predict_proba(closed_loans_scaled)
prediction = clf.predict(closed_loans_scaled)
confusion_matrix = ConfusionMatrix(np.array(closed_loans['loan_status'][y_score[:,1] > 0.9]), prediction[y_score[:,1] > 0.9])
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[388]:
In [386]:
np.array(closed_loans['loan_status'][y_score[:,1] > 0.9])
Out[386]:
In [377]:
prediction[y_score[:,1] > 0.9]
Out[377]:
In [374]:
y_total[y_score[:,1] > 0.9]
Out[374]:
In [371]:
prediction[y_score[:,1] > 0.9]
Out[371]:
In [414]:
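# recombine the scaled train and test sets to inspect the loans the model is most confident about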
X_total = pd.concat([X_train_scaled, X_test_scaled])
y_total = pd.concat([y_train, y_test])
In [415]:
y_score = clf.predict_proba(X_total)
prediction = clf.predict(X_total)
confusion_matrix = ConfusionMatrix(y_total[y_score[:,1] > 0.9], prediction[y_score[:,1] > 0.9])
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[415]:
In [416]:
diff_mean = X_total[y_score[:,1] > 0.9].mean() - X_total.mean()
abs(diff_mean).sort_values(ascending=False)
Out[416]:
In [421]:
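# t-test per feature: do the confidently predicted (>0.9) loans differ from the full population?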
for col in X_total.columns:
result = ttest_ind(X_total[y_score[:,1] > 0.9][col], X_total[col])
print(col, ':', result)
#X_total[y_score[:,1] > 0.9].mean() - X_total.mean()
In [418]:
X_total[y_score[:,1] > 0.9]['term']
Out[418]:
In [320]:
X_total.mean()
Out[320]:
In [309]:
X_total #most interesting features: 'int_rate', 'annual_inc', 'sub_grade', 'term', 'dti'
# compare predicted > 0.9 vs. all?
Out[309]:
In [40]:
sum(y_score[:,1] > 0.5) / len(y_score[:,1] )
Out[40]:
In [41]:
max(y_score[prediction == 'Fully Paid', 0])  # highest P(Charged Off) among loans predicted 'Fully Paid'
Out[41]:
In [74]:
# lower the decision threshold for 'Charged Off' from 0.5 to the ~18% base rate
diff_thres = np.where(y_score[:,0] > 0.18, 'Charged Off', 'Fully Paid')
In [79]:
print(f1_score(y_test, diff_thres, average='weighted'))
confusion_matrix = ConfusionMatrix(y_test, diff_thres)
confusion_matrix.print_stats()
confusion_matrix.plot()
Out[79]:
Also with the random forest algorithm, for both 'only grade' and all features, we find the same accuracy measurements as measured with cross-validation on the training set. The logistic regression algorithm with all features therefore still performs best, although it does not perform very well.
In [46]:
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train.loc[:,['grade']], y_train)
prediction = clf.predict(X_test.loc[:,['grade']])
print(f1_score(y_test, prediction, average='weighted'))
confusion_matrix = ConfusionMatrix(y_test, prediction)
print(confusion_matrix)
confusion_matrix.plot()
Out[46]:
In [47]:
y_score = clf.predict_proba(X_test.loc[:,['grade']])
fpr, tpr, thresholds = roc_curve(y_test, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[47]:
In [48]:
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
prediction = clf.predict(X_test)
print(f1_score(y_test, prediction, average='weighted'))
confusion_matrix = ConfusionMatrix(y_test, prediction)
print(confusion_matrix)
confusion_matrix.plot()
Out[48]:
In [49]:
y_score = clf.predict_proba(X_test)
print(clf.classes_)
fpr, tpr, thresholds = roc_curve(y_test, y_score[:,0], pos_label='Charged Off')
print(auc(fpr, tpr))
plt.plot(fpr, tpr)
Out[49]:
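Finally, instead of predicting charge-off we try to predict the grade itself from the other features. We make a new split with 'grade' as the target (still stratified by 'loan_status') and drop 'grade', 'sub_grade' and 'int_rate' from the features, since these would leak the answer. Grade is a 7-class problem (A-G), so we binarize the labels and train one-vs-rest classifiers, first logistic regression and then random forest.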
In [50]:
X_train, X_test, y_train, y_test = train_test_split(closed_loans.iloc[:, 0:24],
closed_loans['grade'], test_size=0.3,
random_state=123, stratify=closed_loans['loan_status'])
X_train = X_train.drop(['grade', 'sub_grade', 'int_rate'], axis=1)
X_test = X_test.drop(['grade', 'sub_grade', 'int_rate'], axis=1)
In [51]:
# features that are not float or int, so not to be converted:
# date:
# earliest_cr_line
# ordered:
# emp_length, zip_code, term
# unordered:
# home_ownership, purpose, addr_state (ordered geographically)
# date
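# note: strftime("%s") (seconds since the epoch) is a platform-dependent extension, not standard on every OS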
X_train['earliest_cr_line'] = pd.to_datetime(X_train['earliest_cr_line']).dt.strftime("%s")
X_train['earliest_cr_line'] = [0 if date=='NaT' else int(date) for date in X_train['earliest_cr_line']]
# term
X_train['term'] = X_train['term'].apply(lambda x: int(x.split(' ')[1]))
# emp_length
emp_length_dict = {'n/a':0,
'< 1 year':0,
'1 year':1,
'2 years':2,
'3 years':3,
'4 years':4,
'5 years':5,
'6 years':6,
'7 years':7,
'8 years':8,
'9 years':9,
'10+ years':10}
X_train['emp_length'] = X_train['emp_length'].apply(lambda x: emp_length_dict[x])
# zipcode
X_train['zip_code'] = X_train['zip_code'].apply(lambda x: int(x[0:3]))
# house
house_dict = {'NONE': 0, 'OTHER': 0, 'ANY': 0, 'RENT': 1, 'MORTGAGE': 2, 'OWN': 3}
X_train['home_ownership'] = X_train['home_ownership'].apply(lambda x: house_dict[x])
# purpose
purpose_dict = {'other': 0, 'small_business': 1, 'renewable_energy': 2, 'home_improvement': 3,
'house': 4, 'educational': 5, 'medical': 6, 'moving': 7, 'car': 8,
'major_purchase': 9, 'wedding': 10, 'vacation': 11, 'credit_card': 12,
'debt_consolidation': 13}
X_train['purpose'] = X_train['purpose'].apply(lambda x: purpose_dict[x])
# states
state_dict = {'AK': 0, 'WA': 1, 'ID': 2, 'MT': 3, 'ND': 4, 'MN': 5,
'OR': 6, 'WY': 7, 'SD': 8, 'WI': 9, 'MI': 10, 'NY': 11,
'VT': 12, 'NH': 13, 'MA': 14, 'CT': 15, 'RI': 16, 'ME': 17,
'CA': 18, 'NV': 19, 'UT': 20, 'CO': 21, 'NE': 22, 'IA': 23,
'KS': 24, 'MO': 25, 'IL': 26, 'IN': 27, 'OH': 28, 'PA': 29,
'NJ': 30, 'KY': 31, 'WV': 32, 'VA': 33, 'DC': 34, 'MD': 35,
'DE': 36, 'AZ': 37, 'NM': 38, 'OK': 39, 'AR': 40, 'TN': 41,
'NC': 42, 'TX': 43, 'LA': 44, 'MS': 45, 'AL': 46, 'GA': 47,
'SC': 48, 'FL': 49, 'HI': 50}
X_train['addr_state'] = X_train['addr_state'].apply(lambda x: state_dict[x])
# make NA's, inf and -inf 0
X_train = X_train.fillna(0)
X_train = X_train.replace([np.inf, -np.inf], 0)
# date
X_test['earliest_cr_line'] = pd.to_datetime(X_test['earliest_cr_line']).dt.strftime("%s")
X_test['earliest_cr_line'] = [0 if date=='NaT' else int(date) for date in X_test['earliest_cr_line']]
# term
X_test['term'] = X_test['term'].apply(lambda x: int(x.split(' ')[1]))
# emp_length
X_test['emp_length'] = X_test['emp_length'].apply(lambda x: emp_length_dict[x])
# zipcode
X_test['zip_code'] = X_test['zip_code'].apply(lambda x: int(x[0:3]))
# house
X_test['home_ownership'] = X_test['home_ownership'].apply(lambda x: house_dict[x])
# purpose
X_test['purpose'] = X_test['purpose'].apply(lambda x: purpose_dict[x])
# states
X_test['addr_state'] = X_test['addr_state'].apply(lambda x: state_dict[x])
# make NA's, inf and -inf 0
X_test = X_test.fillna(0)
X_test = X_test.replace([np.inf, -np.inf], 0)
In [52]:
from sklearn import preprocessing
# fit the scaler on the training set and apply the identical transformation to both sets
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)
In [53]:
from sklearn.preprocessing import LabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
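# LabelBinarizer one-hot encodes the 7 grade labels; OneVsRestClassifier then fits one binary classifier per grade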
lb = LabelBinarizer()
grades = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
lb.fit(grades)
y_train_2 = lb.transform(y_train)
clf = OneVsRestClassifier(LogisticRegression(penalty='l1'))
predict_y = clf.fit(X_train_scaled, y_train_2).predict(X_test_scaled)
predict_y = lb.inverse_transform(predict_y)
#print(accuracy_score(y_test, predict_y))
confusion_matrix = ConfusionMatrix(np.array(y_test, dtype='<U1'), predict_y)
confusion_matrix.plot()
confusion_matrix.print_stats()
# find index of top 5 highest coefficients, aka most used features for prediction
coefs = clf.coef_
positions = abs(coefs[0]).argsort()[-5:][::-1]
print(X_train_scaled.columns[positions])
print(coefs[0][positions])
In [54]:
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=100))
predict_y = clf.fit(X_train_scaled, y_train_2).predict(X_test_scaled)
predict_y = lb.inverse_transform(predict_y)
print(accuracy_score(y_test, predict_y))
confusion_matrix = ConfusionMatrix(np.array(y_test, dtype='<U1'), predict_y)
confusion_matrix.plot()
print(confusion_matrix)
In [55]:
confusion_matrix.print_stats()
In [56]:
features = []
for i,j in enumerate(grades):
print('\n',j)
feat_imp = clf.estimators_[i].feature_importances_
positions = abs(feat_imp).argsort()[-5:][::-1]
features.extend(list(X_train.columns[positions]))
print(X_train.columns[positions])
print(feat_imp[positions])
In [57]:
pd.Series(features).value_counts()
Out[57]: