Probability Calibration using ML-Insights

On Example of Mortality Model Using MIMIC ICU Data*

This workbook is intended to demonstrate why probability calibration may be useful, and how to do it using the prob_calibration_function in the ML-Insights package.

We build a random forest model, demonstrate that using the vote percentage as a probability is not well-calibrated, and then show how to use an independent validation set and the prob_calibration_function to properly calibrate it so that accurate probabilities can be obtained.

*MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). https://mimic.physionet.org



In [1]:

    
# "pip install ml_insights" in terminal if needed

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ml_insights as mli
%matplotlib inline

from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import train_test_split, StratifiedKFold
from sklearn.metrics import roc_auc_score, log_loss, brier_score_loss
from sklearn import clone

In the next few cells, we load in some data, inspect it, select columns for our features and outcome (mortality) and fill in missing values with the median of that column.



In [2]:

    
# Load dataset derived from the MMIC database

lab_aug_df = pd.read_csv("data/lab_vital_icu_table.csv")



In [3]:

    
lab_aug_df.head(10)









    Out[3]:






  
    
      
      subject_id
      hadm_id
      icustay_id
      aniongap_min
      aniongap_max
      albumin_min
      albumin_max
      bicarbonate_min
      bicarbonate_max
      bilirubin_min
      ...
      meanbp_mean
      resprate_min
      resprate_max
      resprate_mean
      tempc_min
      tempc_max
      tempc_mean
      spo2_min
      spo2_max
      spo2_mean
    
  
  
    
      0
      9
      150750
      220597
      13.0
      13.0
      NaN
      NaN
      26.0
      30.0
      0.4
      ...
      98.850000
      14.0
      19.0
      14.369565
      35.500001
      37.888887
      37.049383
      95.0
      100.0
      97.650000
    
    
      1
      13
      143045
      263738
      10.0
      14.0
      3.9
      3.9
      23.0
      24.0
      0.4
      ...
      93.772727
      11.0
      25.0
      15.320000
      35.944443
      37.400002
      36.653534
      94.0
      100.0
      97.700000
    
    
      2
      20
      157681
      264490
      12.0
      12.0
      NaN
      NaN
      21.0
      21.0
      NaN
      ...
      75.058333
      10.0
      27.0
      15.404762
      35.900002
      37.299999
      36.545714
      95.0
      100.0
      98.435897
    
    
      3
      28
      162569
      225559
      13.0
      13.0
      NaN
      NaN
      23.0
      23.0
      NaN
      ...
      69.133333
      9.0
      32.0
      16.677419
      35.900002
      37.700001
      37.033333
      92.0
      100.0
      96.419355
    
    
      4
      37
      188670
      213503
      9.0
      10.0
      NaN
      NaN
      33.0
      35.0
      NaN
      ...
      73.297610
      15.0
      30.0
      22.241379
      36.833335
      38.055556
      37.333334
      89.0
      99.0
      96.533333
    
    
      5
      71
      111944
      211832
      13.0
      30.0
      3.6
      4.7
      17.0
      26.0
      0.4
      ...
      79.222208
      13.0
      25.0
      17.130435
      35.722224
      37.833332
      37.351852
      99.0
      100.0
      99.862069
    
    
      6
      72
      156857
      239612
      18.0
      18.0
      NaN
      NaN
      20.0
      20.0
      3.5
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      7
      78
      100536
      233150
      9.0
      9.0
      2.7
      3.1
      26.0
      26.0
      0.8
      ...
      121.129705
      11.0
      24.0
      16.764706
      36.333334
      36.833335
      36.577778
      96.0
      100.0
      98.470588
    
    
      8
      88
      123010
      297289
      13.0
      18.0
      NaN
      NaN
      19.0
      26.0
      NaN
      ...
      91.884615
      9.0
      45.0
      20.352941
      35.722224
      39.111112
      37.810185
      99.0
      100.0
      99.962963
    
    
      9
      95
      160891
      216431
      13.0
      17.0
      NaN
      NaN
      23.0
      26.0
      NaN
      ...
      93.952386
      14.0
      20.0
      16.363636
      35.722224
      36.666667
      36.305556
      96.0
      100.0
      98.071429
    
  

10 rows × 79 columns



In [4]:

    
X = lab_aug_df.loc[:,['aniongap_min', 'aniongap_max',
       'albumin_min', 'albumin_max', 'bicarbonate_min', 'bicarbonate_max',
       'bilirubin_min', 'bilirubin_max', 'creatinine_min', 'creatinine_max',
       'chloride_min', 'chloride_max', 
       'hematocrit_min', 'hematocrit_max', 'hemoglobin_min', 'hemoglobin_max',
       'lactate_min', 'lactate_max', 'platelet_min', 'platelet_max',
       'potassium_min', 'potassium_max', 'ptt_min', 'ptt_max', 'inr_min',
       'inr_max', 'pt_min', 'pt_max', 'sodium_min', 'sodium_max', 'bun_min',
       'bun_max', 'wbc_min', 'wbc_max','sysbp_max', 'sysbp_mean', 'diasbp_min', 'diasbp_max', 'diasbp_mean',
       'meanbp_min', 'meanbp_max', 'meanbp_mean', 'resprate_min',
       'resprate_max', 'resprate_mean', 'tempc_min', 'tempc_max', 'tempc_mean',
       'spo2_min', 'spo2_max', 'spo2_mean']]
y = lab_aug_df['hospital_expire_flag']



In [5]:

    
# Impute the median for in each column to replace NA's 

median_vec = [X.iloc[:,i].median() for i in range(len(X.columns))]

for i in range(len(X.columns)):
    X.iloc[:,i].fillna(median_vec[i],inplace=True)

Now we divide the data into training, validation, and test sets. The training set will be used to fit the model, the validation set will be used to calibrate the probabilities, and the test set will be used to evaluate the performance. We use a 60-20-20 split (achived by first doing 80/20 and then splitting the 80 by 75/25)



In [6]:

    
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=942)



In [7]:

    
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=942)

Next, we fit a Random Forest model to our training data. Then we use that model to predict "probabilities" on our validation and test sets.

I use quotes on "probabilities" because these numbers, which are the percentage of trees that voted "yes" are better understood as mere scores. A higher value should generally indicate a higher probability of mortality. However, there is no reason to expect these to be well-calibrated probabilities. The fact that, say, 60% of the trees voted "yes" on a particular case does not mean that that case has a 60% probability of mortality.

We will demonstrate this empirically later.



In [8]:

    
rfmodel1 = RandomForestClassifier(n_estimators = 500, class_weight='balanced_subsample', random_state=942, n_jobs=-1 )
rfmodel1.fit(X_train,y_train)









    Out[8]:





RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=942, verbose=0, warm_start=False)



In [9]:

    
val_res = rfmodel1.predict_proba(X_val)[:,1]
test_res_uncalib = rfmodel1.predict_proba(X_test)[:,1]

What follows is histograms showing on the validation set (top row) and the test set (bottom row) the histogram of the random forest vote percentage for the negative cases (survivors, left side) and the positive cases (mortalities, right side). As we would expect the distribution of mortalities is shifted to the right. We store the "count vectors" so that we can calculate some empirical probabilites. We use 20 equal size bins.



In [10]:

    
# Side by side histograms showing scores of positive vs negative cases

fig, axis = plt.subplots(2,2, figsize = (8,8))

ax=axis.flatten()
countvec0_val = ax[0].hist(val_res[np.where(y_val==0)],bins=20,range=[0,1]);
countvec1_val = ax[1].hist(val_res[np.where(y_val==1)],bins=20,range=[0,1]);
countvec0_test = ax[2].hist(test_res_uncalib[np.where(y_test==0)],bins=20,range=[0,1]);
countvec1_test = ax[3].hist(test_res_uncalib[np.where(y_test==1)],bins=20,range=[0,1]);

Next, we see the numerical counts in each bin for the validation set.

For example, in the first entry, we see that for the entries that had a vote percentage between 0 and .05, there were 4968 survivors, and 62 mortalities. For those with a vote percentage between .05 and .10, there were 2450 survivors and 138 mortalities, and so on. This allows us to calculate some empirical probabilities and visually see how well (or poorly) calibrated the vote percentage is.



In [11]:

    
list(zip(countvec0_val[0],countvec1_val[0]))









    Out[11]:





[(4968.0, 62.0),
 (2450.0, 138.0),
 (1543.0, 204.0),
 (722.0, 193.0),
 (403.0, 185.0),
 (223.0, 104.0),
 (130.0, 98.0),
 (57.0, 70.0),
 (44.0, 58.0),
 (19.0, 59.0),
 (14.0, 33.0),
 (12.0, 44.0),
 (6.0, 20.0),
 (1.0, 22.0),
 (3.0, 14.0),
 (0.0, 12.0),
 (2.0, 13.0),
 (0.0, 12.0),
 (0.0, 5.0),
 (0.0, 2.0)]

Using the counts above, we can calculate empirical probabilities for each bin. Below, we can see the endpoints of each bin and the empirical probability of mortality for entries in that bin. So for example, for entries with a vote percentage between .45 and .5, we can see that the empirical probability of mortality was .756 (59/78), which suggests that the vote percentage is not a well-calibrated probability.



In [12]:

    
emp_prob_vec_val = countvec1_val[0]/(countvec0_val[0]+countvec1_val[0])
list(zip([(i/20, (i+1)/20) for i in range(20)],emp_prob_vec_val))









    Out[12]:





[((0.0, 0.05), 0.012326043737574552),
 ((0.05, 0.1), 0.053323029366306028),
 ((0.1, 0.15), 0.11677160847166572),
 ((0.15, 0.2), 0.21092896174863388),
 ((0.2, 0.25), 0.31462585034013607),
 ((0.25, 0.3), 0.31804281345565749),
 ((0.3, 0.35), 0.42982456140350878),
 ((0.35, 0.4), 0.55118110236220474),
 ((0.4, 0.45), 0.56862745098039214),
 ((0.45, 0.5), 0.75641025641025639),
 ((0.5, 0.55), 0.7021276595744681),
 ((0.55, 0.6), 0.7857142857142857),
 ((0.6, 0.65), 0.76923076923076927),
 ((0.65, 0.7), 0.95652173913043481),
 ((0.7, 0.75), 0.82352941176470584),
 ((0.75, 0.8), 1.0),
 ((0.8, 0.85), 0.8666666666666667),
 ((0.85, 0.9), 1.0),
 ((0.9, 0.95), 1.0),
 ((0.95, 1.0), 1.0)]

To demonstrate this more visually, a well-calibrated probability should have the points falling (roughly) on the line x=y. However, as we see in the plot below, most of the points lie above that line. This suggest that (in particular for entries with vote percentage >.4) the true probability is considerably higher than the vote percentage.



In [13]:

    
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(np.linspace(0.05,1,20)-.025,emp_prob_vec_val,'ro')









    Out[13]:





[<matplotlib.lines.Line2D at 0x120991940>]

So how do we fix this?! We can use the validation set to create a calibration function. This function would map our uncalibrated score into a proper probability. The ml-insights package has a function called prob_calibration_function which does just that. Give it the scores and the true 0/1 values, and it will return a calibration function. This calibration function can then be applied to the model output to yield a well-calibrated probability.



In [14]:

    
calibrate_rf = mli.prob_calibration_function(y_val,val_res)









    



Originally there were 496 knots.  Reducing to 200 while preserving first and last knot.
Trying 43 values of C between 0.0001 and 10000000000.0
Best value found C = [  2.15443469e+09]

The function takes a little bit of time to run, primarily because it is doing cross-validation on the regularization parameter C (to balance fitting the current data with generalizing well). There are additional settings on the function that can be applied to speed up this process, if desired.

As you can see below, the resulting function is a nice cubic function that fits the data well without over-fitting



In [15]:

    
plt.plot(np.linspace(0,1,101),calibrate_rf(np.linspace(0,1,101)))
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(np.linspace(0.05,1,20)-.025,emp_prob_vec_val,'ro')









    Out[15]:





[<matplotlib.lines.Line2D at 0x1181f4a90>]

Now let's demonstrate that we actually have more accurate probabilities after calibration than we did before. We will use the log_loss metric to determine which performs better on the test set, with or without calibration.



In [16]:

    
log_loss(y_test,test_res_uncalib)









    Out[16]:





0.25252730333409118



In [17]:

    
test_res_calib=calibrate_rf(test_res_uncalib)



In [18]:

    
log_loss(y_test,test_res_calib)









    Out[18]:





0.24446136650832703



In [19]:

    
brier_score_loss(y_test,test_res_uncalib)









    Out[19]:





0.073175005512208458



In [20]:

    
brier_score_loss(y_test,test_res_calib)









    Out[20]:





0.071984829479211476

As you can see, there is a significant improvement after calibration. (Lower loss is better)

This next plot shows the empirical probabilities of the test set data (that was not seen by the calibration process).

You can see, that the curve fits the data pretty well, though not as well as it fits the validation set (since it was trained on the validation set).

Note, the warning below is just eliminating the points where the counts in both bins was 0)



In [ ]:



In [21]:

    
emp_prob_vec_test = countvec1_test[0]/(countvec0_test[0]+countvec1_test[0])
plt.plot(np.linspace(0,1,101),calibrate_rf(np.linspace(0,1,101)))
#plt.plot(np.linspace(.025,.975,20),emp_prob_vec_val,'ro')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'y+')
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')









    



/Applications/anaconda/envs/py3env/lib/python3.5/site-packages/ipykernel/__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
  if __name__ == '__main__':






    Out[21]:





[<matplotlib.lines.Line2D at 0x1183ab2b0>]



In [22]:

    
rfm_calib = mli.SplineCalibratedClassifierCV(rfmodel1, cv='prefit')
rfm_calib.fit(X_val,y_val)









    



Determining Calibration Function
Originally there were 496 knots.  Reducing to 200 while preserving first and last knot.
Trying 43 values of C between 0.0001 and 10000000000.0
Best value found C = [  2.15443469e+09]






    Out[22]:





SplineCalibratedClassifierCV(base_estimator=RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=942, verbose=0, warm_start=False),
               cv='prefit', method='logistic')



In [23]:

    
test_res_calib_alt = rfm_calib.predict_proba(X_test)[:,1]



In [24]:

    
plt.hist(test_res_calib_alt - test_res_calib)









    Out[24]:





(array([     0.,      0.,      0.,      0.,      0.,  11946.,      0.,
             0.,      0.,      0.]),
 array([-0.5, -0.4, -0.3, -0.2, -0.1,  0. ,  0.1,  0.2,  0.3,  0.4,  0.5]),
 <a list of 10 Patch objects>)



In [ ]:



In [ ]:

The default calibration uses a logistic regression fitting procedure, which is best if your metric of interest is log-loss. If brier score is the metric of interest, it is better to use method='ridge' since that will use an L2 (mean-squared error) loss function as the calibration metric to optimize.



In [25]:

    
calibrate_rf_L2 = mli.prob_calibration_function(y_val,val_res,method='ridge')









    



Originally there were 496 knots.  Reducing to 200 while preserving first and last knot.
Trying 43 values of alpha between 1e-07 and 10000000.0
Best value found alpha = 0.001



In [26]:

    
test_res_calib_L2=calibrate_rf_L2(test_res_uncalib)



In [27]:

    
log_loss(y_test,test_res_calib_L2)









    Out[27]:





0.24851897496515921



In [28]:

    
brier_score_loss(y_test,test_res_calib_L2)









    Out[28]:





0.071981483302756707



In [ ]:



In [ ]:



In [ ]:



In [ ]:

As you can see, using the L2 calibration improves (slightly) on the brier score loss, while doing (significantly) worse on the log-loss. In general, using the default logistic method is advised for calibration of probabilities



In [29]:

    
rfm_calib = mli.SplineCalibratedClassifierCV(rfmodel1, cv='prefit')
rfm_calib.fit(X_val,y_val)









    



Determining Calibration Function
Originally there were 496 knots.  Reducing to 200 while preserving first and last knot.
Trying 43 values of C between 0.0001 and 10000000000.0
Best value found C = [  2.15443469e+09]






    Out[29]:





SplineCalibratedClassifierCV(base_estimator=RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=942, verbose=0, warm_start=False),
               cv='prefit', method='logistic')



In [30]:

    
test_res_calib_direct = rfm_calib.predict_proba(X_test)[:,1]



In [31]:

    
log_loss(y_test,test_res_calib_direct)









    Out[31]:





0.24446136650832703



In [32]:

    
brier_score_loss(y_test,test_res_calib_direct)









    Out[32]:





0.071984829479211476

Changing the Regularization Search

At the heart of the spline-fitting component of the calibration is a Penalized Logistic Regression, driven by the LogisticRegressionCV function in sklearn. This function searches over a range of "C" values, where C is a parameter that controls the regularization (shrinkage) of the coefficients.

By default, prob_calibration_function searches over 43 values, logarithmically spaced between 10^-4 and 10^10. (In numpy language this is reg_param_vec = 10**np.linspace(-4,10,43). However, you have the option to specify your own reg_param_vec vector of C values to try in the Cross-Validation.

If you recall previously, when we ran

calibrate_rf = mli.prob_calibration_function(y_val, val_res)

We got a message:

Trying 43 values of C between 0.0001 and 10000000000.0 Best value found C = [ 1000.]

If we wish to optimize more fully, we might want to try a range of points between 100 and 10000 for C. Below we show how to do this.



In [33]:

    
new_C_vec = np.linspace(100,10000,101)
new_C_vec









    Out[33]:





array([   100.,    199.,    298.,    397.,    496.,    595.,    694.,
          793.,    892.,    991.,   1090.,   1189.,   1288.,   1387.,
         1486.,   1585.,   1684.,   1783.,   1882.,   1981.,   2080.,
         2179.,   2278.,   2377.,   2476.,   2575.,   2674.,   2773.,
         2872.,   2971.,   3070.,   3169.,   3268.,   3367.,   3466.,
         3565.,   3664.,   3763.,   3862.,   3961.,   4060.,   4159.,
         4258.,   4357.,   4456.,   4555.,   4654.,   4753.,   4852.,
         4951.,   5050.,   5149.,   5248.,   5347.,   5446.,   5545.,
         5644.,   5743.,   5842.,   5941.,   6040.,   6139.,   6238.,
         6337.,   6436.,   6535.,   6634.,   6733.,   6832.,   6931.,
         7030.,   7129.,   7228.,   7327.,   7426.,   7525.,   7624.,
         7723.,   7822.,   7921.,   8020.,   8119.,   8218.,   8317.,
         8416.,   8515.,   8614.,   8713.,   8812.,   8911.,   9010.,
         9109.,   9208.,   9307.,   9406.,   9505.,   9604.,   9703.,
         9802.,   9901.,  10000.])



In [34]:

    
calibrate_rf_new = mli.prob_calibration_function(y_val, val_res, reg_param_vec=new_C_vec)









    



Originally there were 496 knots.  Reducing to 200 while preserving first and last knot.
Trying 101 values of C between 100.0 and 10000.0
Best value found C = [ 4456.]

Note that the run-time of the calibration depends on the number of C values that are tried. In general, if you want to save some time, you can specify a range with fewer points in the reg_param_vec. For a more thorough search (which takes more time) you can specify more points.



In [35]:

    
test_res_calib_new=calibrate_rf_new(test_res_uncalib)



In [36]:

    
log_loss(y_test,test_res_calib_new)









    Out[36]:





0.24468146139296634

We see that, in this case, the log_loss was slightly improved from our previous calibration.



In [ ]:



In [ ]:



In [ ]:

Existing Sklearn Calibration Functionality

Note, sklearn has a CalibratedClassifierCV function, but it does not seem to work as well. Here we will use it in the same fashion as we did above. That is, we assume we have a model that is fit already and then use the additional data to calibrate (this is why we use the option cv='prefit'). Unlike ml-insights, the sklearn functionality creates a new model which integrates the calibration rather than keeping the model and calibration separate.



In [37]:

    
from sklearn.calibration import CalibratedClassifierCV



In [38]:

    
clf_isotonic = CalibratedClassifierCV(rfmodel1, method='isotonic',cv="prefit")
clf_isotonic.fit(X_val, y_val)
prob_pos_isotonic = clf_isotonic.predict_proba(X_test)[:, 1]



In [39]:

    
log_loss(y_test,prob_pos_isotonic)









    Out[39]:





0.26663934423877206



In [40]:

    
brier_score_loss(y_test,prob_pos_isotonic)









    Out[40]:





0.074458906233875938

The log loss and brier score loss is actually worse (higher) than the uncalibrated version, and considerably worse than the calibrated method using ml-insights



In [41]:

    
plt.plot(np.linspace(0,1,101),calibrate_rf(np.linspace(0,1,101)))
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_val,'ro')
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(test_res_uncalib,prob_pos_isotonic,'c.')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'g+')









    Out[41]:





[<matplotlib.lines.Line2D at 0x12c1cac50>]

From the plot above, it seems to give more of a "step function" calibration rather than a smooth, spline-like curve. It also seems to be off quite a bit when it predicts a 0.2 probability and the actual (empirical) probability is around 0.05.



In [42]:

    
clf_sigmoid = CalibratedClassifierCV(rfmodel1, method='sigmoid',cv="prefit")
clf_sigmoid.fit(X_val, y_val)
prob_pos_sigmoid = clf_sigmoid.predict_proba(X_test)[:, 1]



In [43]:

    
log_loss(y_test,prob_pos_sigmoid)









    Out[43]:





0.25396992513972455



In [44]:

    
brier_score_loss(y_test,prob_pos_sigmoid)









    Out[44]:





0.073394487887474241

Again, the results are not very reassuring.



In [45]:

    
#plt.plot(np.linspace(.025,.975,20),emp_prob_vec_val,'ro')
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(test_res_uncalib,prob_pos_sigmoid,'c.')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'g+')
plt.plot(np.linspace(0,1,101),calibrate_rf(np.linspace(0,1,101)))









    Out[45]:





[<matplotlib.lines.Line2D at 0x129bd2fd0>]

Above, we see that the sigmoid option assumes a rather strict parametric form, and thus fits badly if the reality is not of that shape.



In [ ]:

Cross Validated Calibration

One disadvantage to the above approach is that the model uses one set of data for training and another set for calibration. As a result, neither step benefits from the full size of available data. Below we outline the ML-Insights capabilities to do a cross-validated training and calibration, and compare it to the existing sklearn functionality.



In [ ]:



In [ ]:



In [46]:

    
rfm = RandomForestClassifier(n_estimators = 500, class_weight='balanced_subsample', random_state=942, n_jobs=-1 )
rfm_cv = mli.SplineCalibratedClassifierCV(rfm)
rfm_cv.fit(X_train_val,y_train_val)









    



training fold 1 of 5
training fold 2 of 5
training fold 3 of 5
training fold 4 of 5
training fold 5 of 5
Training Full Model
Determining Calibration Function
Originally there were 959 knots.  Reducing to 200 while preserving first and last knot.
Trying 43 values of C between 0.0001 and 10000000000.0
Best value found C = [  1.00000000e+10]






    Out[46]:





SplineCalibratedClassifierCV(base_estimator=RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=942, verbose=0, warm_start=False),
               cv=5, method='logistic')



In [47]:

    
test_res_uncalib_cv = rfm_cv.uncalibrated_classifier.predict_proba(X_test)[:,1]



In [48]:

    
#test_res_calib_cv = calibration_function(test_res_uncalib_cv)
test_res_calib_cv = rfm_cv.predict_proba(X_test)[:,1]



In [49]:

    
#log_loss(y_test,test_res_uncalib_cv)



In [50]:

    
#brier_score_loss(y_test,test_res_uncalib_cv)



In [51]:

    
log_loss(y_test,test_res_calib_cv)









    Out[51]:





0.24330933810106203



In [52]:

    
brier_score_loss(y_test,test_res_calib_cv)









    Out[52]:





0.071831705602529008



In [53]:

    
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'g+')
plt.plot(np.linspace(0,1,101),calibrate_rf(np.linspace(0,1,101)))
plt.plot(np.linspace(0,1,101),rfm_cv.calib_func(np.linspace(0,1,101)))









    Out[53]:





[<matplotlib.lines.Line2D at 0x13161fb00>]



In [ ]:



In [54]:

    
rfm = RandomForestClassifier(n_estimators = 500, class_weight='balanced_subsample', random_state=942)
rfm_cv_L2 = mli.SplineCalibratedClassifierCV(rfm, method='ridge')
rfm_cv_L2.fit(X_train_val,y_train_val)









    



training fold 1 of 5
training fold 2 of 5
training fold 3 of 5
training fold 4 of 5
training fold 5 of 5
Training Full Model
Determining Calibration Function
Originally there were 977 knots.  Reducing to 200 while preserving first and last knot.
Trying 43 values of alpha between 1e-07 and 10000000.0
Best value found alpha = 0.004641588833612773






    Out[54]:





SplineCalibratedClassifierCV(base_estimator=RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=1,
            oob_score=False, random_state=942, verbose=0, warm_start=False),
               cv=5, method='ridge')



In [55]:

    
test_res_calib_cv_L2 = rfm_cv_L2.predict_proba(X_test)[:,1]



In [56]:

    
log_loss(y_test,test_res_calib_cv_L2)









    Out[56]:





0.24740431124716544



In [57]:

    
brier_score_loss(y_test,test_res_calib_cv_L2)









    Out[57]:





0.071814391744741871



In [58]:

    
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'g+')
plt.plot(np.linspace(0,1,101),rfm_cv.calib_func(np.linspace(0,1,101)),'g')
plt.plot(np.linspace(0,1,101),rfm_cv_L2.calib_func(np.linspace(0,1,101)),'r')









    Out[58]:





[<matplotlib.lines.Line2D at 0x12b9c7f60>]



In [59]:

    
clf_isotonic_xval = CalibratedClassifierCV(rfm, method='isotonic', cv=5)
clf_isotonic_xval.fit(X_train_val,y_train_val)
prob_pos_isotonic_xval = clf_isotonic_xval.predict_proba(X_test)[:, 1]



In [60]:

    
log_loss(y_test,prob_pos_isotonic_xval)









    Out[60]:





0.24809833644158982



In [61]:

    
brier_score_loss(y_test,prob_pos_isotonic_xval)









    Out[61]:





0.072170301163490103



In [62]:

    
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(test_res_uncalib_cv,prob_pos_isotonic_xval,'c.')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'g+')
#plt.plot(np.linspace(0,1,101),calibration_function(np.linspace(0,1,101)),'g')









    Out[62]:





[<matplotlib.lines.Line2D at 0x131c17f28>]



In [63]:

    
clf_sigmoid_xval = CalibratedClassifierCV(rfm, method='sigmoid', cv=5)
clf_sigmoid_xval.fit(X_train_val,y_train_val)
prob_pos_sigmoid_xval = clf_sigmoid_xval.predict_proba(X_test)[:, 1]



In [64]:

    
plt.plot(np.linspace(0,1,101),np.linspace(0,1,101),'k')
plt.plot(test_res_uncalib_cv,prob_pos_sigmoid,'c.')
plt.plot(np.linspace(.025,.975,20),emp_prob_vec_test,'g+')
#plt.plot(np.linspace(0,1,101),calibrate_rf(np.linspace(0,1,101)))
#plt.plot(np.linspace(0,1,101),calibration_function(np.linspace(0,1,101)),'g')









    Out[64]:





[<matplotlib.lines.Line2D at 0x131a62ef0>]



In [65]:

    
log_loss(y_test,prob_pos_sigmoid_xval)









    Out[65]:





0.25331592414895782



In [66]:

    
brier_score_loss(y_test,prob_pos_sigmoid_xval)









    Out[66]:





0.073210966238481959



In [ ]:



In [ ]:

	subject_id	hadm_id	icustay_id	aniongap_min	aniongap_max	albumin_min	albumin_max	bicarbonate_min	bicarbonate_max	bilirubin_min	...	meanbp_mean	resprate_min	resprate_max	resprate_mean	tempc_min	tempc_max	tempc_mean	spo2_min	spo2_max	spo2_mean
0	9	150750	220597	13.0	13.0	NaN	NaN	26.0	30.0	0.4	...	98.850000	14.0	19.0	14.369565	35.500001	37.888887	37.049383	95.0	100.0	97.650000
1	13	143045	263738	10.0	14.0	3.9	3.9	23.0	24.0	0.4	...	93.772727	11.0	25.0	15.320000	35.944443	37.400002	36.653534	94.0	100.0	97.700000
2	20	157681	264490	12.0	12.0	NaN	NaN	21.0	21.0	NaN	...	75.058333	10.0	27.0	15.404762	35.900002	37.299999	36.545714	95.0	100.0	98.435897
3	28	162569	225559	13.0	13.0	NaN	NaN	23.0	23.0	NaN	...	69.133333	9.0	32.0	16.677419	35.900002	37.700001	37.033333	92.0	100.0	96.419355
4	37	188670	213503	9.0	10.0	NaN	NaN	33.0	35.0	NaN	...	73.297610	15.0	30.0	22.241379	36.833335	38.055556	37.333334	89.0	99.0	96.533333
5	71	111944	211832	13.0	30.0	3.6	4.7	17.0	26.0	0.4	...	79.222208	13.0	25.0	17.130435	35.722224	37.833332	37.351852	99.0	100.0	99.862069
6	72	156857	239612	18.0	18.0	NaN	NaN	20.0	20.0	3.5	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	78	100536	233150	9.0	9.0	2.7	3.1	26.0	26.0	0.8	...	121.129705	11.0	24.0	16.764706	36.333334	36.833335	36.577778	96.0	100.0	98.470588
8	88	123010	297289	13.0	18.0	NaN	NaN	19.0	26.0	NaN	...	91.884615	9.0	45.0	20.352941	35.722224	39.111112	37.810185	99.0	100.0	99.962963
9	95	160891	216431	13.0	17.0	NaN	NaN	23.0	26.0	NaN	...	93.952386	14.0	20.0	16.363636	35.722224	36.666667	36.305556	96.0	100.0	98.071429