License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Local Feature Importance and Reason Codes using LOCO

Based on: Lei, Jing, G’Sell, Max, Rinaldo, Alessandro, Tibshirani, Ryan J., and Wasserman, Larry. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 2017.

http://www.stat.cmu.edu/~ryantibs/papers/conformal.pdf

Instead of dropping one variable and retraining a model to understand the importance of that variable in a model, these examples set a variable to missing and rescore this new, corrupted sample with the original model. This is approach may be more appropriate for nonlineaer models in which nonlinear dependencies can allow variables to nearly completely replace one another when a model is retrained.

Preliminaries: imports, start h2o, load and clean data



In [1]:

    
# imports
import h2o 
import numpy as np
import pandas as pd
from h2o.estimators.gbm import H2OGradientBoostingEstimator



In [2]:

    
# start h2o
h2o.init()
h2o.remove_all()
h2o.show_progress()









    



Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_112"; Java(TM) SE Runtime Environment (build 1.8.0_112-b16); Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)
  Starting server from /Users/phall/anaconda/lib/python3.5/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/tc/0ss1l73113j3wdyjsxmy1j2r0000gn/T/tmpoidcpxm3
  JVM stdout: /var/folders/tc/0ss1l73113j3wdyjsxmy1j2r0000gn/T/tmpoidcpxm3/h2o_phall_started_from_python.out
  JVM stderr: /var/folders/tc/0ss1l73113j3wdyjsxmy1j2r0000gn/T/tmpoidcpxm3/h2o_phall_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.






    




H2O cluster uptime:
01 secs
H2O cluster version:
3.12.0.1
H2O cluster version age:
2 months and 7 days 
H2O cluster name:
H2O_from_python_phall_tff9kq
H2O cluster total nodes:
1
H2O cluster free memory:
3.556 Gb
H2O cluster total cores:
8
H2O cluster allowed cores:
8
H2O cluster status:
accepting new members, healthy
H2O connection url:
http://127.0.0.1:54321
H2O connection proxy:
None
H2O internal security:
False
Python version:
3.5.2 final

Load and prepare data for modeling



In [3]:

    
# load clean data
path = '../../03_regression/data/train.csv'
frame = h2o.import_file(path=path)









    



Parse progress: |█████████████████████████████████████████████████████████| 100%



In [4]:

    
# assign target and inputs
y = 'SalePrice'
X = [name for name in frame.columns if name not in [y, 'Id']]

LOCO is simpler to use with data containing no missing values



In [5]:

    
# determine column types
# impute
reals, enums = [], []
for key, val in frame.types.items():
    if key in X:
        if val == 'enum':
            enums.append(key)
        else: 
            reals.append(key)
            
_ = frame[reals].impute(method='median')
_ = frame[enums].impute(method='mode')



In [6]:

    
# split into training and validation
train, valid = frame.split_frame([0.7])

Understanding linear correlation and nonlinear dependencies are important for LOCO.

If strong relationships are present, retraining the model after removing an input will simply allow the linearly correlated or nonlinearly dependent variables to make up for the impact of the removed input. This why we will set to missing here, and not drop and retrain.
If such relationships are present, models must be regularized to prevent correlation or other dependencies from creating instability in model parameters or rules. (H2O GBM is regularized by column and row sampling.)
For H2O GBM, setting a variable to missing causes it to follow the majority path in each decision tree. The interpretation of LOCO becomes the numeric difference between the local behavior of the variable and the most common local behavior.
Because of linear correlation and nonlinear dependence, LOCO values are valid only for a given data and feature set.



In [7]:

    
# print out linearly correlated pairs
corr = train[reals].cor().as_data_frame()
for i in range(0, corr.shape[0]):
    for j in range(0, corr.shape[1]):
        if i != j:
            if np.abs(corr.iat[i, j]) > 0.7:
                print(corr.columns[i], corr.columns[j])









    



GarageYrBlt YearBuilt
GrLivArea TotRmsAbvGrd
TotalBsmtSF 1stFlrSF
1stFlrSF TotalBsmtSF
GarageCars GarageArea
YearBuilt GarageYrBlt
GarageArea GarageCars
TotRmsAbvGrd GrLivArea

It's likely that even more nonlinearly dependent relationships exist between inputs. Nonlinearly relationships can also behave differently at global and local scales.

Removing one var from each correlated pair may increase stability in the model and its explanations



In [8]:

    
X_reals_decorr = [i for i in reals if i not in  ['GarageYrBlt', 'TotRmsAbvGrd', 'TotalBsmtSF', 'GarageCars']]

Train a predictive model



In [9]:

    
# train GBM model
model = H2OGradientBoostingEstimator(ntrees=100,
                                     max_depth=10,
                                     distribution='huber',
                                     learn_rate=0.1,
                                     stopping_rounds=5,
                                     seed=12345)

model.train(y=y, x=X_reals_decorr, training_frame=train, validation_frame=valid)

preds = valid['Id'].cbind(model.predict(valid))









    



gbm Model Build progress: |███████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%

Rescore predictive model

Each time leaving one input (covariate) out by setting it to missing
To generate local feature importance values for each decision



In [10]:

    
h2o.no_progress()

for k, i in enumerate(X_reals_decorr):

    # train and predict with Xi set to missing
    valid_loco = h2o.deep_copy(valid, 'valid_loco')
    valid_loco[i] = np.nan
    preds_loco = model.predict(valid_loco)
    
    # create a new, named column for the LOCO prediction
    preds_loco.columns = [i]
    preds = preds.cbind(preds_loco)
    
    # subtract the LOCO prediction from 
    preds[i] = preds[i] - preds['predict']
    
    print('LOCO Progress: ' + i + ' (' + str(k+1) + '/' + str(len(X_reals_decorr)) + ') ...')
    
print('Done.')  

preds.head()









    



LOCO Progress: HalfBath (1/32) ...
LOCO Progress: BsmtFinSF1 (2/32) ...
LOCO Progress: MoSold (3/32) ...
LOCO Progress: PoolArea (4/32) ...
LOCO Progress: BsmtHalfBath (5/32) ...
LOCO Progress: BedroomAbvGr (6/32) ...
LOCO Progress: BsmtFinSF2 (7/32) ...
LOCO Progress: GrLivArea (8/32) ...
LOCO Progress: KitchenAbvGr (9/32) ...
LOCO Progress: LotFrontage (10/32) ...
LOCO Progress: MSSubClass (11/32) ...
LOCO Progress: BsmtUnfSF (12/32) ...
LOCO Progress: LotArea (13/32) ...
LOCO Progress: OpenPorchSF (14/32) ...
LOCO Progress: 1stFlrSF (15/32) ...
LOCO Progress: 3SsnPorch (16/32) ...
LOCO Progress: Fireplaces (17/32) ...
LOCO Progress: EnclosedPorch (18/32) ...
LOCO Progress: LowQualFinSF (19/32) ...
LOCO Progress: 2ndFlrSF (20/32) ...
LOCO Progress: YearBuilt (21/32) ...
LOCO Progress: YrSold (22/32) ...
LOCO Progress: BsmtFullBath (23/32) ...
LOCO Progress: WoodDeckSF (24/32) ...
LOCO Progress: OverallCond (25/32) ...
LOCO Progress: GarageArea (26/32) ...
LOCO Progress: ScreenPorch (27/32) ...
LOCO Progress: MasVnrArea (28/32) ...
LOCO Progress: MiscVal (29/32) ...
LOCO Progress: OverallQual (30/32) ...
LOCO Progress: YearRemodAdd (31/32) ...
LOCO Progress: FullBath (32/32) ...
Done.






    






  Id   predict   HalfBath   BsmtFinSF1    MoSold   PoolArea   BsmtHalfBath   BedroomAbvGr   BsmtFinSF2   GrLivArea   KitchenAbvGr   LotFrontage   MSSubClass   BsmtUnfSF     LotArea   OpenPorchSF   1stFlrSF   3SsnPorch   Fireplaces   EnclosedPorch   LowQualFinSF   2ndFlrSF   YearBuilt    YrSold   BsmtFullBath   WoodDeckSF   OverallCond   GarageArea   ScreenPorch   MasVnrArea   MiscVal   OverallQual   YearRemodAdd   FullBath


   4    162737      0        5426.69  1307.74          0         0              0               0    2825.75           0         719.187     836.729    1734.61      0           727.522    683.734           0   -294.17          -79.2358              0   -810.712   12167.3  -5918.04    -1459.14       -635.263      5173.29    -8044.2            0         0            0     -36467.5      21993.6       0    
   5    321949   -708.329    12408.5  27137.8           0         0              0               0   -7560.51           0       -3064.35       0        -321.035 -11176.7       -6596.96  13893.5            0     -3.02205          0                  0  -3952.2    -7019.57  1599.46    -2264.6       -3086.45         0      -39288.8            0       609.803         0     -80951.9       4742.61      0    
   7    288940   4286.86   -38286.6      0             0         0              0               0     521.262           0           0       -716.772     275.934    229.391      -912.311 -10971              0    224.838           0                  0      0        664.773  -297.844       -1.47215     -112.298         0       -9753.45           0      -139.076         0     -83533.4       -411.94      0    
   8    214558  -2934.78    -2155.88 -1596.35          0         0              0               0  -34924.2            0         824.924       0       -3677.11   5894.01      -2393.04   -717.704           0   1019.98         2054.01               0  -1302.45    1471.64     0        -525.598     -6039.15     -1901.71    -6247.74           0      1615.11         0     -14857.3      25196.1       0    
  11    130570      0       -1501.59   373.448          0         0              0               0    -772.792           0         -58.044     420.323    1796.17     24.6083         0     -7408.69           0      0               0                  0      0        706.53     0       -1356.89          0         5717.06     2281.01           0         0            0       1182.2       2090.64      0    
  13    132042      0       -3415.88  -313.206          0         0            150.151            0    -733.679           0         447.161     180.826   -1629.84  -3467.46          0         0              0      0               0                  0      0        220.939     0       -1251           262.766      -927.486     -469.974       -2031.6        0            0       1709.63       6448.98      0    
  18    124058    140.167     7117.68  1132.37          0         0           1359.23            0     315.657        1959.07       -58.044    4142.5      5807.16    265.833         0    -11407.5            0    922.09            0                  0      0     -17613.7   1225.79        0             0          538.854      561.559           0         0            0       8583.6        896.597  -1646.95 
  21    315667    105.762    12636.7  25091.7           0         0              0               0  -30379.8            0       -2426.85       0       13761.8   -6041.51      -5266.2    9589.59           0    433.176           0                  0  -6544.59   -8300.07     0        2814.84       1723.94         0      -18974.7            0     -2872.53         0    -107036        -2016.31      0    
  38    139593    140.167   -11190.9   3093.13          0       -60.0012          0               0   -1225.85           0         840.328      75.9954    3984.55   3999.82          0      4484.02           0  -3524.89            0                  0      0      10936.9    240.289        0             0         -445.479     1442.54           0      -822.334         0       2498.6       2362.87    162.936
  41    158223   -557.072      538.874  3711.56          0         0              0               0   -2371.64           0       -3263.31    1773.37      538.58  -1101.54      -2713.6  -18880.1            0  -8196.03            0                  0      0      -3437.71  -496.612     2685.87          0          955.21        0              0     -1095.96         0      -9173.71       -579.85  -2707.86 








    Out[10]:

The numeric values in each column are an estimate of how much each variable contributed to each decision. These values can tell you how a variable and it's values were weighted in any given decision by the model. These values are crucially important for machine learning interpretability and are often to referred to "local feature importance", "reason codes", or "turn-down codes." The latter phrases are borrowed from credit scoring. Credit lenders must provide reasons for turning down a credit application, even for automated decisions. Reason codes can be easily extracted from LOCO local feature importance values, by simply ranking the variables that played the largest role in any given decision.

Helper function for finding quantile indices



In [11]:

    
def get_quantile_dict(y, id_, frame):

    """ Returns the percentiles of a column y as the indices for another column id_.
    
    Args:
        y: Column in which to find percentiles.
        id_: Id column that stores indices for percentiles of y.
        frame: H2OFrame containing y and id_. 
    
    Returns:
        Dictionary of percentile values and index column values.
    
    """
    
    quantiles_df = frame.as_data_frame()
    quantiles_df.sort_values(y, inplace=True)
    quantiles_df.reset_index(inplace=True)
    
    percentiles_dict = {}
    percentiles_dict[0] = quantiles_df.loc[0, id_]
    percentiles_dict[99] = quantiles_df.loc[quantiles_df.shape[0]-1, id_]
    inc = quantiles_df.shape[0]//10
    
    for i in range(1, 10):
        percentiles_dict[i * 10] = quantiles_df.loc[i * inc,  id_]

    return percentiles_dict

quantile_dict = get_quantile_dict('predict', 'Id', preds)
print(quantile_dict)









    



{0: 621.0, 80: 606.0, 50: 744.0, 99: 1299.0, 20: 372.0, 70: 849.0, 40: 207.0, 10: 996.0, 60: 983.0, 90: 1343.0, 30: 38.0}

Plot some reason codes for a representative row



In [12]:

    
%matplotlib inline



In [13]:

    
median_loco = preds[preds['Id'] == int(quantile_dict[50]), :].as_data_frame().drop(['Id', 'predict'], axis=1)
median_loco = median_loco.T.sort_values(by=0)[:5]
_ = median_loco.plot(kind='bar', 
                     title='Negative Reason Codes for the Median of Predicted Sale Price\n', 
                     legend=False)



In [14]:

    
median_loco = preds[preds['Id'] == int(quantile_dict[50]), :].as_data_frame().drop(['Id', 'predict'], axis=1)
median_loco = median_loco.T.sort_values(by=0, ascending=False)[:5]
_ = median_loco.plot(kind='bar', 
                     title='Positive Reason Codes for the Median of Predicted Sale Price\n', 
                     color='r',
                     legend=False)

Ensembling explantions to reduce local variance

Explanations derived from high variance machine learning models can be unstable. One general way to decrease variance is to ensemble the results of many models.

Train multiple models



In [15]:

    
n_models = 10 # select number of models

models = []
pred_frames = []

for i in range(0, n_models):

    # store models
    models.append(H2OGradientBoostingEstimator(ntrees=500,
                                               max_depth=2 * (i + 1),
                                               distribution='huber',
                                               learn_rate=0.01 * (i + 1),
                                               stopping_rounds=5,
                                               seed=i + 1))
    
    # train models
    models[i].train(y=y, x=X_reals_decorr, training_frame=train, validation_frame=valid)
    
    # store predictions
    pred_frames.append(valid['Id'].cbind(models[i].predict(valid)))

    print('Training Progress: model %d/%d ...' % (i + 1, n_models))

print('Done.')









    



Training Progress: model 1/10 ...
Training Progress: model 2/10 ...
Training Progress: model 3/10 ...
Training Progress: model 4/10 ...
Training Progress: model 5/10 ...
Training Progress: model 6/10 ...
Training Progress: model 7/10 ...
Training Progress: model 8/10 ...
Training Progress: model 9/10 ...
Training Progress: model 10/10 ...
Done.

Calculate LOCO for each model



In [16]:

    
for k, model in enumerate(models):

    for i in X_reals_decorr:

        # train and predict with Xi set to missing
        valid_loco = h2o.deep_copy(valid, 'valid_loco')
        valid_loco[i] = np.nan
        preds_loco = model.predict(valid_loco)

        # create a new, named column for the LOCO prediction
        preds_loco.columns = [i]
        pred_frames[k] = pred_frames[k].cbind(preds_loco)

        # subtract the LOCO prediction from 
        pred_frames[k][i] = pred_frames[k][i] - pred_frames[k]['predict']
        
    print('LOCO Progress: model %d/%d ...' % (k + 1, n_models))

print('Done.')









    



LOCO Progress: model 1/10 ...
LOCO Progress: model 2/10 ...
LOCO Progress: model 3/10 ...
LOCO Progress: model 4/10 ...
LOCO Progress: model 5/10 ...
LOCO Progress: model 6/10 ...
LOCO Progress: model 7/10 ...
LOCO Progress: model 8/10 ...
LOCO Progress: model 9/10 ...
LOCO Progress: model 10/10 ...
Done.

Collect LOCO values for each model for the median home



In [17]:

    
median_loco_frames = []
col_names = ['Loco ' + str(i) for i in range(1, n_models + 1)]

for i in range(0, n_models):
    
    # collect LOCO as a column vector in a Pandas df
    preds = pred_frames[i]
    median_loco_frames.append(preds[preds['Id'] == int(quantile_dict[50]), :]\
                              .as_data_frame()\
                              .drop(['Id', 'predict'], axis=1)
                              .T)
    
loco_ensemble = pd.concat(median_loco_frames, axis=1) 
loco_ensemble.columns = col_names
loco_ensemble['Mean Local Importance'] = loco_ensemble.mean(axis=1)
loco_ensemble['Std. Dev. Local Importance'] = loco_ensemble.std(axis=1)
loco_ensemble









    Out[17]:






  
    
      
      Loco 1
      Loco 2
      Loco 3
      Loco 4
      Loco 5
      Loco 6
      Loco 7
      Loco 8
      Loco 9
      Loco 10
      Mean Local Importance
      Std. Dev. Local Importance
    
  
  
    
      HalfBath
      0.000000
      0.000000
      0.000000
      0.000000
      17.407978
      26.615154
      -0.654188
      0.000000
      0.000000
      0.000000
      4.336894
      9.076043
    
    
      BsmtFinSF1
      0.000000
      680.464241
      -336.981845
      315.137484
      1214.512703
      956.425747
      76.630281
      1519.825218
      1531.221380
      1996.006836
      795.324205
      733.805730
    
    
      MoSold
      0.000000
      595.358250
      1787.271071
      3230.697985
      3303.898013
      3550.543600
      3850.248353
      3415.769227
      5072.966855
      2902.953232
      2770.970658
      1462.174367
    
    
      PoolArea
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      BsmtHalfBath
      0.000000
      -324.321083
      -905.862109
      -916.352115
      -1350.069398
      -610.558753
      -1009.995516
      -1744.493128
      -285.008494
      -496.417082
      -764.307768
      501.027368
    
    
      BedroomAbvGr
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      22.660268
      0.000000
      0.000000
      0.000000
      2.266027
      6.798080
    
    
      BsmtFinSF2
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      GrLivArea
      -9814.912363
      -3293.085560
      -15809.346302
      -11922.584258
      -16613.081561
      -15301.129244
      -9991.529671
      -19807.260831
      -13970.186814
      -14553.987621
      -13107.710422
      4363.503123
    
    
      KitchenAbvGr
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      LotFrontage
      0.000000
      0.000000
      0.000000
      200.859772
      0.000000
      -15.328415
      -173.339762
      -238.470665
      444.862839
      187.604218
      40.618799
      185.497698
    
    
      MSSubClass
      0.000000
      0.000000
      -709.661157
      -213.455500
      -2214.505028
      16.975995
      -682.136323
      0.000000
      -331.997725
      384.716454
      -375.006328
      690.705773
    
    
      BsmtUnfSF
      121.164420
      2647.415364
      3674.875089
      4124.255683
      4699.058784
      5641.689768
      6299.426278
      3633.974463
      5917.591017
      2262.545135
      3902.199600
      1794.475306
    
    
      LotArea
      -9108.242034
      -3173.464452
      -8563.072269
      -10691.301457
      -7911.315177
      -8419.845015
      -5652.362702
      -14777.721228
      -5995.706427
      -9938.346915
      -8423.137768
      2995.008619
    
    
      OpenPorchSF
      0.000000
      355.176453
      -1089.777695
      130.642463
      -548.384396
      -785.343713
      266.417814
      748.838837
      -388.206953
      -702.716066
      -201.335326
      558.499491
    
    
      1stFlrSF
      -12816.035657
      -4749.720377
      -4650.553413
      -3123.249856
      -3961.233045
      -1524.936389
      -1415.375563
      -3122.871154
      -1500.447717
      -5490.038249
      -4235.446142
      3177.533381
    
    
      3SsnPorch
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      Fireplaces
      -2643.482971
      -5933.319113
      -3037.572496
      -2628.585165
      -3402.100046
      -2305.288406
      -3088.624344
      -3892.029889
      -8849.192387
      -2399.433849
      -3817.962867
      1954.528206
    
    
      EnclosedPorch
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      LowQualFinSF
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      2ndFlrSF
      0.000000
      0.000000
      0.000000
      0.000000
      -608.084846
      -592.995049
      -625.048765
      -712.648758
      0.000000
      0.000000
      -253.877742
      312.318017
    
    
      YearBuilt
      0.000000
      100.637253
      1514.610630
      1348.396648
      2445.587627
      2074.893215
      1530.263380
      1744.224174
      761.738695
      4507.843872
      1602.819550
      1224.869322
    
    
      YrSold
      0.000000
      456.844194
      257.992503
      831.095733
      1597.267298
      -151.615834
      597.121547
      1987.846712
      -823.560783
      -14.832626
      473.815874
      792.577017
    
    
      BsmtFullBath
      0.000000
      247.717456
      0.000000
      -297.483564
      914.217367
      348.857594
      288.174486
      0.000000
      -248.795288
      -462.691910
      78.999614
      374.696339
    
    
      WoodDeckSF
      -1605.896414
      -4872.201764
      -6925.043301
      -6380.796316
      -5369.324640
      -7617.595119
      -4263.532391
      -6881.978770
      -4553.882088
      -4736.891539
      -5320.714234
      1657.129032
    
    
      OverallCond
      0.000000
      -644.256906
      -1229.180650
      -395.593688
      -469.863885
      -258.986919
      274.087225
      0.000000
      721.658834
      216.195011
      -178.594098
      519.036582
    
    
      GarageArea
      731.604033
      0.000000
      395.115519
      399.249884
      -358.596918
      219.872650
      794.479536
      0.000000
      1466.591026
      0.000000
      364.831573
      497.871785
    
    
      ScreenPorch
      0.000000
      1342.053231
      329.164328
      2211.683096
      2088.159995
      1362.830740
      4055.699565
      907.492787
      819.242192
      397.302864
      1351.362880
      1133.937876
    
    
      MasVnrArea
      0.000000
      0.000000
      0.000000
      0.000000
      -654.879852
      0.000000
      -9.169921
      0.000000
      0.000000
      -361.344009
      -102.539378
      213.161432
    
    
      MiscVal
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      OverallQual
      5605.038536
      7845.886589
      4061.570592
      3172.157405
      3868.077097
      1652.437249
      3540.237396
      3780.340014
      775.159604
      4246.169251
      3854.707373
      1846.436805
    
    
      YearRemodAdd
      -952.580441
      -2956.527016
      -7692.476658
      -2060.712163
      -4456.392792
      -2488.663940
      -1805.367599
      -3281.516177
      -4713.228129
      -2263.878578
      -3267.134349
      1837.056771
    
    
      FullBath
      0.000000
      -1323.125640
      546.541096
      48.387112
      -183.312572
      1162.542391
      258.270741
      -916.331985
      1426.976099
      3106.391031
      412.633827
      1196.931310

Negative mean reason codes



In [18]:

    
median_mean_loco = loco_ensemble['Mean Local Importance'].sort_values()[:5]
_ = median_mean_loco.plot(kind='bar', 
                          title='Negative Mean Reason Codes for the Median of Predicted Sale Price\n', 
                          legend=False)

Positive mean reason codes



In [19]:

    
median_mean_loco = loco_ensemble['Mean Local Importance'].sort_values(ascending=False)[:5]
_ = median_mean_loco.plot(kind='bar', 
                          title='Positive Mean Reason Codes for the Median of Predicted Sale Price\n', 
                          color='r',
                          legend=False)

Shutdown H2O



In [20]:

    
h2o.cluster().shutdown(prompt=True)









    



Are you sure you want to shutdown the H2O instance running at http://127.0.0.1:54321 (Y/N)? y
H2O session _sid_8094 closed.

H2O cluster uptime:	01 secs
H2O cluster version:	3.12.0.1
H2O cluster version age:	2 months and 7 days
H2O cluster name:	H2O_from_python_phall_tff9kq
H2O cluster total nodes:	1
H2O cluster free memory:	3.556 Gb
H2O cluster total cores:	8
H2O cluster allowed cores:	8
H2O cluster status:	accepting new members, healthy
H2O connection url:	http://127.0.0.1:54321
H2O connection proxy:	None
H2O internal security:	False
Python version:	3.5.2 final

Id	predict	HalfBath	BsmtFinSF1	MoSold	BsmtHalfBath	BedroomAbvGr	GrLivArea	KitchenAbvGr	LotFrontage	MSSubClass	BsmtUnfSF	LotArea	OpenPorchSF	1stFlrSF	Fireplaces	EnclosedPorch	2ndFlrSF	YearBuilt	YrSold	BsmtFullBath	WoodDeckSF	OverallCond	GarageArea	ScreenPorch	MasVnrArea	OverallQual	YearRemodAdd	FullBath
4	162737	0	5426.69	1307.74	0	0	2825.75	0	719.187	836.729	1734.61	0	727.522	683.734	-294.17	-79.2358	-810.712	12167.3	-5918.04	-1459.14	-635.263	5173.29	-8044.2	0	0	-36467.5	21993.6	0
5	321949	-708.329	12408.5	27137.8	0	0	-7560.51	0	-3064.35	0	-321.035	-11176.7	-6596.96	13893.5	-3.02205	0	-3952.2	-7019.57	1599.46	-2264.6	-3086.45	0	-39288.8	0	609.803	-80951.9	4742.61	0
7	288940	4286.86	-38286.6	0	0	0	521.262	0	0	-716.772	275.934	229.391	-912.311	-10971	224.838	0	0	664.773	-297.844	-1.47215	-112.298	0	-9753.45	0	-139.076	-83533.4	-411.94	0
8	214558	-2934.78	-2155.88	-1596.35	0	0	-34924.2	0	824.924	0	-3677.11	5894.01	-2393.04	-717.704	1019.98	2054.01	-1302.45	1471.64	0	-525.598	-6039.15	-1901.71	-6247.74	0	1615.11	-14857.3	25196.1	0
11	130570	0	-1501.59	373.448	0	0	-772.792	0	-58.044	420.323	1796.17	24.6083	0	-7408.69	0	0	0	706.53	0	-1356.89	0	5717.06	2281.01	0	0	1182.2	2090.64	0
13	132042	0	-3415.88	-313.206	0	150.151	-733.679	0	447.161	180.826	-1629.84	-3467.46	0	0	0	0	0	220.939	0	-1251	262.766	-927.486	-469.974	-2031.6	0	1709.63	6448.98	0
18	124058	140.167	7117.68	1132.37	0	1359.23	315.657	1959.07	-58.044	4142.5	5807.16	265.833	0	-11407.5	922.09	0	0	-17613.7	1225.79	0	0	538.854	561.559	0	0	8583.6	896.597	-1646.95
21	315667	105.762	12636.7	25091.7	0	0	-30379.8	0	-2426.85	0	13761.8	-6041.51	-5266.2	9589.59	433.176	0	-6544.59	-8300.07	0	2814.84	1723.94	0	-18974.7	0	-2872.53	-107036	-2016.31	0
38	139593	140.167	-11190.9	3093.13	-60.0012	0	-1225.85	0	840.328	75.9954	3984.55	3999.82	0	4484.02	-3524.89	0	0	10936.9	240.289	0	0	-445.479	1442.54	0	-822.334	2498.6	2362.87	162.936
41	158223	-557.072	538.874	3711.56	0	0	-2371.64	0	-3263.31	1773.37	538.58	-1101.54	-2713.6	-18880.1	-8196.03	0	0	-3437.71	-496.612	2685.87	0	955.21	0	0	-1095.96	-9173.71	-579.85	-2707.86

	Loco 1	Loco 2	Loco 3	Loco 4	Loco 5	Loco 6	Loco 7	Loco 8	Loco 9	Loco 10	Mean Local Importance	Std. Dev. Local Importance
HalfBath	0.000000	0.000000	0.000000	0.000000	17.407978	26.615154	-0.654188	0.000000	0.000000	0.000000	4.336894	9.076043
BsmtFinSF1	0.000000	680.464241	-336.981845	315.137484	1214.512703	956.425747	76.630281	1519.825218	1531.221380	1996.006836	795.324205	733.805730
MoSold	0.000000	595.358250	1787.271071	3230.697985	3303.898013	3550.543600	3850.248353	3415.769227	5072.966855	2902.953232	2770.970658	1462.174367
PoolArea	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
BsmtHalfBath	0.000000	-324.321083	-905.862109	-916.352115	-1350.069398	-610.558753	-1009.995516	-1744.493128	-285.008494	-496.417082	-764.307768	501.027368
BedroomAbvGr	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	22.660268	0.000000	0.000000	0.000000	2.266027	6.798080
BsmtFinSF2	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
GrLivArea	-9814.912363	-3293.085560	-15809.346302	-11922.584258	-16613.081561	-15301.129244	-9991.529671	-19807.260831	-13970.186814	-14553.987621	-13107.710422	4363.503123
KitchenAbvGr	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
LotFrontage	0.000000	0.000000	0.000000	200.859772	0.000000	-15.328415	-173.339762	-238.470665	444.862839	187.604218	40.618799	185.497698
MSSubClass	0.000000	0.000000	-709.661157	-213.455500	-2214.505028	16.975995	-682.136323	0.000000	-331.997725	384.716454	-375.006328	690.705773
BsmtUnfSF	121.164420	2647.415364	3674.875089	4124.255683	4699.058784	5641.689768	6299.426278	3633.974463	5917.591017	2262.545135	3902.199600	1794.475306
LotArea	-9108.242034	-3173.464452	-8563.072269	-10691.301457	-7911.315177	-8419.845015	-5652.362702	-14777.721228	-5995.706427	-9938.346915	-8423.137768	2995.008619
OpenPorchSF	0.000000	355.176453	-1089.777695	130.642463	-548.384396	-785.343713	266.417814	748.838837	-388.206953	-702.716066	-201.335326	558.499491
1stFlrSF	-12816.035657	-4749.720377	-4650.553413	-3123.249856	-3961.233045	-1524.936389	-1415.375563	-3122.871154	-1500.447717	-5490.038249	-4235.446142	3177.533381
3SsnPorch	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
Fireplaces	-2643.482971	-5933.319113	-3037.572496	-2628.585165	-3402.100046	-2305.288406	-3088.624344	-3892.029889	-8849.192387	-2399.433849	-3817.962867	1954.528206
EnclosedPorch	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
LowQualFinSF	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
2ndFlrSF	0.000000	0.000000	0.000000	0.000000	-608.084846	-592.995049	-625.048765	-712.648758	0.000000	0.000000	-253.877742	312.318017
YearBuilt	0.000000	100.637253	1514.610630	1348.396648	2445.587627	2074.893215	1530.263380	1744.224174	761.738695	4507.843872	1602.819550	1224.869322
YrSold	0.000000	456.844194	257.992503	831.095733	1597.267298	-151.615834	597.121547	1987.846712	-823.560783	-14.832626	473.815874	792.577017
BsmtFullBath	0.000000	247.717456	0.000000	-297.483564	914.217367	348.857594	288.174486	0.000000	-248.795288	-462.691910	78.999614	374.696339
WoodDeckSF	-1605.896414	-4872.201764	-6925.043301	-6380.796316	-5369.324640	-7617.595119	-4263.532391	-6881.978770	-4553.882088	-4736.891539	-5320.714234	1657.129032
OverallCond	0.000000	-644.256906	-1229.180650	-395.593688	-469.863885	-258.986919	274.087225	0.000000	721.658834	216.195011	-178.594098	519.036582
GarageArea	731.604033	0.000000	395.115519	399.249884	-358.596918	219.872650	794.479536	0.000000	1466.591026	0.000000	364.831573	497.871785
ScreenPorch	0.000000	1342.053231	329.164328	2211.683096	2088.159995	1362.830740	4055.699565	907.492787	819.242192	397.302864	1351.362880	1133.937876
MasVnrArea	0.000000	0.000000	0.000000	0.000000	-654.879852	0.000000	-9.169921	0.000000	0.000000	-361.344009	-102.539378	213.161432
MiscVal	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
OverallQual	5605.038536	7845.886589	4061.570592	3172.157405	3868.077097	1652.437249	3540.237396	3780.340014	775.159604	4246.169251	3854.707373	1846.436805
YearRemodAdd	-952.580441	-2956.527016	-7692.476658	-2060.712163	-4456.392792	-2488.663940	-1805.367599	-3281.516177	-4713.228129	-2263.878578	-3267.134349	1837.056771
FullBath	0.000000	-1323.125640	546.541096	48.387112	-183.312572	1162.542391	258.270741	-916.331985	1426.976099	3106.391031	412.633827	1196.931310