License


Copyright 2017 J. Patrick Hall, jphall@gwu.edu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Local Feature Importance and Reason Codes using LOCO


Based on: Lei, Jing, G’Sell, Max, Rinaldo, Alessandro, Tibshirani, Ryan J., and Wasserman, Larry. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 2017.

http://www.stat.cmu.edu/~ryantibs/papers/conformal.pdf

Instead of dropping one variable and retraining the model to understand that variable's importance, these examples set a variable to missing and rescore the new, corrupted sample with the original model. This approach may be more appropriate for nonlinear models, in which nonlinear dependencies can allow variables to nearly completely replace one another when a model is retrained.

Preliminaries: imports, start h2o, load and clean data


In [1]:
# imports
import h2o 
import numpy as np
import pandas as pd
from h2o.estimators.gbm import H2OGradientBoostingEstimator

In [2]:
# start h2o
h2o.init()
h2o.remove_all()
h2o.show_progress()


Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_112"; Java(TM) SE Runtime Environment (build 1.8.0_112-b16); Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)
  Starting server from /Users/phall/anaconda/lib/python3.5/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/tc/0ss1l73113j3wdyjsxmy1j2r0000gn/T/tmpoidcpxm3
  JVM stdout: /var/folders/tc/0ss1l73113j3wdyjsxmy1j2r0000gn/T/tmpoidcpxm3/h2o_phall_started_from_python.out
  JVM stderr: /var/folders/tc/0ss1l73113j3wdyjsxmy1j2r0000gn/T/tmpoidcpxm3/h2o_phall_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.
H2O cluster uptime: 01 secs
H2O cluster version: 3.12.0.1
H2O cluster version age: 2 months and 7 days
H2O cluster name: H2O_from_python_phall_tff9kq
H2O cluster total nodes: 1
H2O cluster free memory: 3.556 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy: None
H2O internal security: False
Python version: 3.5.2 final

Load and prepare data for modeling


In [3]:
# load clean data
path = '../../03_regression/data/train.csv'
frame = h2o.import_file(path=path)


Parse progress: |█████████████████████████████████████████████████████████| 100%

In [4]:
# assign target and inputs
y = 'SalePrice'
X = [name for name in frame.columns if name not in [y, 'Id']]

LOCO is simpler to use with data containing no missing values, so missing values are imputed below


In [5]:
# determine column types
# impute
reals, enums = [], []
for key, val in frame.types.items():
    if key in X:
        if val == 'enum':
            enums.append(key)
        else: 
            reals.append(key)
            
_ = frame[reals].impute(method='median')
_ = frame[enums].impute(method='mode')

In [6]:
# split into training and validation
train, valid = frame.split_frame([0.7])

Understanding linear correlation and nonlinear dependencies is important for LOCO.

  • If strong relationships are present, retraining the model after removing an input will simply allow the linearly correlated or nonlinearly dependent variables to make up for the impact of the removed input. This is why we set variables to missing here instead of dropping them and retraining.
  • If such relationships are present, models must be regularized to prevent correlation or other dependencies from creating instability in model parameters or rules. (H2O GBM is regularized by column and row sampling.)
  • For H2O GBM, setting a variable to missing causes it to follow the majority path in each decision tree. The interpretation of LOCO becomes the numeric difference between the local behavior of the variable and the most common local behavior.
  • Because of linear correlation and nonlinear dependence, LOCO values are valid only for a given data and feature set.

In [7]:
# print out linearly correlated pairs
corr = train[reals].cor().as_data_frame()
for i in range(0, corr.shape[0]):
    for j in range(0, corr.shape[1]):
        if i != j:
            if np.abs(corr.iat[i, j]) > 0.7:
                print(corr.columns[i], corr.columns[j])


GarageYrBlt YearBuilt
GrLivArea TotRmsAbvGrd
TotalBsmtSF 1stFlrSF
1stFlrSF TotalBsmtSF
GarageCars GarageArea
YearBuilt GarageYrBlt
GarageArea GarageCars
TotRmsAbvGrd GrLivArea

It's likely that even more nonlinearly dependent relationships exist between inputs. Nonlinear relationships can also behave differently at global and local scales.

Removing one variable from each correlated pair may increase stability in the model and its explanations


In [8]:
X_reals_decorr = [i for i in reals if i not in ['GarageYrBlt', 'TotRmsAbvGrd', 'TotalBsmtSF', 'GarageCars']]

Train a predictive model


In [9]:
# train GBM model
model = H2OGradientBoostingEstimator(ntrees=100,
                                     max_depth=10,
                                     distribution='huber',
                                     learn_rate=0.1,
                                     stopping_rounds=5,
                                     seed=12345)

model.train(y=y, x=X_reals_decorr, training_frame=train, validation_frame=valid)

preds = valid['Id'].cbind(model.predict(valid))


gbm Model Build progress: |███████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%

Rescore predictive model

  • Each time leaving one input (covariate) out by setting it to missing
  • To generate local feature importance values for each decision

In [10]:
h2o.no_progress()

for k, i in enumerate(X_reals_decorr):

    # train and predict with Xi set to missing
    valid_loco = h2o.deep_copy(valid, 'valid_loco')
    valid_loco[i] = np.nan
    preds_loco = model.predict(valid_loco)
    
    # create a new, named column for the LOCO prediction
    preds_loco.columns = [i]
    preds = preds.cbind(preds_loco)
    
    # subtract the LOCO prediction from the original prediction
    preds[i] = preds[i] - preds['predict']
    
    print('LOCO Progress: ' + i + ' (' + str(k+1) + '/' + str(len(X_reals_decorr)) + ') ...')
    
print('Done.')  

preds.head()


LOCO Progress: HalfBath (1/32) ...
LOCO Progress: BsmtFinSF1 (2/32) ...
LOCO Progress: MoSold (3/32) ...
LOCO Progress: PoolArea (4/32) ...
LOCO Progress: BsmtHalfBath (5/32) ...
LOCO Progress: BedroomAbvGr (6/32) ...
LOCO Progress: BsmtFinSF2 (7/32) ...
LOCO Progress: GrLivArea (8/32) ...
LOCO Progress: KitchenAbvGr (9/32) ...
LOCO Progress: LotFrontage (10/32) ...
LOCO Progress: MSSubClass (11/32) ...
LOCO Progress: BsmtUnfSF (12/32) ...
LOCO Progress: LotArea (13/32) ...
LOCO Progress: OpenPorchSF (14/32) ...
LOCO Progress: 1stFlrSF (15/32) ...
LOCO Progress: 3SsnPorch (16/32) ...
LOCO Progress: Fireplaces (17/32) ...
LOCO Progress: EnclosedPorch (18/32) ...
LOCO Progress: LowQualFinSF (19/32) ...
LOCO Progress: 2ndFlrSF (20/32) ...
LOCO Progress: YearBuilt (21/32) ...
LOCO Progress: YrSold (22/32) ...
LOCO Progress: BsmtFullBath (23/32) ...
LOCO Progress: WoodDeckSF (24/32) ...
LOCO Progress: OverallCond (25/32) ...
LOCO Progress: GarageArea (26/32) ...
LOCO Progress: ScreenPorch (27/32) ...
LOCO Progress: MasVnrArea (28/32) ...
LOCO Progress: MiscVal (29/32) ...
LOCO Progress: OverallQual (30/32) ...
LOCO Progress: YearRemodAdd (31/32) ...
LOCO Progress: FullBath (32/32) ...
Done.
Id predict HalfBath BsmtFinSF1 MoSold PoolArea BsmtHalfBath BedroomAbvGr BsmtFinSF2 GrLivArea KitchenAbvGr LotFrontage MSSubClass BsmtUnfSF LotArea OpenPorchSF 1stFlrSF 3SsnPorch Fireplaces EnclosedPorch LowQualFinSF 2ndFlrSF YearBuilt YrSold BsmtFullBath WoodDeckSF OverallCond GarageArea ScreenPorch MasVnrArea MiscVal OverallQual YearRemodAdd FullBath
4 162737 0 5426.69 1307.74 0 0 0 0 2825.75 0 719.187 836.729 1734.61 0 727.522 683.734 0 -294.17 -79.2358 0 -810.712 12167.3 -5918.04 -1459.14 -635.263 5173.29 -8044.2 0 0 0 -36467.5 21993.6 0
5 321949 -708.329 12408.5 27137.8 0 0 0 0 -7560.51 0 -3064.35 0 -321.035 -11176.7 -6596.96 13893.5 0 -3.02205 0 0 -3952.2 -7019.57 1599.46 -2264.6 -3086.45 0 -39288.8 0 609.803 0 -80951.9 4742.61 0
7 288940 4286.86 -38286.6 0 0 0 0 0 521.262 0 0 -716.772 275.934 229.391 -912.311 -10971 0 224.838 0 0 0 664.773 -297.844 -1.47215 -112.298 0 -9753.45 0 -139.076 0 -83533.4 -411.94 0
8 214558 -2934.78 -2155.88 -1596.35 0 0 0 0 -34924.2 0 824.924 0 -3677.11 5894.01 -2393.04 -717.704 0 1019.98 2054.01 0 -1302.45 1471.64 0 -525.598 -6039.15 -1901.71 -6247.74 0 1615.11 0 -14857.3 25196.1 0
11 130570 0 -1501.59 373.448 0 0 0 0 -772.792 0 -58.044 420.323 1796.17 24.6083 0 -7408.69 0 0 0 0 0 706.53 0 -1356.89 0 5717.06 2281.01 0 0 0 1182.2 2090.64 0
13 132042 0 -3415.88 -313.206 0 0 150.151 0 -733.679 0 447.161 180.826 -1629.84 -3467.46 0 0 0 0 0 0 0 220.939 0 -1251 262.766 -927.486 -469.974 -2031.6 0 0 1709.63 6448.98 0
18 124058 140.167 7117.68 1132.37 0 0 1359.23 0 315.657 1959.07 -58.044 4142.5 5807.16 265.833 0 -11407.5 0 922.09 0 0 0 -17613.7 1225.79 0 0 538.854 561.559 0 0 0 8583.6 896.597 -1646.95
21 315667 105.762 12636.7 25091.7 0 0 0 0 -30379.8 0 -2426.85 0 13761.8 -6041.51 -5266.2 9589.59 0 433.176 0 0 -6544.59 -8300.07 0 2814.84 1723.94 0 -18974.7 0 -2872.53 0 -107036 -2016.31 0
38 139593 140.167 -11190.9 3093.13 0 -60.0012 0 0 -1225.85 0 840.328 75.9954 3984.55 3999.82 0 4484.02 0 -3524.89 0 0 0 10936.9 240.289 0 0 -445.479 1442.54 0 -822.334 0 2498.6 2362.87 162.936
41 158223 -557.072 538.874 3711.56 0 0 0 0 -2371.64 0 -3263.31 1773.37 538.58 -1101.54 -2713.6 -18880.1 0 -8196.03 0 0 0 -3437.71 -496.612 2685.87 0 955.21 0 0 -1095.96 0 -9173.71 -579.85 -2707.86
Out[10]:

The numeric values in each column estimate how much each variable contributed to each decision. These values tell you how a variable and its values were weighted in any given decision by the model. They are crucially important for machine learning interpretability and are often referred to as "local feature importance", "reason codes", or "turn-down codes." The latter phrases are borrowed from credit scoring: credit lenders must provide reasons for turning down a credit application, even for automated decisions. Reason codes can be extracted easily from LOCO local feature importance values, simply by ranking the variables that played the largest role in any given decision.
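The ranking step can be sketched with plain pandas. The values below are copied from the first row of the table above (Id 4); in the notebook itself this ranking is done on the full `preds` frame in cells 13 and 14.

```python
# Sketch: turn one row of LOCO values into ranked reason codes by sorting
# the per-variable contributions. Values are a subset of the Id 4 row above.
import pandas as pd

row = pd.Series({'OverallQual': -36467.5, 'YearRemodAdd': 21993.6,
                 'YearBuilt': 12167.3, 'GarageArea': -8044.2,
                 'YrSold': -5918.04})

top_negative = row.sort_values()[:3]                 # pushed the prediction down
top_positive = row.sort_values(ascending=False)[:3]  # pushed the prediction up

print(top_negative.index.tolist())  # ['OverallQual', 'GarageArea', 'YrSold']
print(top_positive.index.tolist())  # ['YearRemodAdd', 'YearBuilt', 'YrSold']
```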

Helper function for finding quantile indices


In [11]:
def get_quantile_dict(y, id_, frame):

    """ Returns a dictionary mapping approximate percentiles of column y to the corresponding values of column id_.
    
    Args:
        y: Column in which to find percentiles.
        id_: Id column that stores indices for percentiles of y.
        frame: H2OFrame containing y and id_. 
    
    Returns:
        Dictionary of percentile values and index column values.
    
    """
    
    quantiles_df = frame.as_data_frame()
    quantiles_df.sort_values(y, inplace=True)
    quantiles_df.reset_index(inplace=True)
    
    percentiles_dict = {}
    percentiles_dict[0] = quantiles_df.loc[0, id_]
    percentiles_dict[99] = quantiles_df.loc[quantiles_df.shape[0]-1, id_]
    inc = quantiles_df.shape[0]//10
    
    for i in range(1, 10):
        percentiles_dict[i * 10] = quantiles_df.loc[i * inc,  id_]

    return percentiles_dict

quantile_dict = get_quantile_dict('predict', 'Id', preds)
print(quantile_dict)


{0: 621.0, 80: 606.0, 50: 744.0, 99: 1299.0, 20: 372.0, 70: 849.0, 40: 207.0, 10: 996.0, 60: 983.0, 90: 1343.0, 30: 38.0}

Plot some reason codes for a representative row


In [12]:
%matplotlib inline

In [13]:
median_loco = preds[preds['Id'] == int(quantile_dict[50]), :].as_data_frame().drop(['Id', 'predict'], axis=1)
median_loco = median_loco.T.sort_values(by=0)[:5]
_ = median_loco.plot(kind='bar', 
                     title='Negative Reason Codes for the Median of Predicted Sale Price\n', 
                     legend=False)



In [14]:
median_loco = preds[preds['Id'] == int(quantile_dict[50]), :].as_data_frame().drop(['Id', 'predict'], axis=1)
median_loco = median_loco.T.sort_values(by=0, ascending=False)[:5]
_ = median_loco.plot(kind='bar', 
                     title='Positive Reason Codes for the Median of Predicted Sale Price\n', 
                     color='r',
                     legend=False)


Ensembling explanations to reduce local variance

Explanations derived from high variance machine learning models can be unstable. One general way to decrease variance is to ensemble the results of many models.
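The ensembling itself reduces to averaging one explanation vector per model for the same row, keeping the standard deviation as a stability measure; that is what the `Mean Local Importance` and `Std. Dev. Local Importance` columns below compute. A minimal sketch, using LOCO values for three variables from the first three models in the table below:

```python
# Sketch: ensemble per-model LOCO values for one row.
# Rows = models; columns = GrLivArea, OverallQual, WoodDeckSF
# (values taken from Loco 1-3 in the results table).
import numpy as np

loco_per_model = np.array([[-9814.9, 5605.0, -1605.9],
                           [-3293.1, 7845.9, -4872.2],
                           [-15809.3, 4061.6, -6925.0]])

mean_importance = loco_per_model.mean(axis=0)         # ensembled reason codes
std_importance = loco_per_model.std(axis=0, ddof=1)   # stability of each code

print(mean_importance.round(1))
print(std_importance.round(1))
```

A large standard deviation relative to the mean signals that a reason code is unstable across models and should be reported with caution.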

Train multiple models


In [15]:
n_models = 10 # select number of models

models = []
pred_frames = []

for i in range(0, n_models):

    # store models
    models.append(H2OGradientBoostingEstimator(ntrees=500,
                                               max_depth=2 * (i + 1),
                                               distribution='huber',
                                               learn_rate=0.01 * (i + 1),
                                               stopping_rounds=5,
                                               seed=i + 1))
    
    # train models
    models[i].train(y=y, x=X_reals_decorr, training_frame=train, validation_frame=valid)
    
    # store predictions
    pred_frames.append(valid['Id'].cbind(models[i].predict(valid)))

    print('Training Progress: model %d/%d ...' % (i + 1, n_models))

print('Done.')


Training Progress: model 1/10 ...
Training Progress: model 2/10 ...
Training Progress: model 3/10 ...
Training Progress: model 4/10 ...
Training Progress: model 5/10 ...
Training Progress: model 6/10 ...
Training Progress: model 7/10 ...
Training Progress: model 8/10 ...
Training Progress: model 9/10 ...
Training Progress: model 10/10 ...
Done.

Calculate LOCO for each model


In [16]:
for k, model in enumerate(models):

    for i in X_reals_decorr:

        # train and predict with Xi set to missing
        valid_loco = h2o.deep_copy(valid, 'valid_loco')
        valid_loco[i] = np.nan
        preds_loco = model.predict(valid_loco)

        # create a new, named column for the LOCO prediction
        preds_loco.columns = [i]
        pred_frames[k] = pred_frames[k].cbind(preds_loco)

        # subtract the LOCO prediction from the original prediction
        pred_frames[k][i] = pred_frames[k][i] - pred_frames[k]['predict']
        
    print('LOCO Progress: model %d/%d ...' % (k + 1, n_models))

print('Done.')


LOCO Progress: model 1/10 ...
LOCO Progress: model 2/10 ...
LOCO Progress: model 3/10 ...
LOCO Progress: model 4/10 ...
LOCO Progress: model 5/10 ...
LOCO Progress: model 6/10 ...
LOCO Progress: model 7/10 ...
LOCO Progress: model 8/10 ...
LOCO Progress: model 9/10 ...
LOCO Progress: model 10/10 ...
Done.

Collect LOCO values for each model for the median home


In [17]:
median_loco_frames = []
col_names = ['Loco ' + str(i) for i in range(1, n_models + 1)]

for i in range(0, n_models):
    
    # collect LOCO as a column vector in a Pandas df
    preds = pred_frames[i]
    median_loco_frames.append(preds[preds['Id'] == int(quantile_dict[50]), :]\
                              .as_data_frame()\
                              .drop(['Id', 'predict'], axis=1)
                              .T)
    
loco_ensemble = pd.concat(median_loco_frames, axis=1) 
loco_ensemble.columns = col_names
loco_ensemble['Mean Local Importance'] = loco_ensemble.mean(axis=1)
loco_ensemble['Std. Dev. Local Importance'] = loco_ensemble.std(axis=1)
loco_ensemble


Out[17]:
Loco 1 Loco 2 Loco 3 Loco 4 Loco 5 Loco 6 Loco 7 Loco 8 Loco 9 Loco 10 Mean Local Importance Std. Dev. Local Importance
HalfBath 0.000000 0.000000 0.000000 0.000000 17.407978 26.615154 -0.654188 0.000000 0.000000 0.000000 4.336894 9.076043
BsmtFinSF1 0.000000 680.464241 -336.981845 315.137484 1214.512703 956.425747 76.630281 1519.825218 1531.221380 1996.006836 795.324205 733.805730
MoSold 0.000000 595.358250 1787.271071 3230.697985 3303.898013 3550.543600 3850.248353 3415.769227 5072.966855 2902.953232 2770.970658 1462.174367
PoolArea 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
BsmtHalfBath 0.000000 -324.321083 -905.862109 -916.352115 -1350.069398 -610.558753 -1009.995516 -1744.493128 -285.008494 -496.417082 -764.307768 501.027368
BedroomAbvGr 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 22.660268 0.000000 0.000000 0.000000 2.266027 6.798080
BsmtFinSF2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
GrLivArea -9814.912363 -3293.085560 -15809.346302 -11922.584258 -16613.081561 -15301.129244 -9991.529671 -19807.260831 -13970.186814 -14553.987621 -13107.710422 4363.503123
KitchenAbvGr 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
LotFrontage 0.000000 0.000000 0.000000 200.859772 0.000000 -15.328415 -173.339762 -238.470665 444.862839 187.604218 40.618799 185.497698
MSSubClass 0.000000 0.000000 -709.661157 -213.455500 -2214.505028 16.975995 -682.136323 0.000000 -331.997725 384.716454 -375.006328 690.705773
BsmtUnfSF 121.164420 2647.415364 3674.875089 4124.255683 4699.058784 5641.689768 6299.426278 3633.974463 5917.591017 2262.545135 3902.199600 1794.475306
LotArea -9108.242034 -3173.464452 -8563.072269 -10691.301457 -7911.315177 -8419.845015 -5652.362702 -14777.721228 -5995.706427 -9938.346915 -8423.137768 2995.008619
OpenPorchSF 0.000000 355.176453 -1089.777695 130.642463 -548.384396 -785.343713 266.417814 748.838837 -388.206953 -702.716066 -201.335326 558.499491
1stFlrSF -12816.035657 -4749.720377 -4650.553413 -3123.249856 -3961.233045 -1524.936389 -1415.375563 -3122.871154 -1500.447717 -5490.038249 -4235.446142 3177.533381
3SsnPorch 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Fireplaces -2643.482971 -5933.319113 -3037.572496 -2628.585165 -3402.100046 -2305.288406 -3088.624344 -3892.029889 -8849.192387 -2399.433849 -3817.962867 1954.528206
EnclosedPorch 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
LowQualFinSF 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2ndFlrSF 0.000000 0.000000 0.000000 0.000000 -608.084846 -592.995049 -625.048765 -712.648758 0.000000 0.000000 -253.877742 312.318017
YearBuilt 0.000000 100.637253 1514.610630 1348.396648 2445.587627 2074.893215 1530.263380 1744.224174 761.738695 4507.843872 1602.819550 1224.869322
YrSold 0.000000 456.844194 257.992503 831.095733 1597.267298 -151.615834 597.121547 1987.846712 -823.560783 -14.832626 473.815874 792.577017
BsmtFullBath 0.000000 247.717456 0.000000 -297.483564 914.217367 348.857594 288.174486 0.000000 -248.795288 -462.691910 78.999614 374.696339
WoodDeckSF -1605.896414 -4872.201764 -6925.043301 -6380.796316 -5369.324640 -7617.595119 -4263.532391 -6881.978770 -4553.882088 -4736.891539 -5320.714234 1657.129032
OverallCond 0.000000 -644.256906 -1229.180650 -395.593688 -469.863885 -258.986919 274.087225 0.000000 721.658834 216.195011 -178.594098 519.036582
GarageArea 731.604033 0.000000 395.115519 399.249884 -358.596918 219.872650 794.479536 0.000000 1466.591026 0.000000 364.831573 497.871785
ScreenPorch 0.000000 1342.053231 329.164328 2211.683096 2088.159995 1362.830740 4055.699565 907.492787 819.242192 397.302864 1351.362880 1133.937876
MasVnrArea 0.000000 0.000000 0.000000 0.000000 -654.879852 0.000000 -9.169921 0.000000 0.000000 -361.344009 -102.539378 213.161432
MiscVal 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
OverallQual 5605.038536 7845.886589 4061.570592 3172.157405 3868.077097 1652.437249 3540.237396 3780.340014 775.159604 4246.169251 3854.707373 1846.436805
YearRemodAdd -952.580441 -2956.527016 -7692.476658 -2060.712163 -4456.392792 -2488.663940 -1805.367599 -3281.516177 -4713.228129 -2263.878578 -3267.134349 1837.056771
FullBath 0.000000 -1323.125640 546.541096 48.387112 -183.312572 1162.542391 258.270741 -916.331985 1426.976099 3106.391031 412.633827 1196.931310

Negative mean reason codes


In [18]:
median_mean_loco = loco_ensemble['Mean Local Importance'].sort_values()[:5]
_ = median_mean_loco.plot(kind='bar', 
                          title='Negative Mean Reason Codes for the Median of Predicted Sale Price\n', 
                          legend=False)


Positive mean reason codes


In [19]:
median_mean_loco = loco_ensemble['Mean Local Importance'].sort_values(ascending=False)[:5]
_ = median_mean_loco.plot(kind='bar', 
                          title='Positive Mean Reason Codes for the Median of Predicted Sale Price\n', 
                          color='r',
                          legend=False)


Shutdown H2O


In [20]:
h2o.cluster().shutdown(prompt=True)


Are you sure you want to shutdown the H2O instance running at http://127.0.0.1:54321 (Y/N)? y
H2O session _sid_8094 closed.