Calculating SHAP Values for DRF models in H2O

Note: this example is adapted from an example published in the shap package https://github.com/slundberg/shap/blob/master/notebooks/tree_explainer/Front%20page%20example%20(XGBoost).ipynb


In [4]:
import h2o
import shap
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o import H2OFrame

# initialize H2O
h2o.init()

# load JS visualization code to notebook
shap.initjs()


Checking whether there is an H2O instance running at http://localhost:54321 . connected.
H2O cluster uptime: 5 mins 20 secs
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.25.0.99999
H2O cluster version age: 9 minutes
H2O cluster name: navdeepgill
H2O cluster total nodes: 1
H2O cluster free memory: 3.250 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy: None
H2O internal security: False
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
Python version: 3.6.0 final

In [6]:
# train a GBM model in H2O
X, y = shap.datasets.boston()
boston_housing = H2OFrame(X).cbind(H2OFrame(y, column_names=["medv"]))

model = H2ORandomForestEstimator(ntrees=100)
model.train(training_frame=boston_housing, y="medv")


Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
drf Model Build progress: |███████████████████████████████████████████████| 100%

In [7]:
# calculate SHAP values using function predict_contributions
contributions = model.predict_contributions(boston_housing)

In [8]:
# convert the H2O Frame to use with shap's visualization functions
contributions_matrix = contributions.as_data_frame().as_matrix()
# shap values are calculated for all features
shap_values = contributions_matrix[:,0:13]
# expected values is the last returned column
expected_value = contributions_matrix[:,13].min()


/Users/navdeepgill/Desktop/h2o-3/h2o-py/h2o3-py-env/lib/python3.6/site-packages/ipykernel_launcher.py:2: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  

In [9]:
# visualize the first prediction's explanation
shap.force_plot(expected_value, shap_values[0,:], X.iloc[0,:])


Out[9]:
Visualization omitted, Javascript library not loaded!
Have you run `initjs()` in this notebook? If this notebook was from another user you must also trust this notebook (File -> Trust notebook). If you are viewing this notebook on github the Javascript has been stripped for security. If you are using JupyterLab this error is because a JupyterLab extension has not yet been written.

In [10]:
# visualize the training set predictions
shap.force_plot(expected_value, shap_values, X)


Out[10]:
Visualization omitted, Javascript library not loaded!
Have you run `initjs()` in this notebook? If this notebook was from another user you must also trust this notebook (File -> Trust notebook). If you are viewing this notebook on github the Javascript has been stripped for security. If you are using JupyterLab this error is because a JupyterLab extension has not yet been written.

In [11]:
# create a SHAP dependence plot to show the effect of a single feature across the whole dataset
shap.dependence_plot("RM", shap_values, X)



In [12]:
# summarize the effects of all the features
shap.summary_plot(shap_values, X)



In [13]:
shap.summary_plot(shap_values, X, plot_type="bar")