Ensemble Design Pattern

Stacking is an ensemble method that combines the outputs of a collection of models to make a prediction. The initial models, which are typically of different model types, are trained to completion on the full training dataset. A secondary meta-model is then trained using the initial models' outputs as features. This meta-model, which can be any type of machine learning model, learns how best to combine the outputs of the initial models to decrease the training error.
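
As a rough illustration of the two stages, the sketch below stacks two tiny regression models in plain Python. The data, the helper names (`fit_line`, `mse`) and the intercept-free linear meta-model are assumptions made for this example only; the notebook itself uses neural networks for both stages.

```python
# Toy stacking sketch in plain Python. All data and names here are
# hypothetical and only illustrate the two-stage idea.

def fit_line(zs, ys):
    """Fit y ~ intercept + slope * z by ordinary least squares."""
    n = len(zs)
    mean_z, mean_y = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return mean_y - slope * mean_z, slope

def mse(preds, ys):
    """Mean squared error between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.4, 4.2, 7.4, 12.1, 17.3]

# Stage 1: two different base models trained on the full training set.
b1, s1 = fit_line(xs, ys)                      # linear in x
b2, s2 = fit_line([x ** 2 for x in xs], ys)    # linear in x squared
p1 = [b1 + s1 * x for x in xs]
p2 = [b2 + s2 * x ** 2 for x in xs]

# Stage 2: a meta-model that takes the base outputs as its features.
# Here it is the weighted sum w1*p1 + w2*p2, fitted by least squares
# through the 2x2 normal equations.
a11 = sum(p * p for p in p1)
a12 = sum(p * q for p, q in zip(p1, p2))
a22 = sum(q * q for q in p2)
c1 = sum(p * y for p, y in zip(p1, ys))
c2 = sum(q * y for q, y in zip(p2, ys))
det = a11 * a22 - a12 * a12
w1 = (c1 * a22 - c2 * a12) / det
w2 = (a11 * c2 - a12 * c1) / det
stacked = [w1 * p + w2 * q for p, q in zip(p1, p2)]

print("base model 1 MSE:", mse(p1, ys))
print("base model 2 MSE:", mse(p2, ys))
print("stacked MSE     :", mse(stacked, ys))
```

Because the weights (1, 0) and (0, 1) reproduce the two base models exactly, the fitted combination cannot have a higher training MSE than either of them. Whether that carries over to unseen data depends on the problem.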

Create a Stacking Ensemble model

In this notebook, we'll create an ensemble of three neural network models and train them on the natality dataset.


In [2]:
import os

import pandas as pd
import tensorflow as tf

from tensorflow import keras
from tensorflow import feature_column as fc
from tensorflow.keras import layers, models, Model

In [3]:
df = pd.read_csv("./data/babyweight_train.csv")
df.head()


Out[3]:
   weight_pounds  is_male  mother_age  plurality  gestation_weeks  mother_race
0       7.749249    False          12  Single(1)               40          1.0
1       7.561856     True          12  Single(1)               40          2.0
2       7.187070    False          12  Single(1)               34          3.0
3       6.375769     True          12  Single(1)               36          2.0
4       7.936641    False          12  Single(1)               35          NaN

Create our tf.data input pipeline


In [4]:
# Determine CSV, label, and key columns
# Create list of string column headers, make sure order matches.
CSV_COLUMNS = ["weight_pounds",
               "is_male",
               "mother_age",
               "plurality",
               "gestation_weeks",
               "mother_race"]

# Add string name for label column
LABEL_COLUMN = "weight_pounds"

# Set default values for each CSV column as a list of lists.
# Treat is_male, plurality, and mother_race as strings.
DEFAULTS = [[0.0], ["null"], [0.0], ["null"], [0.0], ["0"]]

In [5]:
def get_dataset(file_path):
    dataset = tf.data.experimental.make_csv_dataset(
        file_path,
        batch_size=15, # Artificially small to make examples easier to show.
        label_name=LABEL_COLUMN,
        select_columns=CSV_COLUMNS,
        column_defaults=DEFAULTS,
        num_epochs=1,
        ignore_errors=True)
    return dataset

train_data = get_dataset("./data/babyweight_train.csv")
test_data = get_dataset("./data/babyweight_eval.csv")
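
To make the role of `column_defaults` more concrete, here is a small standard-library sketch of how a default can both fix a column's type and fill in an empty field. `parse_row` and the sample line are hypothetical, and the logic only approximates what `make_csv_dataset` does internally.

```python
import csv
import io

CSV_COLUMNS = ["weight_pounds", "is_male", "mother_age",
               "plurality", "gestation_weeks", "mother_race"]
DEFAULTS = [[0.0], ["null"], [0.0], ["null"], [0.0], ["0"]]

def parse_row(row):
    """Hypothetical helper: cast each field to the type of its default
    and substitute the default when the field is empty."""
    parsed = {}
    for name, (default,), raw in zip(CSV_COLUMNS, DEFAULTS, row):
        if raw == "":
            parsed[name] = default
        else:
            parsed[name] = type(default)(raw)
    return parsed

# A made-up line with a missing mother_race value.
sample = io.StringIO("7.936641,False,12,Single(1),35,\n")
example = parse_row(next(csv.reader(sample)))
print(example)
```

With these defaults, numeric columns come back as floats and the three categorical columns stay strings, with a missing `mother_race` filled in as "0".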

Check that our tf.data dataset yields batches in the expected format:


In [6]:
def show_batch(dataset):
    for batch, label in dataset.take(1):
        for key, value in batch.items():
            print("{:20s}: {}".format(key,value.numpy()))

show_batch(train_data)


is_male             : [b'True' b'True' b'True' b'False' b'True' b'True' b'False' b'True' b'True'
 b'False' b'True' b'True' b'True' b'True' b'False']
mother_age          : [16. 17. 17. 18. 16. 17. 17. 18. 17. 17. 17. 17. 16. 16. 16.]
plurality           : [b'Single(1)' b'Single(1)' b'Single(1)' b'Single(1)' b'Single(1)'
 b'Single(1)' b'Single(1)' b'Single(1)' b'Single(1)' b'Single(1)'
 b'Single(1)' b'Single(1)' b'Single(1)' b'Single(1)' b'Single(1)']
gestation_weeks     : [38. 38. 40. 39. 41. 39. 40. 42. 40. 33. 33. 40. 39. 38. 39.]
mother_race         : [b'2.0' b'2.0' b'2.0' b'2.0' b'2.0' b'0' b'1.0' b'1.0' b'0' b'2.0' b'2.0'
 b'1.0' b'1.0' b'2.0' b'2.0']

Create our feature columns


In [7]:
numeric_columns = [fc.numeric_column("mother_age"),
                  fc.numeric_column("gestation_weeks")]

CATEGORIES = {
    'plurality': ["Single(1)", "Twins(2)", "Triplets(3)",
                  "Quadruplets(4)", "Quintuplets(5)", "Multiple(2+)"],
    'is_male' : ["True", "False", "Unknown"],
    'mother_race': [str(_) for _ in df.mother_race.unique()]
}

categorical_columns = []
for feature, vocab in CATEGORIES.items():
    cat_col = fc.categorical_column_with_vocabulary_list(
        key=feature, vocabulary_list=vocab)
    categorical_columns.append(fc.indicator_column(cat_col))
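
Conceptually, an indicator column over a vocabulary list turns each string into a one-hot vector. The pure-Python sketch below mimics that encoding. `one_hot` is a hypothetical helper, and mapping out-of-vocabulary values to an all-zero vector is an assumption about the default behavior rather than something shown in this notebook.

```python
PLURALITY_VOCAB = ["Single(1)", "Twins(2)", "Triplets(3)",
                   "Quadruplets(4)", "Quintuplets(5)", "Multiple(2+)"]

def one_hot(value, vocab):
    """Hypothetical helper: 1.0 at the position of `value` in `vocab`,
    0.0 everywhere else (all zeros if `value` is not in `vocab`)."""
    return [1.0 if value == item else 0.0 for item in vocab]

print(one_hot("Twins(2)", PLURALITY_VOCAB))
print(one_hot("null", PLURALITY_VOCAB))
```

The width of the encoded feature equals the vocabulary size, so the six plurality values contribute six inputs to the first dense layer.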

Create our ensemble models

We'll train three different neural network models.


In [8]:
inputs = {colname: tf.keras.layers.Input(
    name=colname, shape=(), dtype="float32")
    for colname in ["mother_age", "gestation_weeks"]}

inputs.update({colname: tf.keras.layers.Input(
    name=colname, shape=(), dtype="string")
    for colname in ["is_male", "plurality", "mother_race"]})

dnn_inputs = layers.DenseFeatures(categorical_columns+numeric_columns)(inputs)

# model_1
model1_h1 = layers.Dense(50, activation="relu")(dnn_inputs)
model1_h2 = layers.Dense(30, activation="relu")(model1_h1)
model1_output = layers.Dense(1, activation="relu")(model1_h2)
model_1 = tf.keras.models.Model(inputs=inputs, outputs=model1_output, name="model_1")

# model_2
model2_h1 = layers.Dense(64, activation="relu")(dnn_inputs)
model2_h2 = layers.Dense(32, activation="relu")(model2_h1)
model2_output = layers.Dense(1, activation="relu")(model2_h2)
model_2 = tf.keras.models.Model(inputs=inputs, outputs=model2_output, name="model_2")

# model_3
model3_h1 = layers.Dense(32, activation="relu")(dnn_inputs)
model3_output = layers.Dense(1, activation="relu")(model3_h1)
model_3 = tf.keras.models.Model(inputs=inputs, outputs=model3_output, name="model_3")


WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/feature_column/feature_column_v2.py:4267: IndicatorColumn._variable_shape (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/feature_column/feature_column_v2.py:4322: VocabularyListCategoricalColumn._num_buckets (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.

The function below trains a model and reports the MSE and RMSE on the test set.


In [9]:
# fit model on dataset
def fit_model(model):
    # compile model with MSE as both the loss and the reported metric
    model.compile(
        loss=tf.keras.losses.MeanSquaredError(),
        optimizer='adam', metrics=['mse'])
    # fit model
    model.fit(train_data.shuffle(500), epochs=1)
    
    # evaluate model
    test_loss, test_mse = model.evaluate(test_data)
    print('\n\n{}:\nTest Loss {}, Test RMSE {}'.format(
        model.name, test_loss, test_mse**0.5))
    
    return model
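
For reference, the relationship between the two reported numbers can be sketched in a few lines of plain Python. The labels and predictions below are made up for illustration, and `mean_squared_error` is a hypothetical helper, not the Keras metric itself.

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)

# Made-up baby weights in pounds and matching predictions.
labels = [7.0, 8.0, 6.0]
predictions = [7.5, 7.0, 6.5]

mse = mean_squared_error(labels, predictions)
rmse = mse ** 0.5  # same unit as the label (pounds)
print("MSE {:.4f}, RMSE {:.4f}".format(mse, rmse))
```

Taking the square root puts the error back in the unit of the label, which is why the function above prints `test_mse**0.5`.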

In [10]:
# create directory for models
try:
    os.makedirs('models')
except FileExistsError:
    print("directory already exists")


directory already exists

Next, we'll train each neural network and save the trained model to file.


In [11]:
members = [model_1, model_2, model_3]

# fit and save models
n_members = len(members)

for i in range(n_members):
    # fit model
    model = fit_model(members[i])
    # save model
    filename = 'models/model_' + str(i + 1) + '.h5'
    model.save(filename, save_format='tf')
    print('Saved {}\n'.format(filename))


17638/17638 [==============================] - 127s 7ms/step - loss: 1.1232 - mse: 1.1232
   4343/Unknown - 24s 6ms/step - loss: 2.9008 - mse: 2.9011

model_1:
Test Loss 2.9008474301768974, Test RMSE 1.7032530850184173
Saved models/model_1.h5

17638/17638 [==============================] - 117s 7ms/step - loss: 1.1097 - mse: 1.1097
   4343/Unknown - 23s 5ms/step - loss: 2.0815 - mse: 2.0817

model_2:
Test Loss 2.081478821759177, Test RMSE 1.4428068143465294
Saved models/model_2.h5

17638/17638 [==============================] - 114s 6ms/step - loss: 1.1293 - mse: 1.1293
   4343/Unknown - 23s 5ms/step - loss: 2.4174 - mse: 2.4173

model_3:
Test Loss 2.417384987838966, Test RMSE 1.554769235887298
Saved models/model_3.h5

The test RMSE differs across the three neural networks, ranging from about 1.44 (model_2) to about 1.70 (model_1).

Load the trained models and create the stacked ensemble model

The function below loads the trained models and returns them in a list.


In [12]:
# load trained models from file
def load_models(n_models):
    all_models = []
    for i in range(n_models):
        filename = 'models/model_' + str(i + 1) + '.h5'
        # load model from file
        model = models.load_model(filename)
        # add to list of members
        all_models.append(model)
        print('>loaded %s' % filename)
    return all_models

In [14]:
# load all models
members = load_models(n_members)
print('Loaded %d models' % len(members))


>loaded models/model_1.h5
>loaded models/model_2.h5
>loaded models/model_3.h5
Loaded 3 models

We need to freeze the layers of the pre-trained models since we won't train these models any further. Only the layers of the Stacked Ensemble will be trainable, and they learn how to best combine the results of the ensemble members.


In [15]:
# update all layers in all models to not be trainable
for i in range(n_members):
    model = members[i]
    for layer in model.layers:
        # make not trainable
        layer.trainable = False
        # rename to avoid 'unique layer name' issue
        layer._name = 'ensemble_' + str(i+1) + '_' + layer.name
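
The effect of freezing can be sketched without Keras. In the toy example below, a single "member" weight is held fixed while gradient descent updates only a "meta" weight stacked on top of it. All names, numbers and the learning rate are hypothetical.

```python
# Toy illustration of training with a frozen member (hypothetical values).
member_weight = 2.0   # frozen: pretend this was learned earlier
meta_weight = 0.1     # trainable: scales the member's output

# Labels follow y = 3x, so the ideal meta_weight is 3 / 2 = 1.5.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def loss(meta_w):
    """MSE of the stacked prediction meta_w * (member_weight * x)."""
    return sum((meta_w * member_weight * x - y) ** 2
               for x, y in data) / len(data)

initial_loss = loss(meta_weight)
learning_rate = 0.01
for _ in range(200):
    # Gradient of the loss with respect to meta_weight only;
    # member_weight receives no update because it is frozen.
    grad = sum(2 * (meta_weight * member_weight * x - y) * member_weight * x
               for x, y in data) / len(data)
    meta_weight -= learning_rate * grad
final_loss = loss(meta_weight)

print("member_weight:", member_weight)
print("meta_weight  :", round(meta_weight, 4))
print("loss         :", initial_loss, "->", final_loss)
```

Setting `layer.trainable = False` plays the role of leaving `member_weight` out of the update step: the member's predictions stay fixed while the layers on top adapt to them.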

Next, we'll create our Stacked Ensemble model. It is also a neural network, built with the Keras Functional API.


In [32]:
member_inputs = [model.input for model in members]

# concatenate the outputs of the member models
member_outputs = [model.output for model in members]
merge = layers.concatenate(member_outputs)
h1 = layers.Dense(30, activation='relu')(merge)
h2 = layers.Dense(20, activation='relu')(h1)
h3 = layers.Dense(10, activation='relu')(h2)
ensemble_output = layers.Dense(1, activation='relu')(h3)
ensemble_model = Model(inputs=member_inputs, outputs=ensemble_output)

# plot graph of ensemble
tf.keras.utils.plot_model(ensemble_model, show_shapes=True, to_file='ensemble_graph.png')

# compile
ensemble_model.compile(loss='mse', optimizer='adam', metrics=['mse'])

We need to adapt our tf.data pipeline to accommodate the multiple inputs for our Stacked Ensemble model.


In [33]:
FEATURES = ["is_male", "mother_age", "plurality",
            "gestation_weeks", "mother_race"]

# duplicate each input feature once per ensemble member in our tf.data dataset
def stack_features(features, label):
    for feature in FEATURES:
        for i in range(n_members):
            features['ensemble_' + str(i+1) + '_' + feature] = features[feature]
        
    return features, label

ensemble_data = train_data.map(stack_features).repeat(1)
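
The renaming scheme is easier to see on a single example. The sketch below applies the same key duplication to an ordinary dictionary; the row values and the helper name `stack_features_dict` are hypothetical, and unlike `stack_features` it copies the dictionary instead of modifying it in place.

```python
FEATURES = ["is_male", "mother_age", "plurality",
            "gestation_weeks", "mother_race"]
N_MEMBERS = 3  # matches the three ensemble members

def stack_features_dict(features, n_members=N_MEMBERS):
    """Return a copy of `features` with one extra key per feature and
    member, named 'ensemble_<member>_<feature>'."""
    stacked = dict(features)
    for feature in FEATURES:
        for i in range(n_members):
            stacked["ensemble_{}_{}".format(i + 1, feature)] = features[feature]
    return stacked

# Made-up example row.
row = {"is_male": "True", "mother_age": 26.0, "plurality": "Single(1)",
       "gestation_weeks": 39.0, "mother_race": "1.0"}
stacked = stack_features_dict(row)
for key in sorted(stacked):
    print("{:32s}: {}".format(key, stacked[key]))
```

The new keys follow the `ensemble_<i>_<name>` pattern used when the member layers were renamed, so each member's input layers can be fed by name.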

In [34]:
ensemble_model.fit(ensemble_data.shuffle(500), epochs=1)


17638/17638 [==============================] - 165s 9ms/step - loss: 1.2281 - mse: 1.2281
Out[34]:
<tensorflow.python.keras.callbacks.History at 0x7f89d832e710>

Lastly, we will evaluate our Stacked Ensemble against the test set.


In [35]:
val_loss, val_mse = ensemble_model.evaluate(test_data.map(stack_features))


   4343/Unknown - 35s 8ms/step - loss: 2.0102 - mse: 2.0102

In [36]:
print("Test RMSE: {}".format(val_mse**0.5))


Test RMSE: 1.4178322129942251
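
To put the result in context, the snippet below compares the ensemble's RMSE with the member RMSEs printed earlier. The numbers are copied from the outputs above. They come from a single run of one epoch each, so the size of the gap should be read as indicative only.

```python
# RMSE values copied from the notebook outputs above.
member_rmse = {
    "model_1": 1.7032530850184173,
    "model_2": 1.4428068143465294,
    "model_3": 1.554769235887298,
}
ensemble_rmse = 1.4178322129942251

# Find the strongest individual member and the relative RMSE reduction.
best_name = min(member_rmse, key=member_rmse.get)
best_rmse = member_rmse[best_name]
improvement = (best_rmse - ensemble_rmse) / best_rmse

print("Best member: {} (RMSE {:.4f})".format(best_name, best_rmse))
print("Ensemble RMSE: {:.4f}".format(ensemble_rmse))
print("Relative reduction vs. best member: {:.2%}".format(improvement))
```

In this run the stacked model comes out slightly ahead of the best single member, model_2, by roughly 1.7% in RMSE.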

Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License