Introducing CivisML 2.0

Note: We are continually releasing changes to CivisML; this notebook applies to version 2.0.0 and above.

Data scientists are on the front lines of their organization’s most important customer growth and engagement questions, and they need to guide action as quickly as possible by getting models into production. CivisML is a machine learning service that helps data scientists get great models into production much faster. And because it’s built on open-source packages, CivisML remains transparent and data scientists remain in control.

In this notebook, we’ll go over the new features introduced in CivisML 2.0. For a walkthrough of CivisML’s fundamentals, check out this introduction to the mechanics of CivisML: https://github.com/civisanalytics/civis-python/blob/master/examples/CivisML_parallel_training.ipynb

CivisML 2.0 is full of new features to make modeling faster, more accurate, and more portable. This notebook will cover the following topics:

  • CivisML overview
  • Parallel training and validation
  • Use of the new ETL transformer, DataFrameETL, for easy, customizable ETL
  • Stacked models: combine models to get one bigger, better model
  • Model portability: get trained models out of CivisML
  • Multilayer perceptron models: neural networks built in to CivisML
  • Hyperband: a smarter alternative to grid search

CivisML can be used to build models that answer all kinds of business questions, such as what movie to recommend to a customer, or which customers are most likely to upgrade their accounts. For the sake of example, this notebook uses a publicly available dataset on US colleges, and focuses on predicting the type of college (public non-profit, private non-profit, or private for-profit).


In [1]:
# first, let's import the packages we need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import model_selection

# import the Civis Python API client
import civis
# ModelPipeline is the class used to build CivisML models
from civis.ml import ModelPipeline

In [2]:
# Suppress warnings for demo purposes. This is not recommended as a general practice.
import warnings
warnings.filterwarnings('ignore')

Downloading data

Before we build any models, we need a dataset to play with. We're going to use the most recent College Scorecard data from the Department of Education.

This dataset is collected to study the performance of US higher education institutions. You can learn more about it in this technical paper, and you can find details on the dataset features in this data dictionary.


In [3]:
# Download the data; this may take a minute.
# Missing values are encoded two ways in this file: 'NULL' and 'PrivacySuppressed'.
df = pd.read_csv("https://ed-public-download.app.cloud.gov/downloads/Most-Recent-Cohorts-All-Data-Elements.csv",
                 na_values=['NULL', 'PrivacySuppressed'], low_memory=False)

In [4]:
# How many rows and columns?
df.shape


Out[4]:
(7593, 1805)

In [5]:
# What are some of the column names?
df.columns


Out[5]:
Index(['UNITID', 'OPEID', 'OPEID6', 'INSTNM', 'CITY', 'STABBR', 'ZIP',
       'ACCREDAGENCY', 'INSTURL', 'NPCURL',
       ...
       'OMENRYP8_FTNFT', 'OMENRAP8_FTNFT', 'OMENRUP8_FTNFT', 'OMACHT6_PTNFT',
       'OMAWDP6_PTNFT', 'OMACHT8_PTNFT', 'OMAWDP8_PTNFT', 'OMENRYP8_PTNFT',
       'OMENRAP8_PTNFT', 'OMENRUP8_PTNFT'],
      dtype='object', length=1805)

Data Munging

Before running CivisML, we need to do some basic data munging, such as removing rows where the dependent variable is missing and splitting the data into training and test sets.

Throughout this notebook, we'll be trying to predict whether a college is public (labelled as 1), private non-profit (2), or private for-profit (3). The column name for this dependent variable is "CONTROL".


In [6]:
# Make sure to remove any rows with nulls in the dependent variable
df = df[np.isfinite(df['CONTROL'])]

In [7]:
# split into training and test sets
train_data, test_data = model_selection.train_test_split(df, test_size=0.2)

In [8]:
# show the first few rows of the training set
train_data.head()


Out[8]:
| | UNITID | OPEID | OPEID6 | INSTNM | CITY | STABBR | ZIP | ACCREDAGENCY | INSTURL | NPCURL | ... | OMENRYP8_FTNFT | OMENRAP8_FTNFT | OMENRUP8_FTNFT | OMACHT6_PTNFT | OMAWDP6_PTNFT | OMACHT8_PTNFT | OMAWDP8_PTNFT | OMENRYP8_PTNFT | OMENRAP8_PTNFT | OMENRUP8_PTNFT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1575 | 164599 | 2337400 | 23374 | Bancroft School of Massage Therapy | Worcester | MA | 01604 | Accrediting Commission of Career Schools and C... | https://www.bancroftsmt.com | www.bancroftsmt.com/NetPriceCalculator/npcalc.htm | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 672 | 131803 | 145900 | 1459 | Strayer University-District of Columbia | Washington | DC | 20005 | Middle States Commission on Higher Education | www.strayer.edu/district-columbia/washington | https://strayer.aidcalc.com/netprice.htm | ... | 0.0000 | 0.1667 | 0.2778 | 199.0 | 0.2513 | 199.0 | 0.2915 | 0.0302 | 0.2915 | 0.3869 |
| 7388 | 21130702 | 323901 | 3239 | Bucks County Community College-Lower Bucks Campus | Bristol | PA | 190070277 | Middle States Commission on Higher Education | www.bucks.edu | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6926 | 483902 | 4225200 | 42252 | Yechanlaz Instituto Vocacional | Miami | FL | 33144-4817 | NaN | www.yechanlaz-instituto.com | www.yechanlaz-instituto.com | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3571 | 224110 | 355800 | 3558 | North Central Texas College | Gainesville | TX | 76240-4699 | Southern Association of Colleges and Schools C... | www.nctc.edu | www.collegeforalltexans.com/apps/CollegeMoney/ | ... | 0.0125 | 0.5625 | 0.2750 | 487.0 | 0.1109 | 487.0 | 0.1150 | 0.0041 | 0.5236 | 0.3573 |

5 rows × 1805 columns

Some of these columns are duplicates, or contain information we don't want to use in our model (like college names and URLs). CivisML can take a list of columns to exclude and do this part of the data munging for us, so let's make that list here.


In [8]:
to_exclude = ['ADM_RATE_ALL', 'OPEID', 'OPEID6', 'ZIP', 'INSTNM', 
              'INSTURL', 'NPCURL', 'ACCREDAGENCY', 'T4APPROVALDATE', 
              'STABBR', 'ALIAS', 'REPAY_DT_MDN', 'SEPAR_DT_MDN']

Basic CivisML Usage

When building a supervised model, there are a few basic things you'll probably want to do:

  1. Transform the data into a modelling-friendly format
  2. Train the model on some labelled data
  3. Validate the model
  4. Use the model to make predictions about unlabelled data

CivisML does all of this in three lines of code. Let's fit a basic sparse logistic model to see how.

The first thing we need to do is build a ModelPipeline object. This stores all of the basic configuration options for the model. We'll tell it things like the type of model, dependent variable, and columns we want to exclude. CivisML handles basic ETL for you, including categorical expansion of any string-type columns.


In [9]:
# Use a push-button workflow to fit a model with reasonable default parameters
sl_model = ModelPipeline(model='sparse_logistic',
                         model_name='Example sparse logistic',
                         primary_key='UNITID',
                         dependent_variable=['CONTROL'],
                         excluded_columns=to_exclude)

Next, we want to train and validate the model by calling .train on the ModelPipeline object. CivisML uses 4-fold cross-validation on the training set. You can train on local data or query data from Redshift. In this case, we have our data locally, so we just pass the data frame.
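For intuition about what 4-fold cross-validation does with the training set, here is a small local sketch using scikit-learn's `StratifiedKFold` on synthetic labels. It only shows how rows are split into folds; the class counts, `shuffle=True`, and `random_state` are illustrative assumptions, not CivisML's internal settings.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels standing in for CONTROL: 60 rows of class 1, 30 of class 2, 30 of class 3
y = np.array([1] * 60 + [2] * 30 + [3] * 30)
X = np.zeros((len(y), 1))  # feature values don't affect how the folds are drawn

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
fold_sizes = []
for fold, (train_idx, valid_idx) in enumerate(cv.split(X, y)):
    # Each fold holds out roughly 1/4 of the rows and keeps the class proportions
    class_counts = np.bincount(y[valid_idx], minlength=4)[1:]
    fold_sizes.append(len(valid_idx))
    print(f"fold {fold}: train={len(train_idx)}, validate={len(valid_idx)}, "
          f"validation counts per class={class_counts.tolist()}")
```

Every row is used for validation exactly once, and the model is fit four times, each time on the other three quarters of the data.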


In [10]:
sl_train = sl_model.train(train_data)

This returns a ModelFuture object, which is non-blocking: you can keep working in your notebook while the model trains on Civis Platform in the background. If you want to make a blocking call (one that doesn't return until your model is finished), use .result().
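The non-blocking/blocking distinction is the standard future pattern. As a local analogy only, here it is with Python's `concurrent.futures`; `slow_training_job` is a hypothetical stand-in for a remote CivisML job, and none of this uses the Civis client.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_training_job(seconds):
    """Hypothetical stand-in for a remote training job."""
    time.sleep(seconds)
    return {'state': 'succeeded'}

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_training_job, 0.5)  # returns immediately
    print(future.done())       # most likely False: the job is still running
    # ... other notebook work could happen here ...
    result = future.result()   # blocks until the job finishes
    print(result['state'], future.done())
```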


In [11]:
# non-blocking
sl_train


Out[11]:
<ModelFuture at 0x7f293cf39eb8 state=running>

In [12]:
# blocking
sl_train.result()


Out[12]:
{'container_id': 9137792,
 'error': None,
 'finished_at': '2018-01-17T21:25:08.000Z',
 'id': 69726571,
 'is_cancel_requested': False,
 'started_at': '2018-01-17T21:18:48.000Z',
 'state': 'succeeded'}

Parallel Model Tuning and Validation

We didn't specify the number of jobs in the .train() call above, but behind the scenes the model was training in parallel! In CivisML 2.0, model tuning and validation are automatically distributed across your computing cluster, never using more than 90% of the cluster's resources. This means that you can build models faster and try more model configurations, leaving you more time to think critically about your data. If you want more control over the resources you use, set the n_jobs parameter to a specific number, and CivisML won't run more than that many jobs at once.
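Parallel validation works because the cross-validation folds are independent fits. The sketch below shows the same idea locally with scikit-learn's `n_jobs` on synthetic data. It is an analogy, not CivisML's cluster scheduler; the 90% cap computed from `os.cpu_count()` is a hypothetical illustration of the rule described above.

```python
import os
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical resource cap: use at most 90% of local CPUs (and at least one)
n_cpus = os.cpu_count() or 1
n_jobs = max(1, int(0.9 * n_cpus))

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)
# The 4 folds are independent fits, so joblib can run up to n_jobs of them at once
scores = cross_val_score(model, X, y, cv=4, n_jobs=n_jobs)
print(f"n_jobs={n_jobs}, fold accuracies={scores.round(3)}")
```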

We can see how well the model did by looking at the validation metrics.


In [13]:
# loop through the metric names and print to screen
for key in sl_train.metrics:
    print(key)


accuracy
confusion_matrix
p_correct
pop_incidence_true
pop_incidence_pred
roc_auc
log_loss
brier_score
deciles
roc_curve_by_class
calibration_curve_by_class
roc_auc_macroavg
score_histogram
training_histogram
oos_score_table

In [14]:
# ROC AUC for each of the three categories in our dependent variable
sl_train.metrics['roc_auc']


Out[14]:
[0.9963479457451291, 0.9413246335261132, 0.9602249988203488]

Impressive!
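With three classes, a per-class AUC is usually computed one-vs-rest: each class's predicted probability is scored against an indicator for that class. Here is that computation with scikit-learn on synthetic data; that CivisML's `roc_auc` metric is computed exactly this way is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
y = y + 1  # relabel classes as 1, 2, 3, like CONTROL
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)  # one column per class, in clf.classes_ order

# One-vs-rest: score column k against a 0/1 indicator for class k
y_bin = label_binarize(y_te, classes=clf.classes_)
per_class_auc = [roc_auc_score(y_bin[:, k], probs[:, k])
                 for k in range(len(clf.classes_))]
print(np.round(per_class_auc, 3), "macro average:", round(np.mean(per_class_auc), 3))
```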

This is the basic CivisML workflow: create the model, train, and make predictions. There are other configuration options for more complex use cases; for example, you can create a custom estimator, pass custom dependencies, manage the computing resources for larger models, and more. For more information, see the Machine Learning section of the Python API client docs.

Now that we can build a simple model, let's see what's new to CivisML 2.0!

Custom ETL

CivisML can do several data transformations to prepare your data for modeling. This makes data preprocessing easier, and makes it part of your model pipeline rather than an additional script you have to run. CivisML's built-in ETL includes:

  • Categorical expansion: expand a single column of strings or categories into separate binary variables.
  • Dropping columns: remove columns not needed in a model, such as an ID number.
  • Removing null columns: remove columns that contain no data.
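For a feel of what these three operations do, here are rough pandas equivalents on a tiny made-up frame. This is not DataFrameETL's implementation; it only mirrors the behavior described above (`dummy_na=True` matches the setting shown in the fitted DataFrameETL later in this notebook). Note that `REGION` is numeric-coded, so automatic detection of non-numeric columns would not expand it unless it is listed explicitly.

```python
import numpy as np
import pandas as pd

# A tiny made-up frame; values are illustrative only
raw = pd.DataFrame({
    'UNITID': [101, 102, 103, 104],                 # ID column: not a feature
    'REGION': [1, 5, 1, np.nan],                    # numeric-coded categorical with a missing value
    'CITY': ['Bristol', 'Miami', 'Bristol', 'Gainesville'],
    'EMPTY_COL': [np.nan] * 4,                      # contains no data
    'ADM_RATE': [0.5, 0.7, np.nan, 0.9],            # ordinary numeric feature
})

etl = raw.drop(columns=['UNITID'])                     # dropping columns
etl = etl.dropna(axis=1, how='all')                    # removing null columns
etl = pd.get_dummies(etl, columns=['REGION', 'CITY'],  # categorical expansion, with an
                     dummy_na=True)                    # extra indicator for missing values
print(etl.columns.tolist())
```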

With CivisML 2.0, you can now recreate and customize this ETL using DataFrameETL, our open source ETL transformer, available on GitHub.

By default, CivisML will use DataFrameETL to automatically detect non-numeric columns for categorical expansion. Our example college dataset has a lot of integer columns which are actually categorical, but we can make sure they're handled correctly by passing CivisML a custom ETL transformer.


In [15]:
# The ETL transformer used in CivisML can be found in the civismlext module
from civismlext.preprocessing import DataFrameETL

The next cell builds the list of columns to expand categorically, identified using the College Scorecard data dictionary.


In [16]:
# select the columns to expand by their positions in the data frame
to_expand = list(df.columns[:21]) + list(df.columns[23:36]) + list(df.columns[99:290]) + \
    list(df.columns[[1738, 1773, 1776]])

In [17]:
# create ETL estimator to pass to CivisML
etl = DataFrameETL(cols_to_drop=to_exclude,      # we made this column list during data munging
                   cols_to_expand=to_expand,     # the categorical columns selected above
                   check_null_cols='warn')       # warn about columns that contain only nulls

Model Stacking

Now it's time to fit a model. Let's take a look at model stacking, which is new to CivisML 2.0.

Stacking lets you combine several algorithms into a single model that typically performs as well as or better than the component algorithms. We use stacking at Civis to build more accurate models, which saves our data scientists time comparing algorithm performance. In CivisML, we have two stacking workflows: stacking_classifier (sparse logistic, GBT, and random forest, with a logistic regression model as a "meta-estimator" to combine predictions from the other models); and stacking_regressor (sparse linear, GBT, and random forest, with a non-negative linear regression as the meta-estimator). Use them the same way you use sparse_logistic or other pre-defined models. If you want to learn more about how stacking works under the hood, take a look at this talk by the person at Civis who wrote it!
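To see the stacking idea outside CivisML, here is a sketch with scikit-learn's `StackingClassifier` (available in scikit-learn 0.22 and later) on synthetic three-class data. It mirrors the stacking_classifier recipe described above (L1-penalized logistic regression standing in for sparse logistic, GBT, and random forest, with a logistic-regression meta-estimator), but it is not CivisML's own `StackedClassifier`, and the hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

base_models = [
    # L1-penalized logistic regression as a stand-in for "sparse logistic"
    ('sparse_logistic', LogisticRegression(penalty='l1', solver='saga', max_iter=5000)),
    ('gbt', GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_models,
    # The meta-estimator is trained on out-of-fold predictions from the base models
    final_estimator=LogisticRegression(max_iter=1000),
    cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0),
)

results = {}
for name, model in base_models + [('stacking', stack)]:
    results[name] = model.fit(X_tr, y_tr).score(X_te, y_te)  # test-set accuracy
print(results)
```

Training the meta-estimator on out-of-fold predictions matters: if it saw predictions the base models made on their own training rows, it would learn to trust overfit models too much.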

Let's fit both a stacking classifier and some un-stacked models, so we can compare the performance.


In [19]:
workflows = ['stacking_classifier',
            'sparse_logistic',
            'random_forest_classifier',
            'gradient_boosting_classifier']
models = []
# create a model object for each of the four model types
for wf in workflows:
    model = ModelPipeline(model=wf,
                          model_name=wf + ' v2 example',
                          primary_key='UNITID',
                          dependent_variable=['CONTROL'],
                          etl=etl  # use the custom ETL we created
                          )
    models.append(model)

In [20]:
# iterate over the model objects and run a CivisML training job for each
trains = []
for model in models:
    train = model.train(train_data)
    trains.append(train)

Let's plot diagnostics for each of the models. In the Civis Platform, these plots will automatically be built and displayed in the "Models" tab. But for the sake of example, let's also explicitly plot ROC curves and AUCs in the notebook.

There are three classes (public, non-profit private, and for-profit private), so we'll have three curves per model. It looks like all of the models are doing well, with sparse logistic performing slightly worse than the other three.


In [21]:
%matplotlib inline
# Let's look at how the model performed during validation
def extract_roc(fut_job, model_name):
    '''Build a data frame of ROC curve data from the completed training job `fut_job`
    with model name `model_name`. Note that this function will only work for a classification
    model where the dependent variable has more than two classes.'''
    aucs = fut_job.metrics['roc_auc']
    roc_curve = fut_job.metrics['roc_curve_by_class']
    fpr, tpr, class_num, auc = [], [], [], []
    for i, curve in enumerate(roc_curve):
        n_points = len(curve['fpr'])
        fpr.extend(curve['fpr'])
        tpr.extend(curve['tpr'])
        class_num.extend([i] * n_points)
        auc.extend([aucs[i]] * n_points)
    return pd.DataFrame({
        'model': [model_name] * len(fpr),
        'class': class_num,
        'fpr': fpr,
        'tpr': tpr,
        'auc': auc
    })

# extract ROC curve information for all of the trained models
workflows_abbrev = ['stacking', 'logistic', 'RF', 'GBT']
roc_dfs = [extract_roc(train, w) for train, w in zip(trains, workflows_abbrev)]
roc_df = pd.concat(roc_dfs)

# create faceted ROC curve plots. Each row of plots is a different model type, and each
# column of plots is a different class of the dependent variable.
g = sns.FacetGrid(roc_df, col="class",  row="model")
g = g.map(plt.plot, "fpr", "tpr", color='blue')


All of the models perform quite well, so it's difficult to compare based on the ROC curves. Let's plot the AUCs themselves.


In [22]:
# Plot AUCs for each model
auc_df = roc_df[['model', 'class', 'auc']].drop_duplicates()
sns.swarmplot(data=auc_df, x='model', y='auc')
plt.show()


Here we can see that all of the models perform well, with sparse logistic trailing slightly, and stacking appears to perform marginally better than the others. For more challenging modeling tasks, the difference between stacking and other models may be more pronounced.

Now our models are trained, and we know that they all perform very well. Because the AUCs are all so high, we would expect the models to make similar predictions. Let's see if that's true.


In [23]:
# kick off a prediction job for each of the four models
preds = [model.predict(test_data) for model in models]

In [24]:
# Block until each prediction job, running on Civis Platform, has finished
[pred.result() for pred in preds]


Out[24]:
[{'container_id': 9138218,
  'error': None,
  'finished_at': '2018-01-17T21:44:07.000Z',
  'id': 69728304,
  'is_cancel_requested': False,
  'started_at': '2018-01-17T21:43:26.000Z',
  'state': 'succeeded'},
 {'container_id': 9138220,
  'error': None,
  'finished_at': '2018-01-17T21:44:10.000Z',
  'id': 69728306,
  'is_cancel_requested': False,
  'started_at': '2018-01-17T21:43:32.000Z',
  'state': 'succeeded'},
 {'container_id': 9138222,
  'error': None,
  'finished_at': '2018-01-17T21:44:08.000Z',
  'id': 69728308,
  'is_cancel_requested': False,
  'started_at': '2018-01-17T21:43:36.000Z',
  'state': 'succeeded'},
 {'container_id': 9138229,
  'error': None,
  'finished_at': '2018-01-17T21:44:11.000Z',
  'id': 69728315,
  'is_cancel_requested': False,
  'started_at': '2018-01-17T21:43:41.000Z',
  'state': 'succeeded'}]

In [25]:
# print the top few rows for each of the models
pred_df = [pred.table.head() for pred in preds]
import pprint
pprint.pprint(pred_df)


[          control_1  control_2  control_3
UNITID                                   
217882     0.993129   0.006856   0.000015
195234     0.001592   0.990423   0.007985
446385     0.002784   0.245300   0.751916
13508115   0.003109   0.906107   0.090785
459499     0.005351   0.039922   0.954726,
              control_1  control_2  control_3
UNITID                                      
217882    9.954234e-01   0.000200   0.004377
195234    6.766601e-08   0.999615   0.000385
446385    4.571749e-03   0.056303   0.939125
13508115  1.768058e-02   0.699806   0.282514
459499    1.319468e-02   0.285295   0.701510,
           control_1  control_2  control_3
UNITID                                   
217882        0.960      0.034      0.006
195234        0.012      0.974      0.014
446385        0.020      0.508      0.472
13508115      0.006      0.914      0.080
459499        0.032      0.060      0.908,
           control_1  control_2  control_3
UNITID                                   
217882     0.993809   0.005610   0.000581
195234     0.004323   0.991094   0.004583
446385     0.001309   0.066452   0.932238
13508115   0.012525   0.809062   0.178413
459499     0.002034   0.061846   0.936120]

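The probabilities aren't identical across models, but they mostly point the same way: if you choose the highest-probability class for each row, all four models agree on four of these five colleges. The exception is UNITID 446385, where the random forest narrowly favors class 2 (0.508 vs. 0.472) while the other three models clearly favor class 3. This broad agreement makes sense, because all of the models performed well.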
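Turning probabilities into class predictions is an argmax over each row. The sketch below copies the stacking and random-forest probabilities printed above into pandas DataFrames (in the notebook, `pred.table` already holds a frame like this) and compares the resulting classes.

```python
import pandas as pd

# Probabilities copied from the first five prediction rows printed above.
# Columns are the predicted probabilities of CONTROL = 1, 2, 3.
unitids = [217882, 195234, 446385, 13508115, 459499]
cols = ['control_1', 'control_2', 'control_3']
stacking = pd.DataFrame([[0.993129, 0.006856, 0.000015],
                         [0.001592, 0.990423, 0.007985],
                         [0.002784, 0.245300, 0.751916],
                         [0.003109, 0.906107, 0.090785],
                         [0.005351, 0.039922, 0.954726]],
                        index=unitids, columns=cols)
random_forest = pd.DataFrame([[0.960, 0.034, 0.006],
                              [0.012, 0.974, 0.014],
                              [0.020, 0.508, 0.472],
                              [0.006, 0.914, 0.080],
                              [0.032, 0.060, 0.908]],
                             index=unitids, columns=cols)

def to_class(probs):
    """Return the most probable class (1, 2, or 3) for each row."""
    return probs.idxmax(axis=1).str.replace('control_', '').astype(int)

stack_cls = to_class(stacking)
rf_cls = to_class(random_forest)
agreement = (stack_cls == rf_cls).mean()
print(pd.DataFrame({'stacking': stack_cls, 'random_forest': rf_cls}))
print('agreement:', agreement)
```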

Model Portability

What if you want to score a model outside of Civis Platform? Maybe you want to deploy this model in an app for education policy makers. In CivisML 2.0, you can easily get the trained model pipeline out of the ModelFuture object.


In [26]:
train_stack = trains[0] # Get the ModelFuture for the stacking model
trained_model = train_stack.estimator

This Pipeline contains all of the steps CivisML used to train the model, from ETL to the model itself. We can print each step individually to get a better sense of what is going on.


In [27]:
# print each of the estimators in the pipeline, separated by newlines for readability
for step in train_stack.estimator.steps:
    print(step[1])
    print('\n')


DataFrameETL(check_null_cols='warn',
       cols_to_drop=['ADM_RATE_ALL', 'OPEID', 'OPEID6', 'ZIP', 'INSTNM', 'INSTURL', 'NPCURL', 'ACCREDAGENCY', 'T4APPROVALDATE', 'STABBR', 'ALIAS', 'REPAY_DT_MDN', 'SEPAR_DT_MDN'],
       cols_to_expand=['UNITID', 'OPEID', 'OPEID6', 'INSTNM', 'CITY', 'STABBR', 'ZIP', 'ACCREDAGENCY', 'INSTURL', 'NPCURL', 'SCH_DEG', 'HCM2', 'MAIN', 'NUMBRANCH', 'PREDDEG', 'HIGHDEG', 'CONTROL', 'ST_FIPS', 'REGION', 'LOCALE', 'LOCALE2', 'CCBASIC', 'CCUGPROF', 'CCSIZSET', 'HBCU', 'PBI', 'ANNHI', 'TRIBAL',...RT2', 'CIP54ASSOC', 'CIP54CERT4', 'CIP54BACHL', 'DISTANCEONLY', 'ICLEVEL', 'OPENADMP', 'ACCREDCODE'],
       dataframe_output=False, dummy_na=True, fill_value=0.0)


Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)


StackedClassifier(cv=StratifiedKFold(n_splits=4, random_state=42420, shuffle=True),
         estimator_list=[('sparse_logistic', Pipeline(memory=None,
     steps=[('selectfrommodel', SelectFromModel(estimator=LogitNet(alpha=1, cut_point=0.5, fit_intercept=True, lambda_path=None,
     max_iter=10000, min_lambda_ratio=0.0001, n_jobs=1, n_lambda=100,
     n_splits=4, random_state=42, scoring='...    random_state=42, refit=True, scoring=None, solver='lbfgs',
           tol=1e-08, verbose=0))]))],
         n_jobs=1, pre_dispatch='2*n_jobs', verbose=0)


Now we can see that there are three steps: the DataFrameETL object we passed in, a null imputation step, and the stacking estimator itself.
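The same three-stage structure (ETL, mean imputation, estimator) can be reproduced with plain scikit-learn to see how a `Pipeline` chains steps. In this sketch `toy_etl` is a hypothetical stand-in for DataFrameETL; unlike DataFrameETL, it does not remember the category levels seen during fitting, so it is only safe when new data has the same levels. `SimpleImputer` replaces the older `Imputer` class shown in the output above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def toy_etl(df):
    """Hypothetical stand-in for DataFrameETL: drop the ID column and expand REGION."""
    return pd.get_dummies(df.drop(columns=['UNITID']), columns=['REGION']).astype(float)

pipe = Pipeline([
    ('etl', FunctionTransformer(toy_etl, validate=False)),  # step 1: ETL
    ('imputer', SimpleImputer(strategy='mean')),            # step 2: mean imputation of nulls
    ('model', LogisticRegression(max_iter=1000)),           # step 3: the estimator
])

train = pd.DataFrame({'UNITID': [1, 2, 3, 4, 5, 6],
                      'REGION': [1, 2, 1, 2, 1, 2],
                      'ADM_RATE': [0.2, np.nan, 0.4, 0.8, 0.3, 0.9]})
labels = [1, 2, 1, 2, 1, 2]
pipe.fit(train, labels)

for name, step in pipe.steps:
    print(name, '->', type(step).__name__)
print(pipe.predict(train))  # each row flows through all three steps
```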

We can use this outside of CivisML simply by calling .predict on the estimator. This makes predictions locally in the notebook, without submitting a CivisML job.


In [28]:
# drop the dependent variable so we don't use it to predict itself!
predictions = trained_model.predict(test_data.drop(labels=['CONTROL'], axis=1))

In [29]:
# print out the class predictions. These will be integers representing the predicted
# class rather than probabilities.
predictions


Out[29]:
array([1, 2, 3, ..., 3, 2, 2])
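For the app scenario mentioned at the start of this section, a common approach is to serialize the fitted pipeline with joblib and load it in the serving application. This sketch uses a small toy scikit-learn pipeline rather than `trained_model`; whether CivisML's pipeline round-trips the same way is an assumption, and loading it elsewhere would at least require the same packages (such as civismlext for DataFrameETL) at compatible versions.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A toy fitted pipeline standing in for `trained_model`
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipe = Pipeline([('scale', StandardScaler()),
                 ('model', LogisticRegression())]).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), 'model.joblib')
joblib.dump(pipe, path)        # save once, after training
restored = joblib.load(path)   # load in the application that serves predictions
print((restored.predict(X) == pipe.predict(X)).all())
```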

Hyperparameter optimization with Hyperband and Neural Networks

Multilayer perceptrons (MLPs) are simple neural networks, and they are now built into CivisML. The MLP estimators in CivisML come from muffnn, another open-source package written and maintained by Civis Analytics and built on TensorFlow. Let's fit one using hyperband.

Tuning hyperparameters is a critical chore for getting an algorithm to perform at its best, but it can take a long time. CivisML 2.0 offers hyperband as an alternative to conventional grid search for hyperparameter optimization; it runs about twice as fast. While grid search trains every parameter combination for the full time, hyperband trains many combinations for a short time, keeps the best performers, trains those for longer, filters again, and so on. This means that you can try more combinations in less time, so we recommend using it whenever possible. The hyperband estimator is open source and available on GitHub. You can learn about the details in the original paper, Li et al. (2016).
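The keep-the-best-and-train-longer loop described above is successive halving, the building block of hyperband; full hyperband runs several such brackets that trade off the number of configurations against the starting budget. Below is a single bracket in plain Python with a made-up objective (`toy_score`), using `eta=3` and a 5 to 50 epoch range to echo the HyperbandSearchCV settings printed later in this notebook. It is an illustration, not the HyperbandSearchCV code.

```python
import math

def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """One successive-halving bracket: score all configs on a small budget,
    keep the top 1/eta, multiply the budget by eta, and repeat."""
    budget = min_budget
    history = []  # (budget, number of configs evaluated at that budget)
    while True:
        scored = sorted(((evaluate(c, budget), c) for c in configs),
                        key=lambda pair: pair[0], reverse=True)
        history.append((budget, len(configs)))
        if len(configs) == 1 or budget >= max_budget:
            return scored[0][1], history
        keep = max(1, len(configs) // eta)
        configs = [c for _, c in scored[:keep]]
        budget = min(max_budget, budget * eta)

def toy_score(learning_rate, n_epochs):
    """Hypothetical objective: best near learning_rate=0.003, improving with more epochs."""
    return n_epochs / (n_epochs + 10) - abs(math.log10(learning_rate) - math.log10(0.003))

learning_rates = [0.0001, 0.0005, 0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.05]
best, history = successive_halving(learning_rates, toy_score,
                                   min_budget=5, max_budget=50, eta=3)
print(best, history)  # 0.002 [(5, 9), (15, 3), (45, 1)]
```

Nine configurations get 5 epochs each, three get 15, and one gets 45: 135 epochs in total, versus 450 to train all nine for 50 epochs.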

Hyperband is currently implemented in CivisML's named preset models for the following algorithms:

  • Multilayer Perceptrons (MLPs)
  • Stacking
  • Random forests
  • GBTs
  • ExtraTrees

Unlike grid search, you don't need to specify values to search over. If you pass cross_validation_parameters='hyperband' to ModelPipeline, hyperparameter combinations will be randomly drawn from preset distributions.
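Random draws from preset distributions work like scikit-learn's `ParameterSampler`, which is what `RandomizedSearchCV` uses. The search space below is a hypothetical, trimmed-down version of the MLP distributions shown later in this notebook; in particular, `uniform(0, 1)` for `keep_prob` is an assumption, since the actual distribution is not printed.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Hypothetical, trimmed-down search space; uniform(0, 1) for keep_prob is an assumption
param_distributions = {
    'keep_prob': uniform(loc=0, scale=1),                       # continuous: sampled with .rvs()
    'hidden_units': [(), (16,), (64, 64), (128, 128), (256,)],  # discrete: picked at random
    'solver_kwargs': [{'learning_rate': lr} for lr in (0.0001, 0.001, 0.002, 0.005, 0.05)],
}
samples = list(ParameterSampler(param_distributions, n_iter=9, random_state=42))
for params in samples:
    print(params)
```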


In [30]:
# build a model specifying the MLP model with hyperband
model_mlp = ModelPipeline(model='multilayer_perceptron_classifier',
                          model_name='MLP example',
                          primary_key='UNITID',
                          dependent_variable=['CONTROL'],
                          cross_validation_parameters='hyperband',
                          etl=etl
                          )
train_mlp = model_mlp.train(train_data, 
                            n_jobs=10) # parallel hyperparameter optimization and validation!
# block until the job finishes
train_mlp.result()


Out[30]:
{'container_id': 9138258,
 'error': None,
 'finished_at': '2018-01-17T22:11:21.000Z',
 'id': 69728426,
 'is_cancel_requested': False,
 'started_at': '2018-01-17T21:44:33.000Z',
 'state': 'succeeded'}

Let's dig into the hyperband model a little. Like the stacking model, its pipeline starts with ETL and null imputation, but it contains two additional steps: one that scales the predictor variables (which typically improves neural network performance), and a hyperband searcher containing the MLP.
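Min-max scaling maps each feature to the range [0, 1] via (x - min) / (max - min), so features measured on very different scales enter the network on comparable ranges. The sketch below shows the scaler on a tiny array and then a similar impute, scale, MLP pipeline using scikit-learn's `MLPClassifier`, which stands in for muffnn's MLP; the synthetic data and settings are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler maps each column to [0, 1] via (x - min) / (max - min)
scaled = MinMaxScaler().fit_transform(np.array([[1.0], [3.0], [5.0]])).ravel()
print(scaled)

X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X = X * np.array([1, 10, 100, 1000, 1, 1, 1, 1, 1, 1])  # give features very different scales

mlp_pipe = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', MinMaxScaler(feature_range=(0, 1))),
    ('mlp', MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=300, random_state=0)),
])
mlp_pipe.fit(X, y)
print('training accuracy:', mlp_pipe.score(X, y))
```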


In [31]:
for step in train_mlp.estimator.steps:
    print(step[1])
    print('\n')


INFO:tensorflow:Restoring parameters from /tmp/tmpe49np0dv/saved_model
DataFrameETL(check_null_cols='warn',
       cols_to_drop=['ADM_RATE_ALL', 'OPEID', 'OPEID6', 'ZIP', 'INSTNM', 'INSTURL', 'NPCURL', 'ACCREDAGENCY', 'T4APPROVALDATE', 'STABBR', 'ALIAS', 'REPAY_DT_MDN', 'SEPAR_DT_MDN'],
       cols_to_expand=['UNITID', 'OPEID', 'OPEID6', 'INSTNM', 'CITY', 'STABBR', 'ZIP', 'ACCREDAGENCY', 'INSTURL', 'NPCURL', 'SCH_DEG', 'HCM2', 'MAIN', 'NUMBRANCH', 'PREDDEG', 'HIGHDEG', 'CONTROL', 'ST_FIPS', 'REGION', 'LOCALE', 'LOCALE2', 'CCBASIC', 'CCUGPROF', 'CCSIZSET', 'HBCU', 'PBI', 'ANNHI', 'TRIBAL',...RT2', 'CIP54ASSOC', 'CIP54CERT4', 'CIP54BACHL', 'DISTANCEONLY', 'ICLEVEL', 'OPENADMP', 'ACCREDCODE'],
       dataframe_output=False, dummy_na=True, fill_value=0.0)


Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)


MinMaxScaler(copy=False, feature_range=(0, 1))


HyperbandSearchCV(cost_parameter_max={'n_epochs': 50},
         cost_parameter_min={'n_epochs': 5}, cv=None, error_score='raise',
         estimator=MLPClassifier(activation=<function relu at 0x7f28a5746510>, batch_size=64,
       hidden_units=(256,), init_scale=0.1, keep_prob=1.0, n_epochs=5,
       random_state=None,
       solver=<class 'tensorflow.python.training.adam.AdamOptimizer'>,
       solver_kwargs=None),
         eta=3, iid=True, n_jobs=1,
         param_distributions={'keep_prob': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f28b44a9400>, 'hidden_units': [(), (16,), (32,), (64,), (64, 64), (64, 64, 64), (128,), (128, 128), (128, 128, 128), (256,), (256, 256), (256, 256, 256), (512, 256, 128, 64), (1024, 512, 256, 128)], 'solver_k...rning_rate': 0.002}, {'learning_rate': 0.005}, {'learning_rate': 0.008}, {'learning_rate': 0.0001}]},
         pre_dispatch='2*n_jobs', random_state=42, refit=True,
         return_train_score=True, scoring=None, verbose=0)


HyperbandSearchCV essentially works like GridSearchCV. If you want to get the best estimator without all of the extra CV information, you can access it using the best_estimator_ attribute.
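The attributes used here and below (`best_estimator_`, `best_score_`, `cv_results_`) are the same ones scikit-learn's search estimators expose. For comparison, here they are on scikit-learn's `RandomizedSearchCV` with a small, arbitrary random-forest search space on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={'n_estimators': [25, 50, 100], 'max_depth': [2, 4, 8, None]},
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)

print(search.best_estimator_)                              # refit on all the data with the best parameters
print(search.best_params_, round(search.best_score_, 3))  # best mean cross-validated score
print(len(search.cv_results_['params']))                   # one entry per configuration tried
```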


In [32]:
train_mlp.estimator.steps[3][1].best_estimator_


Out[32]:
MLPClassifier(activation=<function relu at 0x7f28a5746510>, batch_size=64,
       hidden_units=(128, 128), init_scale=0.1,
       keep_prob=0.83244264080042174, n_epochs=45, random_state=None,
       solver=<class 'tensorflow.python.training.adam.AdamOptimizer'>,
       solver_kwargs={'learning_rate': 0.002})

To see how well the best model performed, you can look at the best_score_.


In [33]:
train_mlp.estimator.steps[3][1].best_score_


Out[33]:
0.94616397760948301

To see details of every hyperparameter configuration that was tried, look at cv_results_.
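Because `cv_results_` is a dict of equal-length arrays, scikit-learn's documentation notes it can be loaded into a pandas DataFrame, e.g. `pd.DataFrame(train_mlp.estimator.steps[3][1].cv_results_)`. To keep the sketch self-contained, it copies just two columns from the output of this run shown below. Counting configurations per epoch budget reveals the hyperband brackets: nine configurations at 5 epochs, the top three of those retrained at 15 and the single best at 45 (eta = 3); five at 16, with the best retrained at 48; and three run directly at 50.

```python
import pandas as pd

# Two columns copied from this run's cv_results_ output (22 configurations)
cv_table = pd.DataFrame({
    'param_n_epochs': [5, 5, 5, 5, 5, 5, 5, 5, 5, 16, 16, 16, 16, 16,
                       50, 50, 50, 15, 15, 15, 48, 45],
    'mean_test_score': [0.89924267, 0.45702996, 0.45702996, 0.87471189, 0.45702996,
                        0.91307211, 0.72275272, 0.88508396, 0.55021403, 0.45702996,
                        0.88162661, 0.93233454, 0.54050049, 0.74349687, 0.91126111,
                        0.92920645, 0.45702996, 0.92113928, 0.94023708, 0.90434639,
                        0.94106026, 0.94616398],
})
# How many configurations were trained at each budget (number of epochs)
print(cv_table['param_n_epochs'].value_counts().sort_index())
# The best configuration overall; its score matches best_score_ above
print(cv_table.loc[cv_table['mean_test_score'].idxmax()])
```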


In [34]:
train_mlp.estimator.steps[3][1].cv_results_


Out[34]:
{'mean_fit_time': array([  5.71521004,   9.87880683,   7.02491919,   2.49734783,
          2.04555511,   3.0459307 ,   1.41299955,   1.03468744,
          8.28476421,  13.8823324 ,  17.15766454,   5.7730906 ,
          6.91940331,   5.92865777,  55.33232911,  13.65520374,
         49.46581841,   7.73342903,  10.29447095,   2.70951978,
         17.35557111,  33.21902045]),
 'mean_score_time': array([ 0.12489303,  0.25389655,  0.11093688,  0.08840664,  0.0935638 ,
         0.11657325,  0.07519325,  0.06182806,  0.2851553 ,  0.15998785,
         0.2072041 ,  0.09375119,  0.11130897,  0.1001962 ,  0.22202452,
         0.09535344,  0.1758012 ,  0.1068356 ,  0.13764652,  0.05892269,
         0.10381524,  0.14680648]),
 'mean_test_score': array([ 0.89924267,  0.45702996,  0.45702996,  0.87471189,  0.45702996,
         0.91307211,  0.72275272,  0.88508396,  0.55021403,  0.45702996,
         0.88162661,  0.93233454,  0.54050049,  0.74349687,  0.91126111,
         0.92920645,  0.45702996,  0.92113928,  0.94023708,  0.90434639,
         0.94106026,  0.94616398]),
 'mean_train_score': array([ 0.9234465 ,  0.45702996,  0.45702996,  0.88417887,  0.45702996,
         0.93669558,  0.73147083,  0.89315006,  0.55242091,  0.45702996,
         0.8957862 ,  0.97448112,  0.54061033,  0.7705279 ,  0.92862888,
         0.96460401,  0.45702996,  0.95423238,  0.97612758,  0.92509068,
         0.98781759,  0.98798199]),
 'param_hidden_units': masked_array(data = [(128,) (512, 256, 128, 64) (128,) (64, 64) (32,) (128, 128) (16,) ()
  (512, 256, 128, 64) (256, 256) (256, 256, 256) (32,) (64, 64) (64,) (256,)
  (16,) (256, 256, 256) (128,) (128, 128) () (32,) (128, 128)],
              mask = [False False False False False False False False False False False False
  False False False False False False False False False False],
        fill_value = ?),
 'param_keep_prob': masked_array(data = [0.79654298686023284 0.59685015794648699 0.058083612168199461
  0.6011150117432088 0.020584494295802447 0.83244264080042174
  0.18182496720710062 0.30424224295953772 0.43194501864211576
  0.61185289472237947 0.51423443841361161 0.85994040673632055
  0.45049925196954299 0.94220175568485276 0.30461376917337069
  0.68423302651215689 0.49517691011127019 0.79654298686023284
  0.83244264080042174 0.30424224295953772 0.85994040673632055
  0.83244264080042174],
              mask = [False False False False False False False False False False False False
  False False False False False False False False False False],
        fill_value = ?),
 'param_n_epochs': masked_array(data = [5 5 5 5 5 5 5 5 5 16 16 16 16 16 50 50 50 15 15 15 48 45],
              mask = [False False False False False False False False False False False False
  False False False False False False False False False False],
        fill_value = ?),
 'param_solver_kwargs': masked_array(data = [{'learning_rate': 0.008} {'learning_rate': 0.05} {'learning_rate': 0.008}
  {'learning_rate': 0.008} {'learning_rate': 0.02} {'learning_rate': 0.002}
  {'learning_rate': 0.001} {'learning_rate': 0.002} {'learning_rate': 0.01}
  {'learning_rate': 0.05} {'learning_rate': 0.0001} {'learning_rate': 0.005}
  {'learning_rate': 0.02} {'learning_rate': 0.02} {'learning_rate': 0.001}
  {'learning_rate': 0.005} {'learning_rate': 0.05} {'learning_rate': 0.008}
  {'learning_rate': 0.002} {'learning_rate': 0.002} {'learning_rate': 0.005}
  {'learning_rate': 0.002}],
              mask = [False False False False False False False False False False False False
  False False False False False False False False False False],
        fill_value = ?),
 'params': ({'hidden_units': (128,),
   'keep_prob': 0.79654298686023284,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.008}},
  {'hidden_units': (512, 256, 128, 64),
   'keep_prob': 0.59685015794648699,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.05}},
  {'hidden_units': (128,),
   'keep_prob': 0.058083612168199461,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.008}},
  {'hidden_units': (64, 64),
   'keep_prob': 0.6011150117432088,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.008}},
  {'hidden_units': (32,),
   'keep_prob': 0.020584494295802447,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.02}},
  {'hidden_units': (128, 128),
   'keep_prob': 0.83244264080042174,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.002}},
  {'hidden_units': (16,),
   'keep_prob': 0.18182496720710062,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.001}},
  {'hidden_units': (),
   'keep_prob': 0.30424224295953772,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.002}},
  {'hidden_units': (512, 256, 128, 64),
   'keep_prob': 0.43194501864211576,
   'n_epochs': 5,
   'solver_kwargs': {'learning_rate': 0.01}},
  {'hidden_units': (256, 256),
   'keep_prob': 0.61185289472237947,
   'n_epochs': 16,
   'solver_kwargs': {'learning_rate': 0.05}},
  {'hidden_units': (256, 256, 256),
   'keep_prob': 0.51423443841361161,
   'n_epochs': 16,
   'solver_kwargs': {'learning_rate': 0.0001}},
  {'hidden_units': (32,),
   'keep_prob': 0.85994040673632055,
   'n_epochs': 16,
   'solver_kwargs': {'learning_rate': 0.005}},
  {'hidden_units': (64, 64),
   'keep_prob': 0.45049925196954299,
   'n_epochs': 16,
   'solver_kwargs': {'learning_rate': 0.02}},
  {'hidden_units': (64,),
   'keep_prob': 0.94220175568485276,
   'n_epochs': 16,
   'solver_kwargs': {'learning_rate': 0.02}},
  {'hidden_units': (256,),
   'keep_prob': 0.30461376917337069,
   'n_epochs': 50,
   'solver_kwargs': {'learning_rate': 0.001}},
  {'hidden_units': (16,),
   'keep_prob': 0.68423302651215689,
   'n_epochs': 50,
   'solver_kwargs': {'learning_rate': 0.005}},
  {'hidden_units': (256, 256, 256),
   'keep_prob': 0.49517691011127019,
   'n_epochs': 50,
   'solver_kwargs': {'learning_rate': 0.05}},
  {'hidden_units': (128,),
   'keep_prob': 0.79654298686023284,
   'n_epochs': 15,
   'solver_kwargs': {'learning_rate': 0.008}},
  {'hidden_units': (128, 128),
   'keep_prob': 0.83244264080042174,
   'n_epochs': 15,
   'solver_kwargs': {'learning_rate': 0.002}},
  {'hidden_units': (),
   'keep_prob': 0.30424224295953772,
   'n_epochs': 15,
   'solver_kwargs': {'learning_rate': 0.002}},
  {'hidden_units': (32,),
   'keep_prob': 0.85994040673632055,
   'n_epochs': 48,
   'solver_kwargs': {'learning_rate': 0.005}},
  {'hidden_units': (128, 128),
   'keep_prob': 0.83244264080042174,
   'n_epochs': 45,
   'solver_kwargs': {'learning_rate': 0.002}}),
 'rank_test_score': array([10, 18, 18, 13, 18,  7, 15, 11, 16, 18, 12,  4, 17, 14,  8,  5, 18,
         6,  3,  9,  2,  1], dtype=int32),
 'split0_test_score': array([ 0.91461007,  0.45705824,  0.45705824,  0.88302073,  0.45705824,
         0.90819348,  0.67966436,  0.8810464 ,  0.45705824,  0.45705824,
         0.8810464 ,  0.93188549,  0.70730503,  0.45705824,  0.90720632,
         0.93435341,  0.45705824,  0.93089832,  0.94521224,  0.90967423,
         0.94718657,  0.95162883]),
 'split0_train_score': array([ 0.9375    ,  0.45701581,  0.45701581,  0.88661067,  0.45701581,
         0.92564229,  0.68527668,  0.88661067,  0.45701581,  0.45701581,
         0.90118577,  0.97282609,  0.70775692,  0.45701581,  0.92045455,
         0.96936759,  0.45701581,  0.96170949,  0.97504941,  0.92588933,
         0.99184783,  0.99061265]),
 'split1_test_score': array([ 0.90316206,  0.45701581,  0.45701581,  0.87401186,  0.45701581,
         0.92094862,  0.71492095,  0.87994071,  0.62450593,  0.45701581,
         0.87401186,  0.93527668,  0.45701581,  0.87302372,  0.90810277,
         0.93132411,  0.45701581,  0.92539526,  0.93181818,  0.90513834,
         0.93527668,  0.94021739]),
 'split1_train_score': array([ 0.93432099,  0.45703704,  0.45703704,  0.8854321 ,  0.45703704,
         0.94864198,  0.73728395,  0.88987654,  0.63901235,  0.45703704,
         0.89061728,  0.97876543,  0.45703704,  0.90740741,  0.93283951,
         0.96962963,  0.45703704,  0.96345679,  0.97382716,  0.93432099,
         0.98691358,  0.98691358]),
 'split2_test_score': array([ 0.87994071,  0.45701581,  0.45701581,  0.86709486,  0.45701581,
         0.91007905,  0.77371542,  0.89426877,  0.56916996,  0.45701581,
         0.88982213,  0.9298419 ,  0.45701581,  0.9006917 ,  0.91847826,
         0.92193676,  0.45701581,  0.90711462,  0.94367589,  0.89822134,
         0.94071146,  0.94664032]),
 'split2_train_score': array([ 0.89851852,  0.45703704,  0.45703704,  0.88049383,  0.45703704,
         0.93580247,  0.77185185,  0.90296296,  0.56123457,  0.45703704,
         0.89555556,  0.97185185,  0.45703704,  0.94716049,  0.93259259,
         0.95481481,  0.45703704,  0.93753086,  0.97950617,  0.91506173,
         0.98469136,  0.98641975]),
 'std_fit_time': array([ 1.37948502,  2.05526639,  3.2464671 ,  0.50829486,  0.49376665,
         0.16138088,  0.01638957,  0.03163927,  0.37672038,  0.1275633 ,
         1.76849646,  0.66819983,  1.84264573,  0.64170712,  8.03222655,
         2.66716171,  2.94213638,  0.03909851,  1.61943898,  0.02392876,
         2.14579761,  6.55491198]),
 'std_score_time': array([ 0.0303808 ,  0.0035742 ,  0.01347911,  0.00391845,  0.00073238,
         0.00871074,  0.00049139,  0.00370175,  0.02670011,  0.0021613 ,
         0.02528999,  0.001849  ,  0.03923516,  0.01926425,  0.05806998,
         0.02840535,  0.00227157,  0.00641461,  0.03439537,  0.00013065,
         0.01539214,  0.05509931]),
 'std_test_score': array([  1.44234985e-02,   2.00061944e-05,   2.00061944e-05,
          6.52104784e-03,   2.00061944e-05,   5.62112199e-03,
          3.87964203e-02,   6.50871187e-03,   6.96668104e-02,
          2.00061944e-05,   6.46649654e-03,   2.24100711e-03,
          1.18006882e-01,   2.02957201e-01,   5.11514515e-03,
          5.28591261e-03,   2.00061944e-05,   1.01658774e-02,
          5.98455096e-03,   4.70940337e-03,   4.86884145e-03,
          4.67123452e-03]),
 'std_train_score': array([  1.76744600e-02,   1.00063908e-05,   1.00063908e-05,
          2.64976626e-03,   1.00063908e-05,   9.41079474e-03,
          3.55823871e-02,   7.06570527e-03,   7.45606919e-02,
          1.00063908e-05,   4.31764806e-03,   3.05546044e-03,
          1.18190485e-01,   2.22279781e-01,   5.78100728e-03,
          6.92283370e-03,   1.00063908e-05,   1.18312790e-02,
          2.44057891e-03,   7.88281440e-03,   2.99072806e-03,
          1.87104661e-03])}

As with any other CivisML model, we can make predictions from a Hyperband-tuned model by calling .predict() on its ModelPipeline.


In [35]:
predict_mlp = model_mlp.predict(test_data)

In [36]:
predict_mlp.table.head()


Out[36]:
             control_1  control_2     control_3
UNITID
217882    9.999834e-01   0.000016  4.727007e-07
195234    1.779818e-03   0.996192  2.028217e-03
446385    8.158081e-07   0.005291  9.947079e-01
13508115  1.671655e-02   0.972799  1.048439e-02
459499    4.405403e-03   0.035383  9.602115e-01
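Each row of the table holds one predicted probability per class. To get a single predicted college type, take the column with the highest probability in each row. The minimal pandas sketch below uses the five rows shown above as a stand-in for `predict_mlp.table`. The mapping from `control_1`/`control_2`/`control_3` to the three college types listed at the start of this notebook is an assumption, and `control_labels` is a hypothetical name.

```python
import pandas as pd

# Stand-in for predict_mlp.table: the five rows displayed above.
probs = pd.DataFrame(
    {'control_1': [9.999834e-01, 1.779818e-03, 8.158081e-07, 1.671655e-02, 4.405403e-03],
     'control_2': [0.000016, 0.996192, 0.005291, 0.972799, 0.035383],
     'control_3': [4.727007e-07, 2.028217e-03, 9.947079e-01, 1.048439e-02, 9.602115e-01]},
    index=pd.Index([217882, 195234, 446385, 13508115, 459499], name='UNITID'))

# Assumed mapping from probability columns to readable college types.
control_labels = {
    'control_1': 'public non-profit',
    'control_2': 'private non-profit',
    'control_3': 'private for-profit',
}

# idxmax(axis=1) returns the column name of the largest value in each row.
predicted = probs.idxmax(axis=1).map(control_labels)
# The winning probability, as a rough indication of how certain the model is.
confidence = probs.max(axis=1)

print(pd.DataFrame({'predicted': predicted, 'probability': confidence}))
```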

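For each of these colleges, the class with the highest predicted probability (above 0.96 in every row) is the same category predicted by the models we trained earlier. Agreement between different model types gives us more confidence in those predictions.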

We're excited to see what problems you solve with these new capabilities. If you have any problems or questions, contact us at support@civisanalytics.com. Happy modeling!