Prediction with PyLogit

The purpose of this notebook is to demonstrate how one can use PyLogit to predict the probabilities of discrete outcomes. The two main use cases are when one has

  1. already estimated a model using PyLogit and wants to use the estimated parameters to predict probabilities for a different dataset, or
  2. one or more sets of parameters (obtained from some process other than model estimation with PyLogit) and wants to use those parameters to predict probabilities.

Both of these uses will be demonstrated below. Essentially, the "predict" method of a model object is the place to go to perform both of these tasks. The exception is the mixed logit model, where, in addition to the "predict" method, there is also a "panel_predict" method that allows one to condition on past observations for individuals in one's prediction dataset who were also in one's estimation dataset. For more information on the panel_predict method, see chapter 11 ("Individual-Level Parameters") of Discrete Choice Methods with Simulation by Kenneth Train (2009).

The examples below should make all of this clear.
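
For reference, the two calls look roughly as follows (model and new_df are placeholder names, and this is only a sketch; both calls are demonstrated concretely below):

# Predict probabilities for a new long-format dataframe
probabilities = model.predict(new_df)

# Mixed logit only: condition on the choices previously observed
# for each individual in the estimation data
probabilities = model.panel_predict(new_df, num_draws=200)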

The dataset being used in this notebook is the "Swissmetro" dataset used in the Python Biogeme examples. The data can be downloaded at http://biogeme.epfl.ch/examples_swissmetro.html, and a detailed explanation of the variables and data-collection procedure can be found at http://www.strc.ch/conferences/2001/bierlaire1.pdf.


In [1]:
# For recording the model specification 
from collections import OrderedDict

# For making plots pretty
import seaborn
# For file input/output
import pandas as pd
# For vectorized math operations
import numpy as np
# For plotting
import matplotlib.pyplot as plt

# For model estimation and prediction
import pylogit as pl

# To display plots inline
%matplotlib inline


Load the dataset

Note that because we want to demonstrate the use of the dataset for prediction, we will randomly hold out 2,000 of the choice situations in the dataset. After estimating the various models, we will then predict the probabilities for these held-out observations.


In [2]:
# Load the dataset
raw_data = pd.read_csv("../data/long_swiss_metro_data.csv")

# Get a list of the choice situation ids that will be used
# in estimation and in prediction
all_situation_ids = np.sort(raw_data["custom_id"].unique())

# Set a random seed for reproducibility
np.random.seed(61)

# Shuffle and split the ids
np.random.shuffle(all_situation_ids)
prediction_ids = all_situation_ids[:2000]
estimation_ids = all_situation_ids[2000:]

# Create the estimation and prediction datasets
estimation_df = raw_data.loc[raw_data["custom_id"].isin(estimation_ids)].copy()
prediction_df = raw_data.loc[raw_data["custom_id"].isin(prediction_ids)].copy()
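
# Optional sanity check: the held-out and estimation ids should
# partition the full set of choice situation ids.
assert prediction_ids.size == 2000
assert estimation_ids.size + prediction_ids.size == all_situation_ids.size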

In [3]:
# Look at the first 5 rows of the data
# Note that for the mode_id, 1-Train, 2-SwissMetro, 3-Car
estimation_df.head().T


Out[3]:
3 4 5 6 7
custom_id 2.000000 2.000000 2.00 3.000000 3.000000
mode_id 1.000000 2.000000 3.00 1.000000 2.000000
CHOICE 0.000000 1.000000 0.00 0.000000 1.000000
GROUP 2.000000 2.000000 2.00 2.000000 2.000000
SURVEY 0.000000 0.000000 0.00 0.000000 0.000000
SP 1.000000 1.000000 1.00 1.000000 1.000000
ID 1.000000 1.000000 1.00 1.000000 1.000000
PURPOSE 1.000000 1.000000 1.00 1.000000 1.000000
FIRST 0.000000 0.000000 0.00 0.000000 0.000000
TICKET 1.000000 1.000000 1.00 1.000000 1.000000
WHO 1.000000 1.000000 1.00 1.000000 1.000000
LUGGAGE 0.000000 0.000000 0.00 0.000000 0.000000
AGE 3.000000 3.000000 3.00 3.000000 3.000000
MALE 0.000000 0.000000 0.00 0.000000 0.000000
INCOME 2.000000 2.000000 2.00 2.000000 2.000000
GA 0.000000 0.000000 0.00 0.000000 0.000000
ORIGIN 2.000000 2.000000 2.00 2.000000 2.000000
DEST 1.000000 1.000000 1.00 1.000000 1.000000
seat_configuration 0.000000 0.000000 0.00 0.000000 0.000000
travel_time 103.000000 60.000000 117.00 130.000000 67.000000
headway 30.000000 10.000000 0.00 60.000000 30.000000
travel_cost 48.000000 49.000000 84.00 48.000000 58.000000
travel_time_hrs 1.716667 1.000000 1.95 2.166667 1.116667
headway_hrs 0.500000 0.166667 0.00 1.000000 0.500000
free_ticket 0.000000 0.000000 0.00 0.000000 0.000000
travel_cost_hundreth 0.480000 0.490000 0.84 0.480000 0.580000
single_luggage_piece 0.000000 0.000000 0.00 0.000000 0.000000
multiple_luggage_pieces 0.000000 0.000000 0.00 0.000000 0.000000
regular_class 1.000000 1.000000 1.00 1.000000 1.000000
train_survey 1.000000 1.000000 1.00 1.000000 1.000000

Create the model specification

As in previous examples, the model specification being used is

$$
\begin{aligned}
V_{i, \textrm{Train}} &= \textrm{ASC Train} + \\
&\quad \beta_{\textrm{tt\_transit}} \textrm{Travel Time}_{\textrm{Train}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{tc\_train}} \textrm{Travel Cost}_{\textrm{Train}} * \left( GA == 0 \right) * 0.01 + \\
&\quad \beta_{\textrm{headway\_train}} \textrm{Headway}_{\textrm{Train}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{survey}} \left( \textrm{Train Survey} == 1 \right) \\
\\
V_{i, \textrm{Swissmetro}} &= \textrm{ASC Swissmetro} + \\
&\quad \beta_{\textrm{tt\_transit}} \textrm{Travel Time}_{\textrm{Swissmetro}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{tc\_sm}} \textrm{Travel Cost}_{\textrm{Swissmetro}} * \left( GA == 0 \right) * 0.01 + \\
&\quad \beta_{\textrm{headway\_sm}} \textrm{Headway}_{\textrm{Swissmetro}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{seat}} \left( \textrm{Seat Configuration} == 1 \right) + \\
&\quad \beta_{\textrm{survey}} \left( \textrm{Train Survey} == 1 \right) + \\
&\quad \beta_{\textrm{first\_class}} \left( \textrm{First Class} == 0 \right) \\
\\
V_{i, \textrm{Car}} &= \beta_{\textrm{tt\_car}} \textrm{Travel Time}_{\textrm{Car}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{tc\_car}} \textrm{Travel Cost}_{\textrm{Car}} * 0.01 + \\
&\quad \beta_{\textrm{luggage}=1} \left( \textrm{Luggage} == 1 \right) + \\
&\quad \beta_{\textrm{luggage}>1} \left( \textrm{Luggage} > 1 \right)
\end{aligned}
$$

Here, $i$ denotes a particular individual.


In [4]:
# NOTE: - Specification and variable names must be ordered dictionaries.
#       - Keys should be variables within the long-format dataframe.
#         The sole exception to this is the "intercept" key.
#       - For the specification dictionary, the values should be lists
#         of integers or lists of lists of integers. Within a list,
#         or within the inner-most list, the integers should be the
#         alternative IDs of the alternatives whose utility specifications
#         the explanatory variable enters. Lists of lists denote
#         alternatives that will share a common coefficient for the
#         variable in question.

basic_specification = OrderedDict()
basic_names = OrderedDict()

basic_specification["intercept"] = [1, 2]
basic_names["intercept"] = ['ASC Train',
                            'ASC Swissmetro']

basic_specification["travel_time_hrs"] = [[1, 2,], 3]
basic_names["travel_time_hrs"] = ['Travel Time, units:hrs (Train and Swissmetro)',
                                  'Travel Time, units:hrs (Car)']

basic_specification["travel_cost_hundreth"] = [1, 2, 3]
basic_names["travel_cost_hundreth"] = ['Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train)',
                                       'Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro)',
                                       'Travel Cost, units: 0.01 CHF (Car)']

basic_specification["headway_hrs"] = [1, 2]
basic_names["headway_hrs"] = ["Headway, units:hrs, (Train)",
                              "Headway, units:hrs, (Swissmetro)"]

basic_specification["seat_configuration"] = [2]
basic_names["seat_configuration"] = ['Airline Seat Configuration, base=No (Swissmetro)']

basic_specification["train_survey"] = [[1, 2]]
basic_names["train_survey"] = ["Surveyed on a Train, base=No, (Train and Swissmetro)"]

basic_specification["regular_class"] = [1]
basic_names["regular_class"] = ["First Class == False, (Swissmetro)"]

basic_specification["single_luggage_piece"] = [3]
basic_names["single_luggage_piece"] = ["Number of Luggage Pieces == 1, (Car)"]

basic_specification["multiple_luggage_pieces"] = [3]
basic_names["multiple_luggage_pieces"] = ["Number of Luggage Pieces > 1, (Car)"]

Estimate the various models to be used for prediction

Note that we will first estimate all models that can be estimated with PyLogit. Then we will demonstrate the use of the prediction function for these models.


In [5]:
# The 'alt_id_column' is the name of the column in the long-format data
# that identifies the alternative associated with each row.
alt_id_column = "mode_id"

# The "obs_id_column" is a custom id column that ignores the fact that this is a 
# panel/repeated-observations dataset. This column denotes each individual choice
# situation.
obs_id_column = "custom_id"

# The "choice_column" records the name of the column that denotes whether or not each
# individual chose the alternative on a given row.
choice_column = "CHOICE"

Estimate the MNL model


In [6]:
# Estimate the multinomial logit model (MNL)
swissmetro_mnl = pl.create_choice_model(data=estimation_df,
                                        alt_id_col=alt_id_column,
                                        obs_id_col=obs_id_column,
                                        choice_col=choice_column,
                                        specification=basic_specification,
                                        model_type="MNL",
                                        names=basic_names)

# Estimate the model, starting the optimization from zero for each
# of the 14 parameters named above.
swissmetro_mnl.fit_mle(np.zeros(14))


Log-likelihood at zero: -4,906.5129
Initial Log-likelihood: -4,906.5129
Estimation Time: 0.15 seconds.
Final log-likelihood: -3,668.2772

Estimate the Asymmetric Logit Model


In [7]:
# Create the various specification and name dictionaries 
# for the asymmetric logit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
asym_specification = OrderedDict()
asym_names = OrderedDict()

for col in basic_specification:
    if col != "intercept":
        asym_specification[col] = basic_specification[col]
        asym_names[col] = basic_names[col]

# Get the list of intercept names for the asymmetric logit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
asym_intercept_names = basic_names["intercept"]
# Specify which intercept/ASC is not being estimated (i.e., the Car intercept)
asym_intercept_ref_pos = 2

# Give names to the shape parameters of the asymmetric logit model.
# Note that not all of the shape parameters are identifiable, so we
# need to constrain one of them. This is why we do not have a
# "shape_car" name.
asym_shape_names = ["shape_train", "shape_swiss_metro"]

# Note the index of the alternative whose shape parameter is constrained
# (i.e. the Car alternative)
asym_ref = 2

# Create the asymmetric logit's model object
swissmetro_asym = pl.create_choice_model(data=estimation_df,
                                         alt_id_col=alt_id_column,
                                         obs_id_col=obs_id_column,
                                         choice_col=choice_column,
                                         specification=asym_specification,
                                         model_type="Asym",
                                         names=asym_names,
                                         shape_names=asym_shape_names,
                                         intercept_names=asym_intercept_names,
                                         shape_ref_pos=asym_ref,
                                         intercept_ref_pos=asym_intercept_ref_pos)

# Note that, below, we use None for the initial values and instead
# use kwargs to specify our starting point for the optimization.
# This is not necessary, but it is a convenient option.

# Note that when each shape parameter equals 1/J, where J is the
# number of alternatives in the dataset, the asymmetric logit model's
# index coefficients equal the MNL's coefficients divided by log(J).
# The MNL estimates divided by log(3) are therefore a natural
# starting point for the optimization.
# Again, see http://arxiv.org/abs/1606.05900 for more details.

swissmetro_asym.fit_mle(None,
                        init_shapes=np.zeros(2),
                        init_intercepts=swissmetro_mnl.params.values[:2],
                        init_coefs=swissmetro_mnl.params.values[2:] / np.log(3))


Log-likelihood at zero: -4,906.5129
Initial Log-likelihood: -3,668.2772
Estimation Time: 0.42 seconds.
Final log-likelihood: -3,533.2248

Estimate the Uneven Logit Model


In [8]:
# Create the various specification and name dictionaries 
# for the uneven logit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
uneven_specification = OrderedDict()
uneven_names = OrderedDict()

for col in basic_specification:
    if col != "intercept":
        uneven_specification[col] = basic_specification[col]
        uneven_names[col] = basic_names[col]

# Get the list of intercept names for the uneven logit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
uneven_intercept_names = basic_names["intercept"]
# Specify which intercept/ASC is not being estimated (i.e., the Car intercept)
uneven_intercept_ref_pos = 2

# Specify the names of the uneven logit model's shape parameters
# Note that we include "shape_car" because all of the uneven logit
# model's shape parameters are identifiable.
uneven_shape_names = ["shape_train", "shape_swiss_metro", "shape_car"]

swissmetro_uneven = pl.create_choice_model(data=estimation_df,
                                           alt_id_col=alt_id_column,
                                           obs_id_col=obs_id_column,
                                           choice_col=choice_column,
                                           specification=uneven_specification,
                                           model_type="Uneven",
                                           names=uneven_names,
                                           shape_names=uneven_shape_names,
                                           intercept_names=uneven_intercept_names,
                                           intercept_ref_pos=uneven_intercept_ref_pos)


# Also, note that we use None for the initial values and use kwargs to
# specify our initial values for the optimization. This is necessary
# in order to use 'outside' intercept parameters with the model.
swissmetro_uneven.fit_mle(None,
                          init_shapes=np.zeros(3),
                          init_intercepts=swissmetro_mnl.params.values[:2],
                          init_coefs=swissmetro_mnl.params.values[2:])


Log-likelihood at zero: -4,906.5129
Initial Log-likelihood: -3,668.2772
Estimation Time: 0.31 seconds.
Final log-likelihood: -3,511.3976

Estimate the Scobit Model


In [9]:
# Create the various specification and name dictionaries 
# for the scobit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
scobit_specification = OrderedDict()
scobit_names = OrderedDict()

for col in basic_specification:
    if col != "intercept":
        scobit_specification[col] = basic_specification[col]
        scobit_names[col] = basic_names[col]

# Get the list of intercept names for the scobit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
scobit_intercept_names = basic_names["intercept"]

# Create the names of the shape parameters that are needed for the scobit model
scobit_shape_names = ["shape_train", "shape_swiss_metro", "shape_car"]

# Specify which intercept/ASC is not being estimated (namely, the Car intercept)
scobit_intercept_ref = 2

swissmetro_scobit = pl.create_choice_model(data=estimation_df,
                                           alt_id_col=alt_id_column,
                                           obs_id_col=obs_id_column,
                                           choice_col=choice_column,
                                           specification=scobit_specification,
                                           model_type="Scobit",
                                           names=scobit_names,
                                           shape_names=scobit_shape_names,
                                           intercept_ref_pos=scobit_intercept_ref,
                                           intercept_names=scobit_intercept_names)

# Note that we are using 'outside' intercept parameters for this model
swissmetro_scobit.fit_mle(None,
                          init_shapes=np.zeros(3),
                          init_intercepts=swissmetro_mnl.params.values[:2],
                          init_coefs=swissmetro_mnl.params.values[2:])


Log-likelihood at zero: -4,906.5129
Initial Log-likelihood: -3,668.2772
Estimation Time: 0.50 seconds.
Final log-likelihood: -3,505.9938

Estimate the Clog-Log Model


In [10]:
# Create the specification and name dictionaries
# for the clog-log model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
clog_specification = OrderedDict()
clog_names = OrderedDict()

# Copy the specification dictionary from the logit
# model, without the intercept parameters so we
# can place the intercept parameters outside the
# index
for col in basic_specification:
    if col != "intercept":
        clog_specification[col] = basic_specification[col]
        clog_names[col] = basic_names[col]

# Get the list of intercept names for the clog-log model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
clog_intercept_names = basic_names["intercept"]

# Specify which intercept/ASC is not being estimated 
# (i.e., the Car intercept)
clog_intercept_ref = 2

# Create the Clog-log model object
swissmetro_clog = pl.create_choice_model(data=estimation_df,
                                         alt_id_col=alt_id_column,
                                         obs_id_col=obs_id_column,
                                         choice_col=choice_column,
                                         specification=clog_specification,
                                         model_type="Cloglog",
                                         intercept_ref_pos=clog_intercept_ref,
                                         names=clog_names,
                                         intercept_names=clog_intercept_names)

# Estimate the clog-log model. Note that we don't pass a single array
# of initial values; instead, we pass keyword arguments.
swissmetro_clog.fit_mle(None,
                        init_intercepts=swissmetro_mnl.params.values[:2],
                        init_coefs=swissmetro_mnl.params.values[2:])


Log-likelihood at zero: -4,906.5129
Initial Log-likelihood: -4,249.4904
Estimation Time: 0.41 seconds.
Final log-likelihood: -3,680.1916

Estimate a Nested Logit Model

For the purposes of this example, we'll simply nest the "existing" modes together and leave the future mode (the Swissmetro) in its own nest.


In [11]:
# Specify the nesting specification.
# I.e. specify the name of the various nests,
# and which alternatives belong to which nests.
nest_membership = OrderedDict()
nest_membership["Future Modes"] = [2]
nest_membership["Existing Modes"] = [1, 3]

# Create the nested logit model object
swissmetro_nested = pl.create_choice_model(data=estimation_df,
                                           alt_id_col=alt_id_column,
                                           obs_id_col=obs_id_column,
                                           choice_col=choice_column,
                                           specification=basic_specification,
                                           model_type="Nested Logit",
                                           names=basic_names,
                                           nest_spec=nest_membership)

# Create the initial values for the nest parameters
# Note: This should be in terms of the reparameterized values
# (i.e. the logit of the inverse of the scale parameter) used
# by PyLogit.
# Also, note that 40 corresponds to a scale parameter that is
# essentially equal to 1, and 5 corresponds to a scale
# parameter just slightly greater than 1.
init_nests = np.array([40, 5])
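
# As a quick check of this reparameterization (using scipy's expit,
# i.e. the inverse logit): scale = 1 / expit(nest_param), so
# 1 / expit(40) is 1.0 to machine precision, while
# 1 / expit(5) is roughly 1.0067.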

# Create the initial values for the estimation
initial_values = np.concatenate([init_nests,
                                 swissmetro_mnl.params.values])

# Estimate the nested logit model
# Note we constrain the nesting parameter for nest 1 since
# the Future Modes nest only has 1 alternative in it.
swissmetro_nested.fit_mle(initial_values,
                          constrained_pos=[0])

swissmetro_nested.get_statsmodels_summary()


Log-likelihood at zero: -4,594.2735
Initial Log-likelihood: -3,666.8367
Estimation Time: 0.49 seconds.
Final log-likelihood: -3,575.2522
Out[11]:
Nested Logit Model Regression Results
Dep. Variable: CHOICE No. Observations: 4,768
Model: Nested Logit Model Df Residuals: 4,752
Method: MLE Df Model: 16
Date: Thu, 17 Nov 2016 Pseudo R-squ.: 0.222
Time: 23:52:23 Pseudo R-bar-squ.: 0.218
converged: False Log-Likelihood: -3,575.252
LL-Null: -4,594.273
coef std err z P>|z| [95.0% Conf. Int.]
Future Modes 40.0000 nan nan nan nan nan
Existing Modes -0.4513 0.121 -3.728 0.000 -0.689 -0.214
ASC Train -0.6825 0.115 -5.924 0.000 -0.908 -0.457
ASC Swissmetro -0.1837 0.113 -1.619 0.105 -0.406 0.039
Travel Time, units:hrs (Train and Swissmetro) -0.4452 0.026 -17.129 0.000 -0.496 -0.394
Travel Time, units:hrs (Car) -0.3540 0.025 -14.339 0.000 -0.402 -0.306
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train) -0.1932 0.055 -3.505 0.000 -0.301 -0.085
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro) -0.1713 0.049 -3.474 0.001 -0.268 -0.075
Travel Cost, units: 0.01 CHF (Car) -0.5397 0.062 -8.742 0.000 -0.661 -0.419
Headway, units:hrs, (Train) -0.2679 0.050 -5.375 0.000 -0.366 -0.170
Headway, units:hrs, (Swissmetro) -0.4575 0.225 -2.031 0.042 -0.899 -0.016
Airline Seat Configuration, base=No (Swissmetro) -0.3986 0.103 -3.887 0.000 -0.600 -0.198
Surveyed on a Train, base=No, (Train and Swissmetro) 1.4522 0.112 12.994 0.000 1.233 1.671
First Class == False, (Swissmetro) 0.2518 0.061 4.101 0.000 0.131 0.372
Number of Luggage Pieces == 1, (Car) 0.1595 0.058 2.759 0.006 0.046 0.273
Number of Luggage Pieces > 1, (Car) 0.7496 0.167 4.495 0.000 0.423 1.076

Estimate a Mixed Logit Model

Here, we'll estimate a relatively simple model, allowing for normally distributed sensitivities to travel time and travel cost. The distribution will be across the various individuals in the dataset (note: not across the choice situations of a given individual). The individuals are denoted by the "ID" column.

Given that we're estimating a distribution for the sensitivities to time and cost, it would make sense to use a distribution with one-sided support, such as (the negative of) a log-normal distribution. In this way, the sensitivities could be restricted to being strictly negative. However, PyLogit does not (yet!) support mixing distributions other than the normal distribution for mixed logit.


In [12]:
# Specify the parameters, by name, that are being treated
# as random variables
mixing_variables = (basic_names["travel_time_hrs"] +
                    basic_names["travel_cost_hundreth"])

# Create the mixed logit model object
swissmetro_mixed = pl.create_choice_model(data=estimation_df,
                                          alt_id_col=alt_id_column,
                                          obs_id_col=obs_id_column,
                                          choice_col=choice_column,
                                          specification=basic_specification,
                                          model_type="Mixed Logit",
                                          names=basic_names,
                                          mixing_id_col="ID",
                                          mixing_vars=mixing_variables)

# Create the initial values for estimation.
# Note that we need one value for each beta
# coefficient and for each of the standard deviations
# that will now be estimated.
initial_values = np.concatenate([swissmetro_mnl.params.values,
                                 np.zeros(len(mixing_variables))])

# Estimate the mixed logit model
swissmetro_mixed.fit_mle(initial_values,
                         seed=26,
                         num_draws=400)

swissmetro_mixed.get_statsmodels_summary()


Log-likelihood at zero: -4,906.5129
Initial Log-likelihood: -3,668.2772
Estimation Time: 3.25 minutes.
Final log-likelihood: -2,824.8372
Out[12]:
Mixed Logit Model Regression Results
Dep. Variable: CHOICE No. Observations: 4,768
Model: Mixed Logit Model Df Residuals: 4,749
Method: MLE Df Model: 19
Date: Thu, 17 Nov 2016 Pseudo R-squ.: 0.424
Time: 23:55:44 Pseudo R-bar-squ.: 0.420
converged: False Log-Likelihood: -2,824.837
LL-Null: -4,906.513
coef std err z P>|z| [95.0% Conf. Int.]
ASC Train -2.8480 0.460 -6.190 0.000 -3.750 -1.946
ASC Swissmetro -2.9022 0.398 -7.299 0.000 -3.682 -2.123
Travel Time, units:hrs (Train and Swissmetro) -3.0662 0.178 -17.262 0.000 -3.414 -2.718
Travel Time, units:hrs (Car) -3.7078 0.169 -21.924 0.000 -4.039 -3.376
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train) -5.3458 0.634 -8.435 0.000 -6.588 -4.104
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro) -2.7045 0.330 -8.195 0.000 -3.351 -2.058
Travel Cost, units: 0.01 CHF (Car) -2.8815 0.333 -8.664 0.000 -3.533 -2.230
Headway, units:hrs, (Train) -0.5237 0.117 -4.488 0.000 -0.752 -0.295
Headway, units:hrs, (Swissmetro) -0.9966 0.356 -2.799 0.005 -1.694 -0.299
Airline Seat Configuration, base=No (Swissmetro) -0.5921 0.185 -3.195 0.001 -0.955 -0.229
Surveyed on a Train, base=No, (Train and Swissmetro) 5.8038 0.452 12.852 0.000 4.919 6.689
First Class == False, (Swissmetro) 0.9976 0.183 5.441 0.000 0.638 1.357
Number of Luggage Pieces == 1, (Car) 1.1681 0.334 3.503 0.000 0.514 1.822
Number of Luggage Pieces > 1, (Car) 2.7793 1.030 2.699 0.007 0.761 4.798
Sigma Travel Time, units:hrs (Train and Swissmetro) 2.2175 0.113 19.692 0.000 1.997 2.438
Sigma Travel Time, units:hrs (Car) 1.4464 0.108 13.426 0.000 1.235 1.658
Sigma Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train) 2.4709 0.512 4.826 0.000 1.467 3.474
Sigma Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro) 1.5809 0.276 5.720 0.000 1.039 2.123
Sigma Travel Cost, units: 0.01 CHF (Car) -1.8658 0.315 -5.930 0.000 -2.482 -1.249
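
As noted above, a normal mixing distribution places some probability on positive (i.e., wrong-signed) time and cost sensitivities. As a quick back-of-the-envelope check using the estimates above (a sketch assuming scipy is available, and taking the absolute value of the estimated sigma for the car's travel cost):

from scipy.stats import norm

# Implied share of individuals with a positive (wrong-signed)
# travel time sensitivity for Train and Swissmetro: roughly 8%
norm.sf(0, loc=-3.0662, scale=2.2175)

# Implied share with a positive car travel cost sensitivity:
# roughly 6%
norm.sf(0, loc=-2.8815, scale=1.8658)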

Predictions using 1 dataset and 1 set of parameters

Here, we will simply predict the probabilities of each available alternative being chosen in each choice situation from the prediction dataset, using all of the models that we estimated above. Note that we will perform two predictions with the Mixed Logit model. One will use the standard "predict" function and the other will use the "panel_predict" function.


In [13]:
# Create an ordered dictionary mapping shortened model
# names to estimated model objects.
model_name_to_obj = OrderedDict()
model_name_to_obj["Clog"] = swissmetro_clog
model_name_to_obj["MNL"] = swissmetro_mnl
model_name_to_obj["Nested"] = swissmetro_nested
model_name_to_obj["Asym"] = swissmetro_asym
model_name_to_obj["Uneven"] = swissmetro_uneven
model_name_to_obj["Scobit"] = swissmetro_scobit
model_name_to_obj["Mixed"] = swissmetro_mixed

# Initialize a dictionary to store the log-likelihoods
# on the prediction dataset.
prediction_log_likelihoods = OrderedDict()
for key in model_name_to_obj:
    prediction_log_likelihoods[key] = None

# Calculate and store the predictions for all models besides
# the mixed logit model. The predictions will be stored on
# prediction_df
for model_name in model_name_to_obj:
    if model_name == "Mixed":
        # Don't make predictions for the Mixed Logit Model
        # using the for-loop
        continue
    else:
        # Get the model object to be used in making predictions
        model_obj = model_name_to_obj[model_name]
        
        # Note that by default, the predict method returns the
        # predicted probabilities for each available alternative
        # for each choice situation.
        prediction_array = model_obj.predict(prediction_df)
        
        # We can use the "return_long_probs" and "choice_col" 
        # keyword arguments to ensure that we only return the 
        # probability of the chosen alternative for each
        # choice situation.
        chosen_predictions = model_obj.predict(prediction_df,
                                               choice_col=choice_column,
                                               return_long_probs=False)
        
        # Calculate the predicted log-likelihood in two ways and,
        # for demonstration, ensure that they agree.
        log_likelihood_calc_1 = np.log(chosen_predictions).sum()
        
        log_predictions = np.log(prediction_array)
        choice_array = prediction_df[choice_column].values
        log_likelihood_calc_2 = choice_array.dot(log_predictions)
        
        assert np.allclose(log_likelihood_calc_1,
                           log_likelihood_calc_2)
        
        # Store the log-likelihood
        prediction_log_likelihoods[model_name] = log_likelihood_calc_1
        
        # Create a column name for the predictions
        prediction_col = model_name + "_Predictions"
        
        # Store the predicted probabilities
        prediction_df[prediction_col] = prediction_array

In [14]:
# Calculate and store the predictions for the mixed logit 
# model using both the predict and panel_predict functions.
# Note that the panel_predict function will condition the
# predictions on the estimation data of each person
# (if there is any)
for prediction_type in ["Mixed", "Mixed_Panel"]:
    if "Panel" not in prediction_type:
        # We set the random seed simply for reproducibility
        predictions = swissmetro_mixed.predict(prediction_df,
                                               seed=2,
                                               num_draws=400)
    else:
        # The arguments are ['data', 'num_draws']
        args = [prediction_df, 400]
        kwargs = {"seed": 2}

        # Note that we can change the return type the same
        # way as with the predict function. Pass
        # return_long_probs = False and 
        # choice_col = choice_column as keyword arguments
        predictions = swissmetro_mixed.panel_predict(*args,
                                                     **kwargs)

    # Create the column name for the predictions
    prediction_col = prediction_type + "_Predictions"
    # Store the predicted probabilities
    prediction_df[prediction_col] = predictions

    # Calculate the log-likelihood
    actual_choices = prediction_df[choice_column].values
    log_predicted_probs = np.log(predictions)

    prediction_log_likelihoods[prediction_type] =\
                        actual_choices.dot(log_predicted_probs)

In [15]:
# Look at the predicted log-likelihoods of the various models
pd.Series(prediction_log_likelihoods,
          name="Log-likelihood of Predictions").sort_values()


Out[15]:
Mixed         -1534.343235
Clog          -1516.255638
MNL           -1495.346433
Asym          -1462.023258
Nested        -1461.652262
Scobit        -1451.833670
Uneven        -1447.384942
Mixed_Panel    -947.548620
Name: Log-likelihood of Predictions, dtype: float64

Predictions using 1 dataset and multiple sets of parameters

One feature of PyLogit's predict method is that it allows users to pass multiple sets of parameters via the "param_list" keyword argument. This allows one to make multiple sets of predictions at once. Such a feature is useful when one has, for instance, MCMC draws from the posterior distribution of one's model parameters. Alternatively, as we shall see below, this feature is also useful when one wants to plot the log-likelihood function.

To demonstrate how one uses PyLogit to predict probabilities when one has 1 dataset and multiple sets of coefficients, we will simply use the Uneven Logit model. Most models in PyLogit support prediction with multiple sets of coefficients, and all models that support this feature make use of the same "param_list" keyword argument in the predict method.

The two cases where prediction with multiple sets of parameters is not supported are the Nested Logit model and the "panel_predict" method of the Mixed Logit model. Note that the "standard" predict method of the Mixed Logit model does support prediction with multiple sets of parameters.
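
For instance, if one had MCMC draws of the MNL coefficients, posterior-mean predictions could be computed as in the sketch below. The "draws" here are merely hypothetical perturbations of the point estimates, purely for illustration, and the required ordering of param_list is demonstrated later in this section.

# Hypothetical stand-ins for posterior draws: small perturbations of
# the estimated MNL coefficients, with one column per draw.
num_fake_draws = 10
fake_draws = (swissmetro_mnl.params.values[:, None] +
              np.random.normal(scale=0.01, size=(14, num_fake_draws)))

# For the MNL, all parameters are index coefficients, so the
# intercept, shape, and nest entries of param_list are None.
draw_probs = swissmetro_mnl.predict(prediction_df,
                                    param_list=[fake_draws, None,
                                                None, None])

# Average across the draws (one column per parameter set) to get
# posterior-mean predicted probabilities.
posterior_mean_probs = draw_probs.mean(axis=1)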

Creating multiple sets of parameters

Here, we will assume that we want to plot the in-sample (i.e. estimation set) log-likelihood with respect to the travel cost parameter for the car.

First, let's look at the summary for the Uneven Logit model.


In [16]:
swissmetro_uneven.get_statsmodels_summary()


Out[16]:
Multinomial Uneven Logit Model Regression Results
Dep. Variable: CHOICE No. Observations: 4,768
Model: Multinomial Uneven Logit Model Df Residuals: 4,751
Method: MLE Df Model: 17
Date: Thu, 17 Nov 2016 Pseudo R-squ.: 0.284
Time: 23:55:54 Pseudo R-bar-squ.: 0.281
converged: False Log-Likelihood: -3,511.398
LL-Null: -4,906.513
coef std err z P>|z| [95.0% Conf. Int.]
shape_train 1.0271 0.141 7.274 0.000 0.750 1.304
shape_swiss_metro 0.1120 0.232 0.482 0.630 -0.343 0.567
shape_car -0.4157 0.561 -0.741 0.458 -1.515 0.683
ASC Train -1.1675 0.240 -4.873 0.000 -1.637 -0.698
ASC Swissmetro -0.6523 0.234 -2.785 0.005 -1.111 -0.193
Travel Time, units:hrs (Train and Swissmetro) -0.3399 0.051 -6.621 0.000 -0.441 -0.239
Travel Time, units:hrs (Car) -0.9527 0.591 -1.611 0.107 -2.112 0.207
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train) -0.2552 0.071 -3.573 0.000 -0.395 -0.115
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro) -0.2218 0.061 -3.648 0.000 -0.341 -0.103
Travel Cost, units: 0.01 CHF (Car) -0.5969 0.404 -1.479 0.139 -1.388 0.194
Headway, units:hrs, (Train) -0.2833 0.056 -5.086 0.000 -0.392 -0.174
Headway, units:hrs, (Swissmetro) -0.4631 0.220 -2.102 0.036 -0.895 -0.031
Airline Seat Configuration, base=No (Swissmetro) -0.1097 0.108 -1.015 0.310 -0.321 0.102
Surveyed on a Train, base=No, (Train and Swissmetro) 2.0947 0.238 8.809 0.000 1.629 2.561
First Class == False, (Swissmetro) 0.1665 0.064 2.598 0.009 0.041 0.292
Number of Luggage Pieces == 1, (Car) 0.5712 0.365 1.565 0.118 -0.144 1.286
Number of Luggage Pieces > 1, (Car) 2.1772 0.967 2.251 0.024 0.281 4.073

In particular, we want to plot the log-likelihood in the range of -1 to 0 for the travel cost parameter in the car utility. To do so, we will generate 100 values between -1 and 0 that include the estimated parameter value.


In [17]:
# Create an array of 100 travel cost parameters between -1 and 0.
# Note that the travel cost parameter of interest is
# the tenth parameter, and since Python is zero-indexed,
# we use 9 to access its value.
actual_travel_cost_param = swissmetro_uneven.params.iat[9]

low_travel_cost_params = np.linspace(-1,
                                     actual_travel_cost_param,
                                     num=50,
                                     endpoint=False)

high_travel_cost_params = np.linspace(actual_travel_cost_param,
                                      0,
                                      num=50,
                                      endpoint=True)

travel_cost_params = np.concatenate((low_travel_cost_params,
                                     high_travel_cost_params))

Next, we'll create a 2D parameter array that contains all the sets of parameters. Each row corresponds to a particular parameter, and each column corresponds to one set of parameter values.


In [18]:
# Initialize the 2D parameter array
param_array_2d = (swissmetro_uneven.params.values[:, None] *
                  np.ones(travel_cost_params.size)[None, :])

# "Fill in" the row for the travel cost parameter for cars
param_array_2d[9, :] = travel_cost_params

# Let's look at the array to make sure we created it correctly
param_array_2d[:, :4]


Out[18]:
array([[ 1.02708012,  1.02708012,  1.02708012,  1.02708012],
       [ 0.11201881,  0.11201881,  0.11201881,  0.11201881],
       [-0.41569969, -0.41569969, -0.41569969, -0.41569969],
       [-1.16746396, -1.16746396, -1.16746396, -1.16746396],
       [-0.65231139, -0.65231139, -0.65231139, -0.65231139],
       [-0.33994783, -0.33994783, -0.33994783, -0.33994783],
       [-0.95266145, -0.95266145, -0.95266145, -0.95266145],
       [-0.25521738, -0.25521738, -0.25521738, -0.25521738],
       [-0.221815  , -0.221815  , -0.221815  , -0.221815  ],
       [-1.        , -0.99193809, -0.98387617, -0.97581426],
       [-0.28331413, -0.28331413, -0.28331413, -0.28331413],
       [-0.4630699 , -0.4630699 , -0.4630699 , -0.4630699 ],
       [-0.1097036 , -0.1097036 , -0.1097036 , -0.1097036 ],
       [ 2.09467556,  2.09467556,  2.09467556,  2.09467556],
       [ 0.16650129,  0.16650129,  0.16650129,  0.16650129],
       [ 0.57115182,  0.57115182,  0.57115182,  0.57115182],
       [ 2.17715196,  2.17715196,  2.17715196,  2.17715196]])

Finally, PyLogit requires users to be explicit about the sets of parameters being passed as nest, shape, outside intercept, and index parameters. Note that index parameters are also referred to as utility coefficients. This level of explicitness is intended to help prevent errors, and it facilitates internal testing of the passed values.

To comply with the requirements of the method, we have to break apart the newly created parameter array into the four sets of parameters just mentioned: nest, shape, outside intercept, and index parameters.


In [19]:
# Since this is not a nested logit we have no nest parameters
nest_2d = None
# The uneven logit has 3 shape parameters so 
# we select the first 3 rows.
shapes_2d = param_array_2d[:3, :]
# The uneven logit has 2 outside intercept parameters
# so we select the 4th and 5th rows
intercepts_2d = param_array_2d[3:5, :]
# Finally, the remaining parameters are the index coefficients
index_coefs_2d = param_array_2d[5:, :]

# Now, we pack all these arrays together in a list
# Note that the order is intentional and it is a requirement
# for the predict method.
parameter_list = [index_coefs_2d,
                  intercepts_2d,
                  shapes_2d,
                  nest_2d]

With the parameter list created, we can finally perform the prediction. To facilitate easy calculation of the log-likelihoods, given each set of parameter values, we will only return the chosen probabilities of each choice situation.

Performing the actual prediction


In [20]:
# Pack the keyword arguments into a dictionary, simply for
# line length considerations
kwargs = {"param_list": parameter_list,
          "return_long_probs": False,
          "choice_col": choice_column}

# Calculate the predictions for each set of parameter values
# Note that the returned array will have one column per
# set of parameter values and one row per choice situation
multiple_predictions = swissmetro_uneven.predict(estimation_df,
                                                 **kwargs)

# Calculate and plot the resulting log-likelihood
# Take the log of the probabilities of the chosen alternatives
# for each choice situation and sum them. log_likelihoods
# will be a 1D array of shape (travel_cost_params.size, )
log_likelihoods = np.log(multiple_predictions).sum(axis=0)

plt.plot(travel_cost_params,
         log_likelihoods,
         label="Log-Likelihood")
plt.vlines(actual_travel_cost_param,
           log_likelihoods.min(),
           swissmetro_uneven.log_likelihood,
           linestyle='dashed',
           label="Estimated Parameter Value")
plt.xlabel("Travel Cost, units: 0.01 CHF (Car)")
plt.ylabel("Log-Likelihood")
plt.title("Log-likelihod vs. Travel Cost Parameter for Car Utility")
plt.legend(loc='best')
plt.show()


