pyLogit Example

The purpose of this notebook is to demonstrate the key functionalities of pyLogit:

Converting data between 'wide' and 'long' formats.
Estimating conditional logit models.

The dataset being used for this example is the "Swissmetro" dataset used in the Python Biogeme examples. The data can be downloaded at http://biogeme.epfl.ch/examples_swissmetro.html, and a detailed explanation of the variables and data-collection procedure can be found at http://www.strc.ch/conferences/2001/bierlaire1.pdf.

This dataset comes from a stated preference survey about whether or not individuals would use a new underground magnetic-levitation train system called the Swissmetro.

The overall set of possible choices in this dataset was "Train", "Swissmetro", and "Car." However, the choice set faced by each individual is not constant. An individual's choice set was partially based on the alternatives that he/she was capable of using at the moment. For instance, people who did not own cars did not receive a stated preference question where car was an alternative that they could choose. Note that because the choice set varies across choice situations, mlogit and statsmodels could not be used with this dataset.

Also, each individual responded to multiple choice situations. Thus the choice observations are not truly independent of one another (they are correlated across choices made by the same individual). However, for the purposes of this example, the effect of repeated observations on the typical i.i.d. assumptions will be ignored.

Based on the Swissmetro data, we will build a travel mode choice model for individuals who are commuting or going on a business trip.



In [1]:

    
from collections import OrderedDict    # For recording the model specification 

import pandas as pd                    # For file input/output
import numpy as np                     # For vectorized math operations

import pylogit as pl                   # For MNL model estimation and
                                       # conversion from wide to long format

Load and filter the raw Swiss Metro data



In [2]:

    
# Note that the .dat files used by python biogeme are tab delimited text files
wide_swiss_metro = pd.read_table("../data/swissmetro.dat", sep="\t")

# Select observations whose choice is known (i.e. CHOICE != 0)
# **AND** whose PURPOSE is either 1 (commute) or 3 (business)
include_criteria = (wide_swiss_metro.PURPOSE.isin([1, 3]) &
                    (wide_swiss_metro.CHOICE != 0))
# Note that the .copy() ensures that any later changes are made 
# to a copy of the data and not to the original data
wide_swiss_metro = wide_swiss_metro.loc[include_criteria].copy()



In [3]:

    
# Look at the first 5 rows of the data
wide_swiss_metro.head().T

Convert the Swissmetro data to "Long Format"

pyLogit only estimates models using data that is in "long" format.

Long format has one row per individual per available alternative, whereas wide format has one row per individual or observation. Long format is useful because it permits one to directly use matrix dot products to calculate the index, $V_{ij} = x_{ij} \beta$, for each individual $\left(i \right)$ and each alternative $\left(j \right)$. In applications where one creates one's own dataset, the dataset can usually be created in long format from the very beginning. However, in situations where a dataset is provided to you in wide format (as in the case of the Swiss Metro dataset), it will be necessary to convert the data from wide format to long format.
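To make the distinction concrete, here is a minimal sketch of the wide-to-long idea on a tiny hand-made table. It uses only pandas and numpy, not pyLogit. The data values, the `alternatives` mapping, the column names `obs_id`, `alt_id` and `choice`, and the coefficient in `beta` are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy wide-format data: one row per observation (all values are made up).
wide_df = pd.DataFrame({
    "obs_id":   [1, 2],
    "TRAIN_TT": [112, 103],
    "SM_TT":    [63, 60],
    "CAR_TT":   [117, 0],
    "TRAIN_AV": [1, 1],
    "SM_AV":    [1, 1],
    "CAR_AV":   [1, 0],   # observation 2 has no car available
    "CHOICE":   [2, 1],   # id of the chosen alternative
})

# Alternative id -> (travel-time column, availability column).
alternatives = {1: ("TRAIN_TT", "TRAIN_AV"),
                2: ("SM_TT", "SM_AV"),
                3: ("CAR_TT", "CAR_AV")}

rows = []
for _, obs in wide_df.iterrows():
    for alt_id, (tt_col, av_col) in alternatives.items():
        if obs[av_col] == 1:  # keep available alternatives only
            rows.append({"obs_id": int(obs["obs_id"]),
                         "alt_id": alt_id,
                         "choice": int(obs["CHOICE"] == alt_id),
                         "travel_time": float(obs[tt_col])})
long_df = pd.DataFrame(rows)

# With one row per (observation, alternative), the index V is a single
# matrix-vector product. The coefficient value is arbitrary.
beta = np.array([-0.01])
long_df["V"] = long_df[["travel_time"]].to_numpy() @ beta
print(long_df)
```

Observation 2 contributes only two rows because the car is unavailable to it, which is how long format accommodates choice sets that vary across observations.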

To convert the raw swiss metro data to long format, we need to specify:

the variables or columns that are specific to a given individual, regardless of what alternative is being considered (note: every row is being treated as a separate observation, even though each individual gave multiple responses in this stated preference survey)
the variables that vary across some or all alternatives, for a given individual (e.g. travel time)
the availability variables
the unique observation id column. (Note that this dataset already has an id column, ID, but it identifies individuals, and for the purposes of this example we don't want to treat the repeated observations of each person as being related. We therefore want an identifying column that gives an id to every response of every individual instead of to every individual.)
the choice column

The cells below will identify these various columns, give them names in the long-format data, and perform the necessary conversion.



In [4]:

    
# Look at the columns of the swiss metro dataset
wide_swiss_metro.columns









    Out[4]:





Index([u'GROUP', u'SURVEY', u'SP', u'ID', u'PURPOSE', u'FIRST', u'TICKET',
       u'WHO', u'LUGGAGE', u'AGE', u'MALE', u'INCOME', u'GA', u'ORIGIN',
       u'DEST', u'TRAIN_AV', u'CAR_AV', u'SM_AV', u'TRAIN_TT', u'TRAIN_CO',
       u'TRAIN_HE', u'SM_TT', u'SM_CO', u'SM_HE', u'SM_SEATS', u'CAR_TT',
       u'CAR_CO', u'CHOICE'],
      dtype='object')



In [5]:

    
# Create the list of individual specific variables
ind_variables = wide_swiss_metro.columns.tolist()[:15]

# Specify the variables that vary across individuals and some or all alternatives
# The keys are the column names that will be used in the long format dataframe.
# The values are dictionaries whose key-value pairs are the alternative id and
# the column name of the corresponding column that encodes that variable for
# the given alternative. Examples below.
alt_varying_variables = {u'travel_time': dict([(1, 'TRAIN_TT'),
                                               (2, 'SM_TT'),
                                               (3, 'CAR_TT')]),
                          u'travel_cost': dict([(1, 'TRAIN_CO'),
                                                (2, 'SM_CO'),
                                                (3, 'CAR_CO')]),
                          u'headway': dict([(1, 'TRAIN_HE'),
                                            (2, 'SM_HE')]),
                          u'seat_configuration': dict([(2, "SM_SEATS")])}

# Specify the availability variables
# Note that the keys of the dictionary are the alternative id's.
# The values are the columns denoting the availability for the
# given mode in the dataset.
availability_variables = {1: 'TRAIN_AV',
                          2: 'SM_AV', 
                          3: 'CAR_AV'}

##########
# Determine the columns for: alternative ids, the observation ids and the choice
##########
# The 'custom_alt_id' is the name of a column to be created in the long-format data
# It will identify the alternative associated with each row.
custom_alt_id = "mode_id"

# Create a custom id column that ignores the fact that this is a 
# panel/repeated-observations dataset. Note the +1 ensures the id's start at one.
obs_id_column = "custom_id"
wide_swiss_metro[obs_id_column] = np.arange(wide_swiss_metro.shape[0],
                                            dtype=int) + 1


# Create a variable recording the choice column
choice_column = "CHOICE"



In [6]:

    
# Perform the conversion to long-format
long_swiss_metro = pl.convert_wide_to_long(wide_swiss_metro, 
                                           ind_variables, 
                                           alt_varying_variables, 
                                           availability_variables, 
                                           obs_id_column, 
                                           choice_column,
                                           new_alt_id_name=custom_alt_id)
# Look at the resulting long-format dataframe
long_swiss_metro.head(10).T









    Out[6]:






  
    
      
                        0    1    2    3    4    5    6    7    8    9
custom_id               1    1    1    2    2    2    3    3    3    4
mode_id                 1    2    3    1    2    3    1    2    3    1
CHOICE                  0    1    0    0    1    0    0    1    0    0
GROUP                   2    2    2    2    2    2    2    2    2    2
SURVEY                  0    0    0    0    0    0    0    0    0    0
SP                      1    1    1    1    1    1    1    1    1    1
ID                      1    1    1    1    1    1    1    1    1    1
PURPOSE                 1    1    1    1    1    1    1    1    1    1
FIRST                   0    0    0    0    0    0    0    0    0    0
TICKET                  1    1    1    1    1    1    1    1    1    1
WHO                     1    1    1    1    1    1    1    1    1    1
LUGGAGE                 0    0    0    0    0    0    0    0    0    0
AGE                     3    3    3    3    3    3    3    3    3    3
MALE                    0    0    0    0    0    0    0    0    0    0
INCOME                  2    2    2    2    2    2    2    2    2    2
GA                      0    0    0    0    0    0    0    0    0    0
ORIGIN                  2    2    2    2    2    2    2    2    2    2
DEST                    1    1    1    1    1    1    1    1    1    1
seat_configuration      0    0    0    0    0    0    0    0    0    0
travel_time           112   63  117  103   60  117  130   67  117  103
headway               120   20    0   30   10    0   60   30    0   30
travel_cost            48   52   65   48   49   84   48   58   52   40

Perform desired variable creations and transformations

Before estimating a model, one needs to pre-compute all of the variables that one wants to use. This is different from the functionality of other packages such as mlogit or statsmodels that use formula strings to create new variables "on-the-fly." This is also somewhat different from Python Biogeme where new variables can be defined in the script but not actually created by the user before model estimation. pyLogit does not perform variable creation. It only estimates models using variables that already exist.

Below, we pre-compute the variables needed for this example's model:

Travel time in hours instead of minutes.
Travel cost multiplied by 0.01 (that is, the cost in CHF, Swiss francs, divided by 100), for ease of numeric optimization.
Travel cost interacted with a variable that identifies individuals who own a season pass (and therefore have no marginal cost of traveling on the trip) or whose employer will pay for their commute/business trip.
A dummy variable for traveling with a single piece of luggage.
A dummy variable for traveling with multiple pieces of luggage.
A dummy variable denoting whether an individual is traveling first class.
A dummy variable indicating whether an individual took their survey on-board a train (since it is a-priori expected that these individuals are already willing to take a train or train-like service such as Swissmetro).



In [7]:

    
##########
# Create scaled variables so the estimated coefficients are of similar magnitudes
##########
# Scale the travel time column by 60 to convert raw units (minutes) to hours
long_swiss_metro["travel_time_hrs"] = long_swiss_metro["travel_time"] / 60.0

# Scale the headway column by 60 to convert raw units (minutes) to hours
long_swiss_metro["headway_hrs"] = long_swiss_metro["headway"] / 60.0

# Figure out who doesn't incur a marginal cost for the ticket
# This can be because he/she owns an annual season pass (GA == 1) 
# or because his/her employer pays for the ticket (WHO == 2).
# Note that all the other complexity in figuring out ticket costs
# have been accounted for except the GA pass (the annual season
# ticket). Make sure this dummy variable is only equal to 1 for
# the rows with the Train or Swissmetro
long_swiss_metro["free_ticket"] = (((long_swiss_metro["GA"] == 1) |
                                    (long_swiss_metro["WHO"] == 2)) &
                                   long_swiss_metro[custom_alt_id].isin([1,2])).astype(int)
# Scale the travel cost by 100 so estimated coefficients are of similar magnitude
# and account for ownership of a season pass
long_swiss_metro["travel_cost_hundreth"] = (long_swiss_metro["travel_cost"] *
                                            (long_swiss_metro["free_ticket"] == 0) /
                                            100.0)

##########
# Create various dummy variables to describe the choice context of a given
# individual for each choice task.
##########
# Create a dummy variable for whether a person has a single piece of luggage
long_swiss_metro["single_luggage_piece"] = (long_swiss_metro["LUGGAGE"] == 1).astype(int)

# Create a dummy variable for whether a person has multiple pieces of luggage
long_swiss_metro["multiple_luggage_pieces"] = (long_swiss_metro["LUGGAGE"] == 3).astype(int)

# Create a dummy variable indicating that a person is NOT first class
long_swiss_metro["regular_class"] = 1 - long_swiss_metro["FIRST"]

# Create a dummy variable indicating that the survey was taken aboard a train
# Note that such passengers are a-priori imagined to be somewhat partial to train modes
long_swiss_metro["train_survey"] = 1 - long_swiss_metro["SURVEY"]

Create the model specification

The model specification being used in this example is the following:

$$
\begin{aligned}
V_{i, \textrm{Train}} &= \textrm{ASC Train} + \\
&\quad \beta _{ \textrm{tt_transit} } \textrm{Travel Time} _{ \textrm{Train}} * \frac{1}{60} + \\
&\quad \beta _{ \textrm{tc_train} } \textrm{Travel Cost}_{\textrm{Train}} * \left( GA == 0 \right) * 0.01 + \\
&\quad \beta _{ \textrm{headway_train} } \textrm{Headway} _{\textrm{Train}} * \frac{1}{60} + \\
&\quad \beta _{ \textrm{survey} } \left( \textrm{Train Survey} == 1 \right) \\
\\
V_{i, \textrm{Swissmetro}} &= \textrm{ASC Swissmetro} + \\
&\quad \beta _{ \textrm{tt_transit} } \textrm{Travel Time} _{ \textrm{Swissmetro}} * \frac{1}{60} + \\
&\quad \beta _{ \textrm{tc_sm} } \textrm{Travel Cost}_{\textrm{Swissmetro}} * \left( GA == 0 \right) * 0.01 + \\
&\quad \beta _{ \textrm{headway_sm} } \textrm{Headway} _{\textrm{Swissmetro}} * \frac{1}{60} + \\
&\quad \beta _{ \textrm{seat} } \left( \textrm{Seat Configuration} == 1 \right) + \\
&\quad \beta _{ \textrm{survey} } \left( \textrm{Train Survey} == 1 \right) + \\
&\quad \beta _{ \textrm{first_class} } \left( \textrm{First Class} == 0 \right) \\
\\
V_{i, \textrm{Car}} &= \beta _{ \textrm{tt_car} } \textrm{Travel Time} _{ \textrm{Car}} * \frac{1}{60} + \\
&\quad \beta _{ \textrm{tc_car}} \textrm{Travel Cost}_{\textrm{Car}} * 0.01 + \\
&\quad \beta _{\textrm{luggage}=1} \left( \textrm{Luggage} == 1 \right) + \\
&\quad \beta _{\textrm{luggage}>1} \left( \textrm{Luggage} > 1 \right)
\end{aligned}
$$
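Under the multinomial (conditional) logit model, these indices turn into choice probabilities through $P_{ij} = \exp(V_{ij}) / \sum_{k \in C_i} \exp(V_{ik})$, where $C_i$ is the set of alternatives available to observation $i$. The sketch below computes this on a toy long-format table with pandas. The utilities and the column names `obs_id`, `alt_id`, `V` and `prob` are made up for illustration, and this is not pyLogit's internal code.

```python
import numpy as np
import pandas as pd

# Toy long-format data with made-up utilities V. Observation 2 has no car
# (alternative 3), so it simply has no row for that alternative.
toy = pd.DataFrame({
    "obs_id": [1, 1, 1, 2, 2],
    "alt_id": [1, 2, 3, 1, 2],
    "V":      [-1.0, 0.0, -0.5, -1.0, 0.0],
})

# Subtract each observation's maximum utility before exponentiating.
# This leaves the probabilities unchanged but guards against overflow.
shifted = toy["V"] - toy.groupby("obs_id")["V"].transform("max")
exp_v = np.exp(shifted)

# Normalize within each observation's own choice set.
toy["prob"] = exp_v / exp_v.groupby(toy["obs_id"]).transform("sum")
print(toy)
```

Because the normalization runs over the rows that exist for each observation, an unavailable alternative never enters the denominator.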

Note that packages such as mlogit and statsmodels do not, by default, handle coefficients that vary over some alternatives but not all, such as the travel time coefficient that is specified as being the same for "Train" and "Swissmetro" but different for "Car."
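One way to picture a coefficient that is shared by some alternatives but not others is as a design-matrix column that is non-zero only on the rows of the alternatives sharing it. The sketch below expands a specification of the kind used in this notebook (integers for alternative-specific coefficients, nested lists for shared ones) into such columns. `build_design` is a hypothetical helper written for this illustration and is not part of pyLogit. The toy data and the generated column names are also assumptions.

```python
from collections import OrderedDict

import pandas as pd


def build_design(long_df, alt_id_col, spec):
    """Expand a specification dictionary into design-matrix columns.

    Hypothetical helper for illustration only; not part of pyLogit.
    """
    alt_ids = long_df[alt_id_col]
    columns = OrderedDict()
    for var, groups in spec.items():
        # "intercept" is not a data column; it acts like a column of ones.
        values = 1.0 if var == "intercept" else long_df[var].astype(float)
        for group in groups:
            # A bare integer is one alternative; a nested list is a set of
            # alternatives that share a single coefficient.
            members = group if isinstance(group, list) else [group]
            name = "{}_{}".format(var, "_".join(str(m) for m in members))
            columns[name] = values * alt_ids.isin(members).astype(float)
    return pd.DataFrame(columns)


# Toy long-format rows for one observation (values are made up).
toy = pd.DataFrame({"alt_id": [1, 2, 3],
                    "travel_time_hrs": [1.5, 1.0, 2.0]})
spec = OrderedDict([("intercept", [1, 2]),
                    ("travel_time_hrs", [[1, 2], 3])])
design = build_design(toy, "alt_id", spec)
print(design)
```

Here `[[1, 2], 3]` yields two travel-time columns: one that is non-zero on the rows of alternatives 1 and 2, and one that is non-zero only on the row of alternative 3.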



In [8]:

    
# NOTE: - Specification and variable names must be ordered dictionaries.
#       - Keys should be variables within the long format dataframe.
#         The sole exception to this is the "intercept" key.
#       - For the specification dictionary, the values should be lists
#         of integers or lists of lists of integers. Within a list, 
#         or within the inner-most list, the integers should be the 
#         alternative ID's of the alternative whose utility specification 
#         the explanatory variable is entering. Lists of lists denote 
#         alternatives that will share a common coefficient for the variable
#         in question.

basic_specification = OrderedDict()
basic_names = OrderedDict()

basic_specification["intercept"] = [1, 2]
basic_names["intercept"] = ['ASC Train',
                            'ASC Swissmetro']

basic_specification["travel_time_hrs"] = [[1, 2], 3]
basic_names["travel_time_hrs"] = ['Travel Time, units:hrs (Train and Swissmetro)',
                                  'Travel Time, units:hrs (Car)']

basic_specification["travel_cost_hundreth"] = [1, 2, 3]
basic_names["travel_cost_hundreth"] = ['Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train)',
                                       'Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro)',
                                       'Travel Cost, units: 0.01 CHF (Car)']

basic_specification["headway_hrs"] = [1, 2]
basic_names["headway_hrs"] = ["Headway, units:hrs, (Train)",
                              "Headway, units:hrs, (Swissmetro)"]

basic_specification["seat_configuration"] = [2]
basic_names["seat_configuration"] = ['Airline Seat Configuration, base=No (Swissmetro)']

basic_specification["train_survey"] = [[1, 2]]
basic_names["train_survey"] = ["Surveyed on a Train, base=No, (Train and Swissmetro)"]

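# NOTE: alternative id 1 is "Train", whereas the coefficient label below and
# the model specification written out above both refer to Swissmetro
# (alternative id 2). The code is kept as [1] so that it matches the
# estimation results shown in this notebook.
basic_specification["regular_class"] = [1]
basic_names["regular_class"] = ["First Class == False, (Swissmetro)"]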

basic_specification["single_luggage_piece"] = [3]
basic_names["single_luggage_piece"] = ["Number of Luggage Pieces == 1, (Car)"]

basic_specification["multiple_luggage_pieces"] = [3]
basic_names["multiple_luggage_pieces"] = ["Number of Luggage Pieces > 1, (Car)"]
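As a quick sanity check on the specification, each top-level entry in a variable's list corresponds to one estimated coefficient, so the lengths of the lists should add up to the number of parameters in the model. The sketch below repeats the specification above in a plain dictionary and counts the entries. The names `spec` and `num_params` are introduced here for illustration.

```python
from collections import OrderedDict

# Same structure as the specification above: integers are
# alternative-specific coefficients, nested lists are shared coefficients.
spec = OrderedDict([
    ("intercept", [1, 2]),
    ("travel_time_hrs", [[1, 2], 3]),
    ("travel_cost_hundreth", [1, 2, 3]),
    ("headway_hrs", [1, 2]),
    ("seat_configuration", [2]),
    ("train_survey", [[1, 2]]),
    ("regular_class", [1]),
    ("single_luggage_piece", [3]),
    ("multiple_luggage_pieces", [3]),
])

# One coefficient per top-level entry of each list.
num_params = sum(len(groups) for groups in spec.values())
print(num_params)
```

The count is 14, which is the length of the vector of starting values used when the model is estimated.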

Estimate the conditional logit model



In [9]:

    
# Create the multinomial logit (MNL) model object
swissmetro_mnl = pl.create_choice_model(data=long_swiss_metro,
                                        alt_id_col=custom_alt_id,
                                        obs_id_col=obs_id_column,
                                        choice_col=choice_column,
                                        specification=basic_specification,
                                        model_type="MNL",
                                        names=basic_names)

# Estimate the model by maximum likelihood. The argument is the vector of
# initial values, one per coefficient (14 in this specification).
swissmetro_mnl.fit_mle(np.zeros(14))

# Look at the estimation results
swissmetro_mnl.get_statsmodels_summary()









    



Log-likelihood at zero: -6,964.6630
Initial Log-likelihood: -6,964.6630
Estimation Time: 0.09 seconds.
Final log-likelihood: -5,159.2583






    



/Users/timothyb0912/anaconda/lib/python2.7/site-packages/scipy/optimize/_minimize.py:382: RuntimeWarning: Method BFGS does not use Hessian information (hess).
  RuntimeWarning)






    Out[9]:





Multinomial Logit Model Regression Results

  Dep. Variable:          CHOICE             No. Observations:      6,768
  Model:          Multinomial Logit Model    Df Residuals:          6,754
  Method:                   MLE              Df Model:               14
  Date:              Mon, 21 Mar 2016        Pseudo R-squ.:         0.259
  Time:                  22:57:13            Pseudo R-bar-squ.:     0.257
  converged:               True              Log-Likelihood:     -5,159.258
                                             LL-Null:            -6,964.663

                                                                    coef      std err       z       P>|z|  [95.0% Conf. Int.]
  ASC Train                                                          -1.2929      0.146     -8.845   0.000     -1.579    -1.006
  ASC Swissmetro                                                     -0.5026      0.116     -4.332   0.000     -0.730    -0.275
  Travel Time, units:hrs (Train and Swissmetro)                      -0.6990      0.042    -16.545   0.000     -0.782    -0.616
  Travel Time, units:hrs (Car)                                       -0.7230      0.047    -15.340   0.000     -0.815    -0.631
  Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train)          -0.5618      0.094     -6.002   0.000     -0.745    -0.378
  Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro)     -0.2817      0.045     -6.252   0.000     -0.370    -0.193
  Travel Cost, units: 0.01 CHF (Car)                                 -0.5139      0.104     -4.953   0.000     -0.717    -0.311
  Headway, units:hrs, (Train)                                        -0.3143      0.062     -5.063   0.000     -0.436    -0.193
  Headway, units:hrs, (Swissmetro)                                   -0.3773      0.196     -1.925   0.054     -0.761     0.007
  Airline Seat Configuration, base=No (Swissmetro)                   -0.7825      0.087     -8.970   0.000     -0.953    -0.611
  Surveyed on a Train, base=No, (Train and Swissmetro)                2.5425      0.114     22.235   0.000      2.318     2.767
  First Class == False, (Swissmetro)                                  0.5650      0.077      7.305   0.000      0.413     0.717
  Number of Luggage Pieces == 1, (Car)                                0.4228      0.067      6.270   0.000      0.291     0.555
  Number of Luggage Pieces > 1, (Car)                                 1.4141      0.259      5.461   0.000      0.907     1.922

View results without using statsmodels summary table

You can view all of the results at once with print_summaries(), which prints the various summary dataframes.



In [10]:

    
# Look at all of the results at the same time
swissmetro_mnl.print_summaries()









    




Number of Parameters                                         14
Number of Observations                                     6768
Null Log-Likelihood                                   -6964.663
Fitted Log-Likelihood                                 -5159.258
Rho-Squared                                           0.2592236
Rho-Bar-Squared                                       0.2572134
Estimation Message        Optimization terminated successfully.
dtype: object
==============================
                                                    parameters   std_err  \
ASC Train                                            -1.292943  0.146184   
ASC Swissmetro                                       -0.502595  0.116010   
Travel Time, units:hrs (Train and Swissmetro)        -0.699029  0.042250   
Travel Time, units:hrs (Car)                         -0.722998  0.047130   
Travel Cost * (Annual Pass == 0), units: 0.01 C...   -0.561769  0.093593   
Travel Cost * (Annual Pass == 0), units: 0.01 C...   -0.281683  0.045058   
Travel Cost, units: 0.01 CHF (Car)                   -0.513867  0.103745   
Headway, units:hrs, (Train)                          -0.314336  0.062085   
Headway, units:hrs, (Swissmetro)                     -0.377324  0.195969   
Airline Seat Configuration, base=No (Swissmetro)     -0.782455  0.087232   
Surveyed on a Train, base=No, (Train and Swissm...    2.542475  0.114347   
First Class == False, (Swissmetro)                    0.565015  0.077341   
Number of Luggage Pieces == 1, (Car)                  0.422767  0.067424   
Number of Luggage Pieces > 1, (Car)                   1.414052  0.258917   

                                                      t_stats       p_values  \
ASC Train                                           -8.844625   9.183619e-19   
ASC Swissmetro                                      -4.332359   1.475204e-05   
Travel Time, units:hrs (Train and Swissmetro)      -16.545035   1.738624e-61   
Travel Time, units:hrs (Car)                       -15.340414   4.105670e-53   
Travel Cost * (Annual Pass == 0), units: 0.01 C...  -6.002282   1.945635e-09   
Travel Cost * (Annual Pass == 0), units: 0.01 C...  -6.251572   4.063404e-10   
Travel Cost, units: 0.01 CHF (Car)                  -4.953161   7.301745e-07   
Headway, units:hrs, (Train)                         -5.063008   4.126931e-07   
Headway, units:hrs, (Swissmetro)                    -1.925430   5.417557e-02   
Airline Seat Configuration, base=No (Swissmetro)    -8.969775   2.971216e-19   
Surveyed on a Train, base=No, (Train and Swissm...  22.234822  1.581947e-109   
First Class == False, (Swissmetro)                   7.305494   2.762506e-13   
Number of Luggage Pieces == 1, (Car)                 6.270276   3.604088e-10   
Number of Luggage Pieces > 1, (Car)                  5.461402   4.723880e-08   

                                                    robust_std_err  \
ASC Train                                                 0.302543   
ASC Swissmetro                                            0.392238   
Travel Time, units:hrs (Train and Swissmetro)             0.146476   
Travel Time, units:hrs (Car)                              0.164374   
Travel Cost * (Annual Pass == 0), units: 0.01 C...        0.128883   
Travel Cost * (Annual Pass == 0), units: 0.01 C...        0.066505   
Travel Cost, units: 0.01 CHF (Car)                        0.230016   
Headway, units:hrs, (Train)                               0.061189   
Headway, units:hrs, (Swissmetro)                          0.206538   
Airline Seat Configuration, base=No (Swissmetro)          0.097108   
Surveyed on a Train, base=No, (Train and Swissm...        0.351394   
First Class == False, (Swissmetro)                        0.078165   
Number of Luggage Pieces == 1, (Car)                      0.156215   
Number of Luggage Pieces > 1, (Car)                       0.493739   

                                                    robust_t_stats  \
ASC Train                                                -4.273586   
ASC Swissmetro                                           -1.281352   
Travel Time, units:hrs (Train and Swissmetro)            -4.772303   
Travel Time, units:hrs (Car)                             -4.398497   
Travel Cost * (Annual Pass == 0), units: 0.01 C...       -4.358752   
Travel Cost * (Annual Pass == 0), units: 0.01 C...       -4.235516   
Travel Cost, units: 0.01 CHF (Car)                       -2.234049   
Headway, units:hrs, (Train)                              -5.137165   
Headway, units:hrs, (Swissmetro)                         -1.826904   
Airline Seat Configuration, base=No (Swissmetro)         -8.057613   
Surveyed on a Train, base=No, (Train and Swissm...        7.235402   
First Class == False, (Swissmetro)                        7.228484   
Number of Luggage Pieces == 1, (Car)                      2.706309   
Number of Luggage Pieces > 1, (Car)                       2.863968   

                                                    robust_p_values  
ASC Train                                              1.923545e-05  
ASC Swissmetro                                         2.000702e-01  
Travel Time, units:hrs (Train and Swissmetro)          1.821311e-06  
Travel Time, units:hrs (Car)                           1.090030e-05  
Travel Cost * (Annual Pass == 0), units: 0.01 C...     1.308061e-05  
Travel Cost * (Annual Pass == 0), units: 0.01 C...     2.280274e-05  
Travel Cost, units: 0.01 CHF (Car)                     2.547984e-02  
Headway, units:hrs, (Train)                            2.789139e-07  
Headway, units:hrs, (Swissmetro)                       6.771420e-02  
Airline Seat Configuration, base=No (Swissmetro)       7.779883e-16  
Surveyed on a Train, base=No, (Train and Swissm...     4.641533e-13  
First Class == False, (Swissmetro)                     4.884147e-13  
Number of Luggage Pieces == 1, (Car)                   6.803565e-03  
Number of Luggage Pieces > 1, (Car)                    4.183697e-03



In [11]:

    
# Look at the general and goodness of fit statistics
swissmetro_mnl.fit_summary









    Out[11]:





Number of Parameters                                         14
Number of Observations                                     6768
Null Log-Likelihood                                   -6964.663
Fitted Log-Likelihood                                 -5159.258
Rho-Squared                                           0.2592236
Rho-Bar-Squared                                       0.2572134
Estimation Message        Optimization terminated successfully.
dtype: object
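The two goodness-of-fit figures in this summary can be reproduced by hand from the log-likelihoods printed during estimation. The sketch below assumes the usual McFadden definitions, $\rho^2 = 1 - LL / LL_0$ and $\bar{\rho}^2 = 1 - (LL - K) / LL_0$, with $K$ the number of estimated parameters. These definitions are an assumption about how the summary is computed, although they do reproduce the reported values.

```python
# Values copied from the estimation output above.
null_ll = -6964.6630      # log-likelihood at zero
fitted_ll = -5159.2583    # final log-likelihood
num_params = 14           # number of estimated parameters

# McFadden's rho-squared and its parameter-penalized version.
rho_squared = 1.0 - fitted_ll / null_ll
rho_bar_squared = 1.0 - (fitted_ll - num_params) / null_ll

print(round(rho_squared, 7), round(rho_bar_squared, 7))
```

The rounded results agree with the Rho-Squared (0.2592236) and Rho-Bar-Squared (0.2572134) entries in the summary.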



In [12]:

    
# Look at the parameter estimation results, and round the results for easy viewing
np.round(swissmetro_mnl.summary, 3)









    Out[12]:






  
    
      
                                                                parameters  std_err  t_stats  p_values  robust_std_err  robust_t_stats  robust_p_values
ASC Train                                                           -1.293    0.146   -8.845     0.000           0.303          -4.274            0.000
ASC Swissmetro                                                      -0.503    0.116   -4.332     0.000           0.392          -1.281            0.200
Travel Time, units:hrs (Train and Swissmetro)                       -0.699    0.042  -16.545     0.000           0.146          -4.772            0.000
Travel Time, units:hrs (Car)                                        -0.723    0.047  -15.340     0.000           0.164          -4.398            0.000
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train)           -0.562    0.094   -6.002     0.000           0.129          -4.359            0.000
Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro)      -0.282    0.045   -6.252     0.000           0.067          -4.236            0.000
Travel Cost, units: 0.01 CHF (Car)                                  -0.514    0.104   -4.953     0.000           0.230          -2.234            0.025
Headway, units:hrs, (Train)                                         -0.314    0.062   -5.063     0.000           0.061          -5.137            0.000
Headway, units:hrs, (Swissmetro)                                    -0.377    0.196   -1.925     0.054           0.207          -1.827            0.068
Airline Seat Configuration, base=No (Swissmetro)                    -0.782    0.087   -8.970     0.000           0.097          -8.058            0.000
Surveyed on a Train, base=No, (Train and Swissmetro)                 2.542    0.114   22.235     0.000           0.351           7.235            0.000
First Class == False, (Swissmetro)                                   0.565    0.077    7.305     0.000           0.078           7.228            0.000
Number of Luggage Pieces == 1, (Car)                                 0.423    0.067    6.270     0.000           0.156           2.706            0.007
Number of Luggage Pieces > 1, (Car)                                  1.414    0.259    5.461     0.000           0.494           2.864            0.004

Output of In [3] (first five rows of the wide-format data, transposed):

	0	1	2	3	4
GROUP	2	2	2	2	2
SURVEY	0	0	0	0	0
SP	1	1	1	1	1
ID	1	1	1	1	1
PURPOSE	1	1	1	1	1
FIRST	0	0	0	0	0
TICKET	1	1	1	1	1
WHO	1	1	1	1	1
LUGGAGE	0	0	0	0	0
AGE	3	3	3	3	3
MALE	0	0	0	0	0
INCOME	2	2	2	2	2
GA	0	0	0	0	0
ORIGIN	2	2	2	2	2
DEST	1	1	1	1	1
TRAIN_AV	1	1	1	1	1
CAR_AV	1	1	1	1	1
SM_AV	1	1	1	1	1
TRAIN_TT	112	103	130	103	130
TRAIN_CO	48	48	48	40	36
TRAIN_HE	120	30	60	30	60
SM_TT	63	60	67	63	63
SM_CO	52	49	58	52	42
SM_HE	20	10	30	20	20
SM_SEATS	0	0	0	0	0
CAR_TT	117	117	117	72	90
CAR_CO	65	84	52	52	84
CHOICE	2	2	2	2	2
