302: Itinerary Choice using Simple Nested Logit


In [1]:
import larch

This example is an itinerary choice model built using the example itinerary choice dataset included with Larch. See example 300 for details.


In [2]:
from larch.examples import example
d = example(300, 'd')


converting data_co to <class 'numpy.float64'>
converting data_ce to <class 'numpy.float64'>
rescaled array of weights by a factor of 2239.980952380952

We will be building a nested logit model, but to do so we first need to rationalize the alternative numbers. As given, the raw itinerary choice data has many alternatives, but they are not ordered or numbered in a regular way: each elemental alternative has an arbitrary code number, and the code numbers for one case are not comparable to those for another case. We need to renumber the alternatives so that the features we want to use in building the nested logit model can be extracted programmatically from the code number. In this example we want to test a model that has nests based on level of service, represented here by the number of connections (nb_cnxs). To renumber, we first define the relevant categories and values, and establish a numbering system using a special object:


In [3]:
d1 = d.new_systematic_alternatives(
    groupby='nb_cnxs',
    name='alternative_code',
    padding_levels=4,
    groupby_prefixes=['Cnx'],
    overwrite=False,
    complete_features_list={'nb_cnxs':[0,1,2]},
)


converting data_ce to <class 'numpy.float64'>
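
The idea behind a systematic numbering can be sketched in a few lines of plain Python. This is only an illustration under assumed conventions: the functions encode_alt and decode_alt are hypothetical, and the layout (a group prefix followed by a zero-padded sequence number) is an assumption made for the example, not a description of the codes Larch actually generates.

```python
# Hypothetical encoding, for illustration only; not Larch's actual scheme.
PADDING_LEVELS = 4  # digits reserved for the within-group sequence number


def encode_alt(nb_cnxs, seq, padding_levels=PADDING_LEVELS):
    """Pack a group value and a within-group sequence number into one code."""
    if not 0 < seq < 10 ** padding_levels:
        raise ValueError("sequence number does not fit in the padding")
    # Offset the group by 1 so that nb_cnxs == 0 still yields a nonzero prefix.
    return (nb_cnxs + 1) * 10 ** padding_levels + seq


def decode_alt(code, padding_levels=PADDING_LEVELS):
    """Recover (nb_cnxs, seq) from a code built by encode_alt."""
    prefix, seq = divmod(code, 10 ** padding_levels)
    return prefix - 1, seq


codes = [encode_alt(0, 1), encode_alt(0, 2), encode_alt(1, 1), encode_alt(2, 7)]
print(codes)
print([decode_alt(c) for c in codes])
```

With codes of this kind, the nest an alternative belongs to can be read directly from its code, and the same code means the same thing in every case.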

If we compare the new data with the old data, we'll see that we have created a few more alternatives: n_alts rises from 127 to 134.


In [4]:
d.info()


larch.DataFrames:
  n_cases: 105
  n_alts: 127
  data_ce:
    - nb_cnxs
    - elapsed_time
    - fare_hy
    - fare_ly
    - equipment
    - carrier
    - timeperiod
  data_co:
    - traveler
    - origin
    - destination
  data_av: <populated>
  data_ch: choice
  data_wt: <populated>

In [5]:
d1.info()


larch.DataFrames:
  n_cases: 105
  n_alts: 134
  data_ce:
    - id_alt
    - nb_cnxs
    - elapsed_time
    - fare_hy
    - fare_ly
    - equipment
    - carrier
    - timeperiod
  data_co:
    - traveler
    - origin
    - destination
  data_av: <populated>
  data_ch: <populated>
  data_wt: <populated>

Now let's make our model. The utility function we will use is the same as the one we used for the MNL version of the model.


In [6]:
m = larch.Model(dataservice=d1)

v = [
    "timeperiod==2",
    "timeperiod==3",
    "timeperiod==4",
    "timeperiod==5",
    "timeperiod==6",
    "timeperiod==7",
    "timeperiod==8",
    "timeperiod==9",
    "carrier==2",
    "carrier==3",
    "carrier==4",
    "carrier==5",
    "equipment==2",
    "fare_hy",
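    "fare_ly",
    "elapsed_time",
    "nb_cnxs",
]
from larch.roles import PX
# Each PX(name) term multiplies a data variable by a parameter of the same name;
# summing the terms gives a linear-in-parameters utility function.
m.utility_ca = sum(PX(i) for i in v)

# The observed choice is identified by the 'choice' variable.
m.choice_ca_var = 'choice'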

If we just end our model specification here, we will have a plain MNL model. To change to a nested logit model, all we need to do is add the nests. We can do this easily using the special magic_nesting method, which uses the structure of the alternative codes that we defined above.


In [7]:
m.magic_nesting()
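
To make the effect of the nesting concrete, here is a minimal sketch of two-level nested logit probabilities in plain Python. It is not Larch's implementation: the function nested_logit_probs, the toy utilities, and the nest names are all hypothetical, and a single logsum parameter mu is shared across nests on the assumption that this matches the single MU_nb_cnxs parameter reported in the estimation output.

```python
import math


def nested_logit_probs(utilities, nests, mu):
    """Two-level nested logit with one shared logsum parameter mu.

    utilities: dict mapping alternative -> systematic utility
    nests: dict mapping nest name -> list of alternatives in that nest
    """
    # Sum of exp(V / mu) within each nest.
    within = {
        name: sum(math.exp(utilities[a] / mu) for a in alts)
        for name, alts in nests.items()
    }
    # Logsum (inclusive value) of each nest, scaled by mu.
    logsums = {name: mu * math.log(total) for name, total in within.items()}
    denom = sum(math.exp(v) for v in logsums.values())

    probs = {}
    for name, alts in nests.items():
        p_nest = math.exp(logsums[name]) / denom  # probability of the nest
        for a in alts:
            # Nest probability times conditional probability within the nest.
            probs[a] = p_nest * math.exp(utilities[a] / mu) / within[name]
    return probs


# Made-up utilities for three alternatives in two nests.
toy_utilities = {"nonstop_a": 0.5, "onestop_a": 0.1, "onestop_b": -0.3}
toy_nests = {"Cnx0": ["nonstop_a"], "Cnx1": ["onestop_a", "onestop_b"]}
print(nested_logit_probs(toy_utilities, toy_nests, mu=0.7))
print(nested_logit_probs(toy_utilities, toy_nests, mu=1.0))
```

With mu equal to 1 the formula collapses to the plain MNL probabilities, which is why the estimated value of the logsum parameter is compared against a null value of 1.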

In [8]:
m.load_data()


req_data does not request weight_co but it is set and being provided

In [9]:
m.maximize_loglike()


Iteration 009 [Converged]

LL = -347.19303042325504

                   value  initvalue  nullvalue   minimum   maximum  holdfast  note       best
MU_nb_cnxs      0.691112        1.0        1.0  0.001000  1.000000         0         0.691112
carrier==2      0.079526        0.0        0.0      -inf       inf         0         0.079526
carrier==3      0.440481        0.0        0.0      -inf       inf         0         0.440481
carrier==4      0.396793        0.0        0.0      -inf       inf         0         0.396793
carrier==5     -0.439080        0.0        0.0      -inf       inf         0        -0.439080
elapsed_time   -0.004233        0.0        0.0      -inf       inf         0        -0.004233
equipment==2    0.326877        0.0        0.0      -inf       inf         0         0.326877
fare_hy        -0.000847        0.0        0.0      -inf       inf         0        -0.000847
fare_ly        -0.000856        0.0        0.0      -inf       inf         0        -0.000856
nb_cnxs        -3.155549        0.0        0.0      -inf       inf         0        -3.155549
timeperiod==2   0.065527        0.0        0.0      -inf       inf         0         0.065527
timeperiod==3   0.088094        0.0        0.0      -inf       inf         0         0.088094
timeperiod==4   0.042914        0.0        0.0      -inf       inf         0         0.042914
timeperiod==5   0.096519        0.0        0.0      -inf       inf         0         0.096519
timeperiod==6   0.164687        0.0        0.0      -inf       inf         0         0.164687
timeperiod==7   0.243887        0.0        0.0      -inf       inf         0         0.243887
timeperiod==8   0.245135        0.0        0.0      -inf       inf         0         0.245135
timeperiod==9  -0.005913        0.0        0.0      -inf       inf         0        -0.005913
Out[9]:
┣          loglike: -347.19303042325504
┣                x: MU_nb_cnxs       0.691112
┃                   carrier==2       0.079526
┃                   carrier==3       0.440481
┃                   carrier==4       0.396793
┃                   carrier==5      -0.439080
┃                   elapsed_time    -0.004233
┃                   equipment==2     0.326877
┃                   fare_hy         -0.000847
┃                   fare_ly         -0.000856
┃                   nb_cnxs         -3.155549
┃                   timeperiod==2    0.065527
┃                   timeperiod==3    0.088094
┃                   timeperiod==4    0.042914
┃                   timeperiod==5    0.096519
┃                   timeperiod==6    0.164687
┃                   timeperiod==7    0.243887
┃                   timeperiod==8    0.245135
┃                   timeperiod==9   -0.005913
┃                   dtype: float64
┣        tolerance: 3.8832811170766046e-06
┣            steps: array([1., 1., 1., 1., 1., 1., 1., 1., 1.])
┣          message: 'Optimization terminated successfully.'
┣     elapsed_time: datetime.timedelta(microseconds=61784)
┣           method: 'bhhh'
┣          n_cases: 105
┣ iteration_number: 9
┣          logloss: 3.306600289745286
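
A couple of quick arithmetic checks can help when reading this output. The sketch below uses only numbers copied from the results above. It assumes that logloss is the negative log likelihood divided by n_cases, which is consistent with the reported values, and it treats the ratio of the nb_cnxs and elapsed_time coefficients as a rough trade-off expressed in whatever units elapsed_time is recorded in.

```python
import math

# Values copied from the estimation output above.
loglike = -347.19303042325504
n_cases = 105
reported_logloss = 3.306600289745286
beta_nb_cnxs = -3.155549
beta_elapsed_time = -0.004233

# Assumption: logloss is the negative log likelihood per case.
logloss = -loglike / n_cases
print(math.isclose(logloss, reported_logloss, rel_tol=1e-9))

# Elapsed-time units that carry the same disutility as one extra connection.
cnx_in_time_units = beta_nb_cnxs / beta_elapsed_time
print(round(cnx_in_time_units, 1))
```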
