
The Nested Logit Model

The Nested Logit model considers sub-groups of closely substitutable alternatives, called 'nests'. The general idea is that a customer might first choose a transportation mode between public transport and their private car, and then, if they decide to use public transport, choose between taking the train or the bus. The classical Conditional Logit does not account for such a decision process, hence the introduction of the Nested Logit. More detailed information is available here.

In this notebook we reproduce results from other packages, showing how to specify a Nested Logit model with Choice-Learn and checking that we reach the same results.


import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # run the notebook on CPU

import sys

sys.path.append("../../")  # make the choice_learn package importable from the repository

import numpy as np
import pandas as pd

Import the Nested Logit model from Choice-Learn!

from choice_learn.models import NestedLogit

1- Nested Logit on the SwissMetro dataset

We reproduce the results from Biogeme, which are also reproduced in PyLogit. This example uses the SwissMetro dataset, described further in the data introduction.

from choice_learn.datasets import load_swissmetro
swiss_dataset = load_swissmetro(preprocessing="biogeme_nested")
print(swiss_dataset.summary())
%=====================================================================%
%%% Summary of the dataset:
%=====================================================================%
Number of items: 3
Number of choices: 6768
%=====================================================================%
 No Shared Features by Choice registered


 Items Features by Choice:
 2 items features 
 with names: (['cost', 'travel_time'],)
%=====================================================================%

Short Introduction to Nested Logit

The model specified in Biogeme defines two nests:

- the 'existing' modes nest, with the train and the car (item indexes 0 and 2)
- the 'future' modes nest, with the swissmetro (item index 1)

And the utility takes the following form:

$U(i) = \beta^{inter}_i + \beta^{tt} \cdot TT(i) + \beta^{co} \cdot CO(i)$

with:

- $TT(i)$ the travel time of alternative $i$
- $CO(i)$ the cost of alternative $i$
- $\beta^{inter}_i = 0$ for the swissmetro alternative, which serves as reference

The Nested Logit formulates the following probabilities from such a utility:

$ \mathbb{P}(i) = \frac{e^{\frac{U(i)}{\gamma_{nest(i)}}} \cdot \left( \sum_{k \in nest(i)} e^{\frac{U(k)}{\gamma_{nest(i)}}} \right)^{\gamma_{nest(i)} - 1}}{\sum_{m \in \mathcal{Nests}} \left( \sum_{k \in m} e^{\frac{U(k)}{\gamma_m}} \right)^{\gamma_m}} $

To better understand this expression, one way is to split it into two parts:

$ \mathbb{P}(i) = \mathbb{P}(i | nest(i)) \cdot \mathbb{P}(nest(i)) $

with:

$ \mathbb{P}(i | nest(i)) = \frac{e^{\frac{U(i)}{\gamma_{nest(i)}}}}{\sum_{k \in nest(i)} e^{\frac{U(k)}{\gamma_{nest(i)}}}} $ and $ \mathbb{P}(nest(i)) = \frac{\left( \sum_{k \in nest(i)} e^{\frac{U(k)}{\gamma_{nest(i)}}} \right)^{\gamma_{nest(i)}}}{\sum_{m \in \mathcal{Nests}} \left( \sum_{k \in m} e^{\frac{U(k)}{\gamma_m}} \right)^{\gamma_m}} $

Therefore we have four weights in the utility function, plus the $\gamma_{nest}$ values, to estimate. Since the 'future' nest contains only one alternative, its correlation value $\gamma^{future}$ has no impact on the probabilities, so we only need to estimate $\gamma^{existing}$.
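To make the formula concrete, here is a minimal NumPy sketch (with made-up utility and gamma values, not the estimates obtained below) computing these probabilities for the two nests above:

import numpy as np

# Made-up utilities for (train, swissmetro, car); the gamma of the
# single-item 'future' nest is set to 1 since it has no impact.
U = np.array([-0.5, 0.2, -0.1])
nests = [[0, 2], [1]]
gammas = [0.5, 1.0]

# In-nest sums: S_m = sum_{k in m} exp(U(k) / gamma_m)
S = [np.exp(U[m] / g).sum() for m, g in zip(nests, gammas)]
denominator = sum(s**g for s, g in zip(S, gammas))

probas = np.zeros_like(U)
for m, g, s in zip(nests, gammas, S):
    # P(i) = P(i | nest(i)) * P(nest(i))
    probas[m] = np.exp(U[m] / g) / s * s**g / denominator

print(probas, probas.sum())  # the three probabilities sum to 1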

Specification and estimation with Choice-Learn

With Choice-Learn, the Nested Logit model specification is similar to the Conditional Logit specification. The few differences are:

- When the model is instantiated, the nests need to be specified as a list of nests containing the concerned item indexes. In the example, we specify items_nests=[[0, 2], [1]], meaning that the first nest contains the items of indexes 0 (train) and 2 (car) and the second nest the item of index 1 (swissmetro).
- The "fast" dict-based specification has an additional keyword, coefficients={feature_name: "nest"}, creating for the feature feature_name one coefficient to estimate per nest, this coefficient being shared by all alternatives of the nest. A small sketch follows this list.
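As an illustration of that keyword (a hypothetical sketch, not the specification estimated below), one travel_time coefficient per nest and a single cost coefficient shared by all alternatives could be declared as:

# Hypothetical dict-based specification: one 'travel_time' coefficient per
# nest and one 'cost' coefficient shared by all alternatives.
sketch_model = NestedLogit(
    coefficients={"travel_time": "nest", "cost": "constant"},
    items_nests=[[0, 2], [1]],
    optimizer="lbfgs",
)

Below, we instead use the manual specification to match the Biogeme model.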

# Initialization of the model
swiss_model = NestedLogit(optimizer="lbfgs", items_nests=[[0, 2], [1]])

# Intercepts for train & car; the swissmetro intercept is fixed to 0
swiss_model.add_coefficients(feature_name="intercept", items_indexes=[0, 2])

# betas TT and CO shared by all three alternatives
swiss_model.add_shared_coefficient(feature_name="travel_time",
                                   items_indexes=[0, 1, 2])
swiss_model.add_shared_coefficient(feature_name="cost",
                                   items_indexes=[0, 1, 2])
# Estimation of the model
history = swiss_model.fit(swiss_dataset, get_report=True, verbose=2)
WARNING:root:At least one gamma value for nests is below 0.05 and is
        clipped to 0.05 for numeric optimization purposes.
WARNING:root:L-BFGS Opimization finished:
WARNING:root:---------------------------------------------------------------
WARNING:root:Number of iterations: 21
WARNING:root:Algorithm converged before reaching max iterations: True


Using L-BFGS optimizer, setting up .fit() function
swiss_model.report
  Coefficient Name  Coefficient Estimation  Std. Err     z_value    P(.>z)
0 beta_intercept_0               -0.511948  0.046159  -11.090872  0.000000
1 beta_intercept_1               -0.167156  0.036682   -4.556830  0.000005
2 beta_travel_time               -0.898664  0.054548  -16.474657  0.000000
3 beta_cost                      -0.856665  0.046482  -18.430079  0.000000
4 gammas_nests                    0.486840  0.029635   16.427719  0.000000
# Looking at the weights
swiss_model.trainable_weights
[<tf.Variable 'beta_intercept:0' shape=(1, 2) dtype=float32, numpy=array([[-0.51194817, -0.1671558 ]], dtype=float32)>,
 <tf.Variable 'beta_travel_time:0' shape=(1, 1) dtype=float32, numpy=array([[-0.89866394]], dtype=float32)>,
 <tf.Variable 'beta_cost:0' shape=(1, 1) dtype=float32, numpy=array([[-0.85666543]], dtype=float32)>,
 <tf.Variable 'gammas_nests:0' shape=(1, 1) dtype=float32, numpy=array([[0.4868395]], dtype=float32)>]
# Computing the total (summed over choices) negative log-likelihood
swiss_model.evaluate(swiss_dataset) * len(swiss_dataset)
<tf.Tensor: shape=(), dtype=float32, numpy=5236.8994>
# Probabilities can be easily computed:
probas = swiss_model.predict_probas(swiss_dataset)
print(probas[:4])
tf.Tensor(
[[0.15937711 0.6218435  0.21877931]
 [0.1940201  0.6445151  0.16146483]
 [0.11813082 0.597691   0.28417817]
 [0.12110618 0.52606976 0.3528241 ]], shape=(4, 3), dtype=float32)
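These probabilities can, for instance, be averaged into predicted market shares. A small sketch on top of the output above:

# Average choice probabilities as predicted market shares
shares = probas.numpy().mean(axis=0)
for name, share in zip(["train", "swissmetro", "car"], shares):
    print(f"{name}: {share:.3f}")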

Interpretation and comparison with Biogeme results

The estimated coefficients, their standard errors and the total negative log-likelihood computed above match the values reported by Biogeme and PyLogit for this specification.

2- Nested Logit with the HC Dataset

We reproduce results from mlogit that are also presented in Torch-Choice.

First Formulation

In this first formulation, reproduced from mlogit, the alternatives are nested into cooling modes (gcc, ecc, erc, hpc) and non-cooling modes (gc, ec, er).

from choice_learn.datasets import load_hc
from choice_learn.data import ChoiceDataset

# Loading
hc_df = load_hc(as_frame=True)

# The HC dataset is loaded as a pandas.DataFrame for the example.
# It can be loaded directly as a ChoiceDataset with the argument `as_frame=False`
hc_df.head()
rownames depvar ich.gcc ich.ecc ich.erc ich.hpc ich.gc ich.ec ich.er icca och.gcc och.ecc och.erc och.hpc och.gc och.ec och.er occa income
0 1 erc 9.70 7.86 8.79 11.36 24.08 24.50 7.37 27.28 2.26 4.09 3.85 1.73 2.26 4.09 3.85 2.95 20.0
1 2 hpc 8.77 8.69 7.09 9.37 28.00 32.71 9.33 26.49 2.30 2.69 3.45 1.65 2.30 2.69 3.45 1.63 50.0
2 3 gcc 7.43 8.86 6.94 11.70 25.71 31.68 8.14 22.63 2.28 5.25 4.35 1.44 2.28 5.25 4.35 2.18 50.0
3 4 gcc 9.18 8.93 7.22 12.13 29.72 26.73 8.04 25.33 2.62 4.89 4.85 1.93 2.62 4.89 4.85 2.70 50.0
4 5 gcc 8.05 7.02 8.44 10.51 23.90 28.35 7.15 25.45 2.52 3.71 3.64 1.63 2.52 3.71 3.64 2.77 60.0

It is possible to pre-process the dataset, as done in the mlogit and Torch-Choice examples, in order to easily specify the Nested Logit model:

items_id = ["gcc", "ecc", "erc", "hpc", "gc", "ec", "er"]
cooling_modes = ["gcc", "ecc", "erc", "hpc"]
room_modes = ["erc", "er"]
non_cooling_modes = ["gc", "ec", "er"]

for mode in items_id:
    if mode in cooling_modes:
        hc_df[f"icca.{mode}"] = hc_df["icca"]
        hc_df[f"occa.{mode}"] = hc_df["occa"]
    else:
        hc_df[f"icca.{mode}"] = 0.
        hc_df[f"occa.{mode}"] = 0.

for item in items_id:
    if item in cooling_modes:
        hc_df[f"int_cooling.{item}"] = 1.
        hc_df[f"inc_cooling.{item}"] = hc_df.income
    else:
        hc_df[f"int_cooling.{item}"] = 0.
        hc_df[f"inc_cooling.{item}"] = 0.
    if item in room_modes:
        hc_df[f"inc_room.{item}"] = hc_df.income
    else:
        hc_df[f"inc_room.{item}"] = 0
# Creating the dataset from this preprocessed dataframe
dataset = ChoiceDataset.from_single_wide_df(df=hc_df,
                                            items_features_prefixes=["ich", "och", "occa", "icca",
                                                                     "int_cooling", "inc_cooling",
                                                                     "inc_room"],
                                            delimiter=".",
                                            items_id=items_id,
                                            choices_column="depvar",
                                            choice_format="items_id")
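As a quick sanity check of the preprocessing (an optional sketch), the per-item cooling columns should equal the original icca value for cooling modes and 0 otherwise:

# icca.* columns: equal to the original icca for cooling modes, 0 otherwise
print(hc_df[[f"icca.{mode}" for mode in items_id]].head())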

We can use the fast specification through a dictionary with the 'constant' keyword, which creates for each feature one coefficient shared by all the alternatives.

spec = {
    "ich": "constant",
    "och": "constant",
    "occa": "constant",
    "icca": "constant",
    "int_cooling":"constant",
    "inc_cooling": "constant",
    "inc_room": "constant"
}
model = NestedLogit(
    coefficients=spec,
    items_nests=[[0, 1, 2, 3], [4, 5, 6]],
    optimizer="lbfgs",
    shared_gammas_over_nests=True # Note the argument specifying that all nests have the same gamma value
)
Using L-BFGS optimizer, setting up .fit() function
hist = model.fit(dataset, get_report=True, verbose=1)
WARNING:root:At least one gamma value for nests is below 0.05 and is
        clipped to 0.05 for numeric optimization purposes.
WARNING:root:L-BFGS Opimization finished:
WARNING:root:---------------------------------------------------------------
WARNING:root:Number of iterations: 55
WARNING:root:Algorithm converged before reaching max iterations: True


Using L-BFGS optimizer, setting up .fit() function
model.report
  Coefficient Name  Coefficient Estimation  Std. Err   z_value   P(.>z)
0 ich_w_0                        -0.554876  0.174471  -3.180328  0.001471
1 och_w_1                        -0.857881  0.300332  -2.856437  0.004284
2 occa_w_2                       -1.089362  1.056226  -1.031371  0.302367
3 icca_w_3                       -0.225069  0.112268  -2.004749  0.044990
4 int_cooling_w_4                -6.000777  4.986898  -1.203308  0.228857
5 inc_cooling_w_5                 0.249571  0.053589   4.657146  0.000003
6 inc_room_w_6                   -0.378969  0.116035  -3.265980  0.001091
7 gamma_nests                     0.585920  0.242312   2.418043  0.015604

Another possibility is to keep the dataset as is and to specify the model manually:

# Creating the dataset
dataset = ChoiceDataset.from_single_wide_df(df=hc_df,
                                            shared_features_columns=["income"],
                                            items_features_prefixes=["ich", "och", "occa", "icca"],
                                            delimiter=".",
                                            items_id=items_id,
                                            choices_column="depvar",
                                            choice_format="items_id")

With the manual specification, we define each coefficient along with the indexes of the items it applies to.

model = NestedLogit(items_nests=[[0, 1, 2, 3], [4, 5, 6]],
                    optimizer="lbfgs",
                    shared_gammas_over_nests=True)
# Coefficients shared by all the alternatives
model.add_shared_coefficient(feature_name="ich", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="och", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="icca", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="occa", items_indexes=[0, 1, 2, 3, 4, 5, 6])

# The coefficients concerning the income are split into two groups of alternatives:
model.add_shared_coefficient(feature_name="income", items_indexes=[0, 1, 2, 3], coefficient_name="income_cooling")
model.add_shared_coefficient(feature_name="income", items_indexes=[2, 6], coefficient_name="income_room")

# Finally, only the cooling alternatives have an intercept
model.add_shared_coefficient(feature_name="intercept", items_indexes=[0, 1, 2, 3])
Using L-BFGS optimizer, setting up .fit() function
hist = model.fit(dataset, get_report=True, verbose=1)
WARNING:root:At least one gamma value for nests is below 0.05 and is
        clipped to 0.05 for numeric optimization purposes.
WARNING:root:L-BFGS Opimization finished:
WARNING:root:---------------------------------------------------------------
WARNING:root:Number of iterations: 39
WARNING:root:Algorithm converged before reaching max iterations: True


Using L-BFGS optimizer, setting up .fit() function
model.report
  Coefficient Name  Coefficient Estimation  Std. Err   z_value   P(.>z)
0 beta_ich                       -0.554878  0.174468  -3.180405  0.001471
1 beta_och                       -0.857885  0.300328  -2.856494  0.004283
2 beta_icca                      -0.225067  0.112267  -2.004751  0.044990
3 beta_occa                      -1.089337  1.056223  -1.031352  0.302376
4 income_cooling                  0.249571  0.053588   4.657206  0.000003
5 income_room                    -0.378970  0.116032  -3.266064  0.001091
6 beta_intercept                 -6.000929  4.986704  -1.203386  0.228827
7 gamma_nests                     0.585921  0.242307   2.418097  0.015602
# The gamma value can be retrieved to compute the within-nest correlation as follows:
correlation = 1 - model.trainable_weights[-1][0][0].numpy()
print("Correlation over alternatives within each nest:", correlation)

Second Formulation

In this second formulation, the nests instead group the room alternatives (erc and er, indexes 2 and 6) in one nest and all the other alternatives in the other.

model = NestedLogit(items_nests=[[0, 1, 3, 4, 5], [2, 6]],
                    optimizer="lbfgs",
                    shared_gammas_over_nests=True)
# Coefficients shared by all the alternatives
model.add_shared_coefficient(feature_name="ich", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="och", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="icca", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="occa", items_indexes=[0, 1, 2, 3, 4, 5, 6])

# The coefficients concerning the income are split into two groups of alternatives:
model.add_shared_coefficient(feature_name="income", items_indexes=[0, 1, 2, 3],
                             coefficient_name="income_cooling")
model.add_shared_coefficient(feature_name="income", items_indexes=[2, 6],
                             coefficient_name="income_room")

# Finally, only the cooling alternatives have an intercept
model.add_shared_coefficient(feature_name="intercept", items_indexes=[0, 1, 2, 3],
                             coefficient_name="int.cooling")
hist = model.fit(dataset, get_report=True, verbose=2)
model.report
  Coefficient Name  Coefficient Estimation   Std. Err   z_value   P(.>z)
0 beta_ich                       -1.106775   1.271377  -0.870533  0.384009
1 beta_och                       -1.774187   2.188208  -0.810794  0.417484
2 beta_icca                      -0.329113   0.308635  -1.066349  0.286266
3 beta_occa                      -1.990666   2.507875  -0.793766  0.427332
4 income_cooling                  0.405710   0.425043   0.954515  0.339823
5 income_room                    -0.737662   0.742359  -0.993673  0.320382
6 int.cooling                   -13.468433  18.365850  -0.733341  0.463350
7 gamma_nests                     1.323886   1.889999   0.700469  0.483634
# Total negative log-likelihood:
model.evaluate(dataset) * len(dataset)
<tf.Tensor: shape=(), dtype=float32, numpy=180.03148>

Third Formulation

The third formulation keeps the cooling/non-cooling nests but estimates one gamma value per nest, by setting shared_gammas_over_nests=False.

model = NestedLogit(items_nests=[[0, 1, 2, 3], [4, 5, 6]],
                    optimizer="lbfgs",
                    shared_gammas_over_nests=False)
# Coefficients shared by all the alternatives
model.add_shared_coefficient(feature_name="ich", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="och", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="icca", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="occa", items_indexes=[0, 1, 2, 3, 4, 5, 6])

# The coefficients concerning the income are split into two groups of alternatives:
model.add_shared_coefficient(feature_name="income", items_indexes=[0, 1, 2, 3], coefficient_name="income_cooling")
model.add_shared_coefficient(feature_name="income", items_indexes=[2, 6], coefficient_name="income_room")

# Finally, only the cooling alternatives have an intercept
model.add_shared_coefficient(feature_name="intercept", items_indexes=[0, 1, 2, 3])
hist = model.fit(dataset, get_report=False, verbose=2)
model.trainable_weights
[<tf.Variable 'beta_ich:0' shape=(1, 1) dtype=float32, numpy=array([[-0.5336982]], dtype=float32)>,
 <tf.Variable 'beta_och:0' shape=(1, 1) dtype=float32, numpy=array([[-0.83580786]], dtype=float32)>,
 <tf.Variable 'beta_icca:0' shape=(1, 1) dtype=float32, numpy=array([[-0.22605541]], dtype=float32)>,
 <tf.Variable 'beta_occa:0' shape=(1, 1) dtype=float32, numpy=array([[-1.1159254]], dtype=float32)>,
 <tf.Variable 'income_cooling:0' shape=(1, 1) dtype=float32, numpy=array([[0.24864283]], dtype=float32)>,
 <tf.Variable 'income_room:0' shape=(1, 1) dtype=float32, numpy=array([[-0.36347896]], dtype=float32)>,
 <tf.Variable 'beta_intercept:0' shape=(1, 1) dtype=float32, numpy=array([[-5.644782]], dtype=float32)>,
 <tf.Variable 'gammas_nests:0' shape=(1, 2) dtype=float32, numpy=array([[0.57864326, 0.42460972]], dtype=float32)>]

Fourth Formulation

The fourth formulation uses three nests, isolating the heat pump (hpc, index 3) in its own nest.

model = NestedLogit(items_nests=[[0, 1, 2], [3], [4, 5, 6]],
                    optimizer="lbfgs",
                    shared_gammas_over_nests=True)
# Coefficients shared by all the alternatives
model.add_shared_coefficient(feature_name="ich", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="och", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="icca", items_indexes=[0, 1, 2, 3, 4, 5, 6])
model.add_shared_coefficient(feature_name="occa", items_indexes=[0, 1, 2, 3, 4, 5, 6])

# The coefficients concerning the income are split into two groups of alternatives:
model.add_shared_coefficient(feature_name="income", items_indexes=[0, 1, 2, 3], coefficient_name="income_cooling")
model.add_shared_coefficient(feature_name="income", items_indexes=[2, 6], coefficient_name="income_room")

# Finally, only the cooling alternatives have an intercept
model.add_shared_coefficient(feature_name="intercept", items_indexes=[0, 1, 2, 3])
hist = model.fit(dataset, get_report=True, verbose=2)
model.report
  Coefficient Name  Coefficient Estimation  Std. Err   z_value   P(.>z)
0 beta_ich                       -0.838395  0.119850  -6.995352  0.000000e+00
1 beta_och                       -1.331599  0.251385  -5.297053  1.192093e-07
2 beta_icca                      -0.256129  0.127123  -2.014806  4.392505e-02
3 beta_occa                      -1.405654  1.152076  -1.220105  2.224251e-01
4 income_cooling                  0.311357  0.055357   5.624510  0.000000e+00
5 income_room                    -0.571352  0.082983  -6.885149  0.000000e+00
6 beta_intercept                -10.413467  5.268852  -1.976421  4.810715e-02
7 gamma_nests                     0.956544  0.430562   2.221618  2.630913e-02
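As with the previous formulations, the fit can be summarized by the total negative log-likelihood (a sketch; the value is not reproduced here):

# Total negative log-likelihood of the fourth formulation
total_nll = model.evaluate(dataset) * len(dataset)
print(total_nll)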