Introduction to choice-learn's modelling

# Install necessary requirements

# If you run this notebook on Google Colab, or in standalone mode, you need to install the required packages.
# Uncomment the following lines:

# !pip install choice-learn

# If you run the notebook within the GitHub repository, you need to run the following lines, that can skipped otherwise:
import os
import sys

sys.path.append("../../")

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import numpy as np
import pandas as pd

import tensorflow as tf

# Enabling eager execution sometimes decreases fitting time
tf.compat.v1.enable_eager_execution()

Summary

Example 1: ConditionalLogit with Swissmetro
- A few words on c-MNL formulation
- Instantiation and estimation with Choice-Learn
Example 2: ConditionalLogit with ModeCanda

For model customization and more explanation on ChoiceModel and the endpoints, you can go here

Example 1: SwissMetro

The choice-learn package offers a high level API to conceive and estimate discrete choice models. Several models are ready to be used, you can check the list here. If you want to create your own model or another one that is not in the list, the lower level API can help you. Check the notebook here.

Let's begin this tutorial with the estimation of a Conditional Logit Model on the SwissMetro dataset[3]. It follows the specifications described in PyLogit and Biogeme.

First, we download our data as a ChoiceDataset. See the data management tutorial first if needed.

from choice_learn.datasets import load_swissmetro
swiss_dataset = load_swissmetro(preprocessing="tutorial")
print(swiss_dataset.summary())

The Conditional Logit model

The conditional Logit [2] specifies a linear utility for each features of item $i$: $U(i) = \sum_{features} a(i) * feature(i)$ The probability to choose $i$ among the set of available alternatives $\mathcal{A}$ is then:

$\mathbb{P}(i) = \frac{e^{U(i)}}{\sum_{j \in \mathcal{A}} e^{U(j)}}$

With the SwissMetro dataset we are trying to predict a customer mean of transport among train, swissmetro and car from the features: - TT (transit time) - CO (cost) - HE (headway) - Survey (where the survey took place) - Luggage number - seats configuration in the swissmetro - first class or not

An important step is to define the right utility function for the model to fit well the dataset. Let's take the following formulation defined in PyLogit:

$ U(train) = \beta_{train}^{inter} + \beta^{tt}{train/sm} \cdot TT(train) + \beta^{co} \cdot SV(train) $} \cdot CO(train) + \beta^{he}_{train} \cdot HE(train) + \beta^{survey
$ U(sm) = \beta_{sm}^{inter} + \beta^{tt}{train/sm} \cdot TT(sm) + \beta^{co} \cdot FC(sm) $} \cdot CO(sm) + \beta^{he}_{sm} \cdot HE(sm) + \beta^{survey} \cdot SV(sm) + \beta^{seat} \cdot SEAT(sm) + \beta^{first_class
$ U(car) = \beta^{tt}{car} \cdot TT(car) + \beta^{co}} \cdot CO(car) + \beta^{luggage==1} \cdot \mathbb{1{Luggage==1} + \beta^{luggage>1} \cdot \mathbb{1} $

Note that we want to estimate:

one $\beta^{tt}{train/sm}$ shared for train and sm items and one $\beta^{tt}$ for the car item. Indeed, one can argue that customers have the same sensibility toward travel time for all public transport and a different one for private ones.
one $\beta^{co}$ coefficient for each item.
one $\beta^{inter}$ and $\beta^{he}$ for train and sm, and zeroed for the car alternative
one $\beta^{survey}$, $\beta^{seat}$, $\beta^{first_class}$, $\beta^{luggage==1}$ and $\beta^{luggage>1}$ shared or not by different items

To build a model, we need to specify for each weight $\beta$: - the name of the feature it goes with: - it must match the feature name in the ChoiceDataset - "intercept" is the standardized name used for intercept, pay attention not to override it - items_indexes: the items concerned, as indexed in the ChoiceDataset - (optionally) a unique weight name

Attention

add_coefficients is to be used to get one coefficient by given items_indexes\

add_shared_coefficients is to be used to get on coefficient that is used for utility of all given items_indexes

Here is how to create a model following our defined utility functions:

Conditional Logit Estimation with Choice-Learn

from choice_learn.models import ConditionalLogit

# Initialization of the model
swiss_model = ConditionalLogit(optimizer="lbfgs")

# Intercept for train & sm
swiss_model.add_coefficients(feature_name="intercept", items_indexes=[0, 1])
# beta_he for train & sm
swiss_model.add_coefficients(feature_name="headway",
                             items_indexes=[0, 1],
                             coefficient_name="beta_he")
# beta_co for all items
swiss_model.add_coefficients(feature_name="cost",
                             items_indexes=[0, 1, 2])
# beta first_class for train
swiss_model.add_coefficients(feature_name="regular_class",
                             items_indexes=[0])
# beta seats for train
swiss_model.add_coefficients(feature_name="seats", items_indexes=[1])
# betas luggage for car
swiss_model.add_coefficients(feature_name="single_luggage_piece",
                             items_indexes=[2],
                             coefficient_name="beta_luggage=1")
swiss_model.add_coefficients(feature_name="multiple_luggage_piece",
                             items_indexes=[2],
                             coefficient_name="beta_luggage>1")
# beta TT only for car
swiss_model.add_coefficients(feature_name="travel_time",
                             items_indexes=[2],
                             coefficient_name="beta_tt_car")

# betas TT and HE shared by train and sm
swiss_model.add_shared_coefficient(feature_name="travel_time",
                                   items_indexes=[0, 1])
swiss_model.add_shared_coefficient(feature_name="train_survey",
                                   items_indexes=[0, 1],
                                   coefficient_name="beta_survey")

# Estimation of the model
history = swiss_model.fit(swiss_dataset, get_report=True)

Once the model is estimated, we can look at the weights with the .trainable_weights argument:

swiss_model.trainable_weights

[<tf.Variable 'beta_intercept:0' shape=(1, 2) dtype=float32, numpy=array([[-1.2929296, -0.5025745]], dtype=float32)>,
 <tf.Variable 'beta_he:0' shape=(1, 2) dtype=float32, numpy=array([[-0.31433573, -0.37731653]], dtype=float32)>,
 <tf.Variable 'beta_cost:0' shape=(1, 3) dtype=float32, numpy=array([[-0.5617626 , -0.2816758 , -0.51384646]], dtype=float32)>,
 <tf.Variable 'beta_regular_class:0' shape=(1, 1) dtype=float32, numpy=array([[0.5650171]], dtype=float32)>,
 <tf.Variable 'beta_seats:0' shape=(1, 1) dtype=float32, numpy=array([[-0.78244746]], dtype=float32)>,
 <tf.Variable 'beta_luggage=1:0' shape=(1, 1) dtype=float32, numpy=array([[0.42275968]], dtype=float32)>,
 <tf.Variable 'beta_luggage>1:0' shape=(1, 1) dtype=float32, numpy=array([[1.4139789]], dtype=float32)>,
 <tf.Variable 'beta_tt_car:0' shape=(1, 1) dtype=float32, numpy=array([[-0.7229831]], dtype=float32)>,
 <tf.Variable 'beta_travel_time:0' shape=(1, 1) dtype=float32, numpy=array([[-0.69901365]], dtype=float32)>,
 <tf.Variable 'beta_survey:0' shape=(1, 1) dtype=float32, numpy=array([[2.542475]], dtype=float32)>]

We can easily acces the negative log likelihood value for the training dataset or another one using the .evaluate() method:

len(swiss_dataset) * swiss_model.evaluate(swiss_dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=5159.3047>

If you set get_report to True in .fit, the model automatically creates a report for each of the coefficient, with its estimation, its standard deviation and more:

# Let's add items for better readability
items = ["train", "sm", "train", "sm", "train", "sm", "car", "train", "sm", "car", "car", "car", "train & sm", "train & sm"]
swiss_model.report = pd.concat([swiss_model.report, pd.Series(items, name="item")], axis=1)

swiss_model.report

	Coefficient Name	Coefficient Estimation	Std. Err	z_value	P(.>z)	item
0	beta_intercept_0	-1.292930	0.145000	-8.916755	0.000000e+00	train
1	beta_intercept_1	-0.502575	0.109712	-4.580845	4.649162e-06	sm
2	beta_he_0	-0.314336	0.064734	-4.855812	1.192093e-06	train
3	beta_he_1	-0.377317	0.191370	-1.971662	4.864824e-02	sm
4	beta_cost_0	-0.561763	0.094500	-5.944569	0.000000e+00	train
5	beta_cost_1	-0.281676	0.042252	-6.666523	0.000000e+00	sm
6	beta_cost_2	-0.513846	0.101592	-5.057931	4.768372e-07	car
7	beta_regular_class	0.565017	0.079958	7.066461	0.000000e+00	train
8	beta_seats	-0.782447	0.085864	-9.112632	0.000000e+00	sm
9	beta_luggage=1	0.422760	0.063327	6.675845	0.000000e+00	car
10	beta_luggage>1	1.413979	0.190378	7.427222	0.000000e+00	car
11	beta_tt_car	-0.722983	0.044959	-16.080963	0.000000e+00	car
12	beta_travel_time	-0.699014	0.038635	-18.092785	0.000000e+00	train & sm
13	beta_survey	2.542475	0.098162	25.900934	0.000000e+00	train & sm

We find the same results (estimation of parameters and negative log-likelihood) as the PyLogit package. One can easily interpret the coefficient. For example $\beta_{cost}$ represent the average price elasticity of the customers. First, it's negative meaning that the most expensive it is the less likely the alternative will be chose, second we can observe that the elasticity is smaller for the SwissMetro. It is not suprising for a "premium" product, meaning that people choosing this alternative agreed to pay more for comfort for example and are less regarding on the price.

We can also observe that $\beta_{luggage>1} > \beta_{luggage=1} > \beta_{luggage=0} = 0$ meaning that customers with luggae are more likely to use their car for transport, and it is event further the case if they have more than one piece of luggage.

Example #2: the ModeCanada Dataset

Utility formulation

Let's reproduce a common example from Torch-Choice on the ModeCanada [1] dataset: $U(i, c) = \beta^{inter}_i + \beta^{price} \cdot price(i, c) + \beta^{freq} \cdot freq(i, c) + \beta^{ovt} \cdot ovt(i, c) + \beta^{income}_i \cdot income(c) + \beta^{ivt}_i \cdot ivt(i, c) + \epsilon(i, c)$

A description of the dataset and the features can be found in [1]. We want to predict the future mean of transport from [train, air, bus, car] using frequence, price, in-vehicule transport time (ivt) and out-of-vehicule transport time (ovt).

Model Formulation

# If you want to check what's in the dataset:
from choice_learn.datasets import load_modecanada

transport_df = load_modecanada(as_frame=True)
transport_df.head()

	Unnamed: 0	case	alt	choice	dist	cost	ivt	ovt	freq	income	noalt
0	1	1	train	0	83	28.25	50	66	4	45.0	2
1	2	1	car	1	83	15.77	61	0	0	45.0	2
2	3	2	train	0	83	28.25	50	66	4	25.0	2
3	4	2	car	1	83	15.77	61	0	0	25.0	2
4	5	3	train	0	83	28.25	50	66	4	70.0	2

We want to estimate:

one $\beta^{price}$, $\beta^{freq}$ and $\beta^{ovt}$ coefficient. They are shared by all items.
one $\beta^{ivt}$ coefficient for each item.
one $\beta^{inter}$ and $\beta^{income}$ coefficient for each item, with additional constraint to be 0 for the first item (air).

One notes that it makes sense to include an intercept $\beta^{inter}$ for each item since $ivt(i, c)$ and $income(c)$ depends on each choice $c$.

Additionally to previous example we manually specify the weights names:

# Loading the ChoiceDataset
canada_dataset = load_modecanada(as_frame=False, preprocessing="tutorial")

print(canada_dataset.summary())

%=====================================================================%
%%% Summary of the dataset:
%=====================================================================%
Number of items: 4
Number of choices: 2779
%=====================================================================%
 Shared Features by Choice:
 1 shared features
 with names: (['income'],)


 Items Features by Choice:
4 items features 
 with names: (['cost', 'freq', 'ovt', 'ivt'],)
%=====================================================================%

from choice_learn.models import ConditionalLogit

# Initialization of the model
model = ConditionalLogit()

# Creation of the different weights:

# shared_coefficient add one coefficient that is used for all items specified in the items_indexes:
# Here, cost, freq and ovt coefficients are shared between all items
model.add_shared_coefficient(feature_name="cost", items_indexes=[0, 1, 2, 3])
# You can specify you own coefficient name
model.add_shared_coefficient(feature_name="freq",
                             coefficient_name="beta_frequence",
                             items_indexes=[0, 1, 2, 3])
model.add_shared_coefficient(feature_name="ovt", items_indexes=[0, 1, 2, 3])

# ivt is added for each item:
model.add_coefficients(feature_name="ivt", items_indexes=[0, 1, 2, 3])

# add_coefficients adds one coefficient for each specified item_index
# intercept, and income are added for each item except the first one that needs to be zeroed
model.add_coefficients(feature_name="intercept", items_indexes=[1, 2, 3])
model.add_coefficients(feature_name="income", items_indexes=[1, 2, 3])

history = model.fit(canada_dataset, get_report=True, verbose=2)

model.trainable_weights

[<tf.Variable 'beta_cost:0' shape=(1, 1) dtype=float32, numpy=array([[-0.03333882]], dtype=float32)>,
 <tf.Variable 'beta_frequence:0' shape=(1, 1) dtype=float32, numpy=array([[0.09252929]], dtype=float32)>,
 <tf.Variable 'beta_ovt:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04300351]], dtype=float32)>,
 <tf.Variable 'beta_ivt:0' shape=(1, 4) dtype=float32, numpy=
 array([[ 0.05950943, -0.00678369, -0.00646029, -0.00145037]],
       dtype=float32)>,
 <tf.Variable 'beta_intercept:0' shape=(1, 3) dtype=float32, numpy=array([[0.6983663, 1.8440988, 3.2741835]], dtype=float32)>,
 <tf.Variable 'beta_income:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08908683, -0.02799302, -0.03814651]], dtype=float32)>]

print("The average neg-loglikelihood is:", model.evaluate(canada_dataset).numpy())
print("The total neg-loglikelihood is:", model.evaluate(canada_dataset).numpy()*len(canada_dataset))

The average neg-loglikelihood is: 0.67447394
The total neg-loglikelihood is: 1874.3630829453468

model.report

	Coefficient Name	Coefficient Estimation	Std. Err	z_value	P(.>z)
0	beta_cost	-0.033339	0.007095	-4.699046	2.622604e-06
1	beta_frequence	0.092529	0.005097	18.151926	0.000000e+00
2	beta_ovt	-0.043004	0.003225	-13.335685	0.000000e+00
3	beta_ivt_0	0.059509	0.010073	5.908032	0.000000e+00
4	beta_ivt_1	-0.006784	0.004433	-1.530138	1.259825e-01
5	beta_ivt_2	-0.006460	0.001898	-3.403067	6.663799e-04
6	beta_ivt_3	-0.001450	0.001187	-1.221395	2.219366e-01
7	beta_intercept_0	0.698366	1.280191	0.545517	5.853977e-01
8	beta_intercept_1	1.844099	0.708427	2.603088	9.238839e-03
9	beta_intercept_2	3.274184	0.624342	5.244213	1.192093e-07
10	beta_income_0	-0.089087	0.018347	-4.855648	1.192093e-06
11	beta_income_1	-0.027993	0.003872	-7.228693	0.000000e+00
12	beta_income_2	-0.038147	0.004083	-9.342735	0.000000e+00

Faster Specification

A faster specification can be done using a dictionnary. It follows torch-choice method to create conditional logit models. The parameters dict needs to be as follows: - The key is the feature name - The value is the mode. Currently three modes are available: - constant: the learned coefficient is shared by all items - item: one coefficient by item is estimated, the value for the item at index 0 is set to 0 - item-full: one coefficient by item is estimated

In order to create the same model for the ModeCanada dataset, it looks as follows:

# Instantiation with the coefficients dictionnary
coefficients = {"income": "item",
 "cost": "constant",
 "freq": "constant",
 "ovt": "constant",
 "ivt": "item-full",
 "intercept": "item"}

# Instantiation of the model
cmnl = ConditionalLogit(coefficients=coefficients, epochs=1000, optimizer="lbfgs")

history = cmnl.fit(canada_dataset)
print(cmnl.trainable_weights)
print(cmnl.evaluate(canada_dataset).numpy())

[<tf.Variable 'income_w_0:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08908679, -0.02799301, -0.0381465 ]], dtype=float32)>, <tf.Variable 'cost_w_1:0' shape=(1, 1) dtype=float32, numpy=array([[-0.03333884]], dtype=float32)>, <tf.Variable 'freq_w_2:0' shape=(1, 1) dtype=float32, numpy=array([[0.09252931]], dtype=float32)>, <tf.Variable 'ovt_w_3:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04300352]], dtype=float32)>, <tf.Variable 'ivt_w_4:0' shape=(1, 4) dtype=float32, numpy=
array([[ 0.05950936, -0.00678372, -0.0064603 , -0.00145037]],
      dtype=float32)>, <tf.Variable 'intercept_w_5:0' shape=(1, 3) dtype=float32, numpy=array([[0.69836473, 1.8440932 , 3.2741785 ]], dtype=float32)>]
0.67447394

Comparison with other implementations results

import tensorflow as tf

# Here are the values obtained in the references:
gt_weights = [
    tf.constant([[-0.0890796, -0.0279925, -0.038146]]),
    tf.constant([[-0.0333421]]),
    tf.constant([[0.0925304]]),
    tf.constant([[-0.0430032]]),
    tf.constant([[0.0595089, -0.00678188, -0.00645982, -0.00145029]]),
    tf.constant([[0.697311, 1.8437, 3.27381]]),
]
gt_model = ConditionalLogit(coefficients=coefficients)
gt_model.instantiate(canada_dataset)
canada_dataset
# Here we estimate the negative log-likelihood with these coefficients (also, we obtain same value as in those papers):
gt_model._trainable_weights = gt_weights
print("'Ground Truth' Negative LogLikelihood:", gt_model.evaluate(canada_dataset) * len(canada_dataset))

Using L-BFGS optimizer, setting up .fit() function
'Ground Truth' Negative LogLikelihood: tf.Tensor(1874.3633, shape=(), dtype=float32)

Estimate Utility & probabilities

In order to estimate the utilities, use the .predict_utility() method. In order to estimate the probabilities, use the .predict_probas() method.

print("Utilities of each item for the first 5 sessions:", cmnl.compute_batch_utility(*canada_dataset.batch[:5]))

Utilities of each item for the first 5 sessions: tf.Tensor(
[[ -4.250798   -8.238913   -3.4962516  -3.5083752]
 [ -4.250798  -10.466084   -4.196077   -4.4620376]
 [ -4.250798   -7.348045   -3.2163215  -3.1269102]
 [ -4.250798  -10.466084   -4.196077   -4.4620376]
 [ -4.250798  -10.466084   -4.196077   -4.4620376]], shape=(5, 4), dtype=float32)

print("Purchase probability of each item for the first 5 sessions:", cmnl.predict_probas(canada_dataset)[:5])

Purchase probability of each item for the first 5 sessions: tf.Tensor(
[[0.1906133  0.00353295 0.40536723 0.40048242]
 [0.348695   0.00069692 0.3683078  0.2822966 ]
 [0.14418297 0.00651324 0.40567806 0.44362125]
 [0.348695   0.00069692 0.3683078  0.2822966 ]
 [0.348695   0.00069692 0.3683078  0.2822966 ]], shape=(5, 4), dtype=float32)

Using Gradient Descent Optimizers

For very large datasets that do not fit entirely in the memory, we have to work with data batches. In this case, the LBFGS method cannot be used, because of the large memory usage of the algorithm. In those cases, we will prefer stochastic gradient descent optimizers.

In this case, it is possible to obtain the same coefficients estimation, also it is a little tricky to get it quickly. We need to adjust the learning rate over time for the optimization not to be too slow. L-BFGS is more efficient for small dataset - Gradient Descent for large ones !

cmnl = ConditionalLogit(coefficients=coefficients, optimizer="Adam", epochs=2000, batch_size=-1)
history = cmnl.fit(canada_dataset)
cmnl.optimizer.lr = cmnl.optimizer.lr / 5
cmnl.epochs = 4000
history2 = cmnl.fit(canada_dataset)
cmnl.optimizer.lr = cmnl.optimizer.lr  / 10
cmnl.epochs = 20000
history3 = cmnl.fit(canada_dataset)

Epoch 1999 Train Loss 0.6801: 100%|██████████| 2000/2000 [00:11<00:00, 176.83it/s]
Epoch 3999 Train Loss 0.6776: 100%|██████████| 4000/4000 [00:20<00:00, 198.77it/s]
Epoch 19999 Train Loss 0.6767: 100%|██████████| 20000/20000 [01:38<00:00, 202.55it/s]

It can be useful to look at the loss (negative loglikelihood) over time to see how the estimation goes:

import matplotlib.pyplot as plt
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history["train_loss"])
plt.title("First part of the gradient descent.")
plt.subplot(1, 2, 2)
plt.plot(history2["train_loss"] + history3["train_loss"])
plt.title("Second and third part of the gradient descent.")

Text(0.5, 1.0, 'Second and third part of the gradient descent.')

png

cmnl.trainable_weights

[<tf.Variable 'income_w_0:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08402919, -0.02359896, -0.03233599]], dtype=float32)>,
 <tf.Variable 'cost_w_1:0' shape=(1, 1) dtype=float32, numpy=array([[-0.05140902]], dtype=float32)>,
 <tf.Variable 'freq_w_2:0' shape=(1, 1) dtype=float32, numpy=array([[0.09645309]], dtype=float32)>,
 <tf.Variable 'ovt_w_3:0' shape=(1, 1) dtype=float32, numpy=array([[-0.04099115]], dtype=float32)>,
 <tf.Variable 'ivt_w_4:0' shape=(1, 4) dtype=float32, numpy=
 array([[ 0.05871323, -0.00726116, -0.00368665, -0.00105638]],
       dtype=float32)>,
 <tf.Variable 'intercept_w_5:0' shape=(1, 3) dtype=float32, numpy=array([[-1.687437  , -0.39638832,  1.1344599 ]], dtype=float32)>]

cmnl.evaluate(canada_dataset)

<tf.Tensor: shape=(), dtype=float32, numpy=0.67665595>

References

[1] ModeCanada dataset in Application and interpretation of nested logit models of intercity mode choice, Christophier, V. F.; Koppelman, S. (1993)\ [2] Conditional MultinomialLogit, Train, K.; McFadden, D.; Ben-Akiva, M. (1987)\ [3] Siwssmetro dataset in The acceptance of modal innovation: The case of Swissmetro, Bierlaire, M.; Axhausen, K.; Abay, G (2001)\