# Install necessary requirements
# If you run this notebook on Google Colab, or in standalone mode, you need to install the required packages.
# Uncomment the following lines:
# !pip install choice-learn
# If you run the notebook within the GitHub repository, you need to run the following lines, which can be skipped otherwise:
import os
import sys
sys.path.append("../../")
Introduction to customization
The Choice-Learn package aims to provide structure and helpful functions to design any choice model. The main idea is to write the utility function and let the package work its magic. It is recommended to read the data tutorial first, in order to understand the ChoiceDataset class.
Summary
- BaseClass: ChoiceModel
- Example 1: Rewriting Conditional Logit as custom model
- Example 2: Defining a non-linear utility function with TensorFlow
BaseClass: ChoiceModel
Choice-Learn models are built on the ChoiceModel base class and most of them follow the same structure. In this tutorial, we will delve into the details of modelling and the possibilities of the package. In particular, we will see how Choice-Learn helps with the manual formulation of a choice model.
The different EndPoints
The ChoiceModel class revolves around several methods that are shared by most models:
- Model Specification: .__init__() and/or .instantiate() are used to specify the form of the model
- Model Estimation: .fit() uses a ChoiceDataset to find the best values for the different trainable weights
- Use of the model: .evaluate() can be used to estimate the negative log-likelihood of the model's choice probabilities compared to the ground truth from a ChoiceDataset; .predict_probas() can be used to predict the model's choice probabilities related to a ChoiceDataset; .compute_batch_utility() can be used to predict the utilities of a batch of items
Parameters
A few parameters are shared across the ChoiceModel class and can be changed. A full list is available; here are the most useful (a short usage sketch follows the list):
- optimizer: Name of the optimizer to use. Default is lbfgs
    - Non-stochastic optimizers: it is recommended to use them - and in particular lbfgs - for smaller datasets and models. They are faster but need all the data in memory, therefore the batch_size argument is not used. More info in the TensorFlow documentation.
    - Stochastic Gradient Descent optimizers - such as Adam: they will lead to slower convergence but work well with batching. The list is here.
- batch_size: Data batch size to use when a stochastic gradient descent optimizer is used. Default is 32.
- lr: Learning rate of the optimizer to use when a stochastic gradient descent optimizer is used. Default is 0.001.
- epochs: Max number of iterations before stopping optimization. Default is 1000.
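As a quick illustration of these endpoints and parameters, here is a minimal sketch; SimpleMNL and its exact constructor arguments are assumptions here, but any built-in or custom ChoiceModel exposes the same methods:

```python
# Sketch of the shared endpoints (SimpleMNL and its arguments are assumptions).
from choice_learn.datasets import load_modecanada
from choice_learn.models import SimpleMNL

dataset = load_modecanada(as_frame=False, preprocessing="tutorial", add_items_one_hot=False)

model = SimpleMNL(optimizer="lbfgs")            # model specification
history = model.fit(dataset)                    # model estimation
neg_ll = model.evaluate(dataset)                # negative log-likelihood vs. ground truth
probabilities = model.predict_probas(dataset)   # choice probabilities
```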
Subclassing
Inheritance is used for better code formatting in Choice-Learn. It is also designed to let anyone easily define their own utility model. The idea is that by subclassing ChoiceModel, one only needs to define the utility function with TensorFlow for it to work. The advantages are twofold:
- It takes little time. An example will follow to show you how it can be done in a few minutes.
- It is possible to use non-linear formulations of the utility. As long as it is written with TensorFlow operations, Choice-Learn and TensorFlow handle the optimization. For the more adventurous, you can even define your own operations as long as you provide the gradients.
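To make the pattern concrete before the full example, here is a minimal skeleton; the class name, the single weight and the linear utility are illustrative placeholders, and the import path of ChoiceModel is an assumption:

```python
import tensorflow as tf
from choice_learn.models.base_model import ChoiceModel  # import path assumed


class SkeletonModel(ChoiceModel):
    """Minimal template: create the weights, expose them, write the utility."""

    def __init__(self, n_items_features, **kwargs):
        super().__init__(**kwargs)
        # One coefficient per item feature, shared by all items (illustrative choice).
        self.beta = tf.Variable(
            tf.random_normal_initializer(0.0, 0.02)(shape=(1, n_items_features)),
            name="beta",
        )

    @property
    def trainable_weights(self):
        # Everything returned here is what .fit() will estimate.
        return [self.beta]

    def compute_batch_utility(self, shared_features_by_choice,
                              items_features_by_choice,
                              available_items_by_choice, choices):
        # Any TensorFlow formula works, as long as a (n_choices, n_items)
        # matrix of utilities is returned.
        return tf.squeeze(
            tf.tensordot(items_features_by_choice, tf.transpose(self.beta), axes=1),
            axis=-1,
        )
```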
Example 1: Rewriting the conditional MNL on ModeCanada
We download the ModeCanada dataset as a ChoiceDataset, see here for more details.
import numpy as np
import pandas as pd
from choice_learn.datasets import load_modecanada
dataset = load_modecanada(as_frame=False, preprocessing="tutorial", add_items_one_hot=False)
We will subclass the parent class ChoiceModel, which we need to import. It mainly works with TensorFlow as a backend; it is thus recommended to use its operations as much as possible. Most NumPy operations have a TensorFlow equivalent. You can look at the documentation here.
For our custom model to work, we need to specify:
- the weights initialization in __init__()
- the utility function in compute_batch_utility()
Utility formulation
Following the Conditional Logit tutorial, we want to estimate the following utility function (you can check the cLogit example for more details):

$U[i, s] = \beta^{inter}_i + \beta^{price} \cdot price[i, s] + \beta^{freq} \cdot freq[i, s] + \beta^{ovt} \cdot ovt[i, s] + \beta^{income}_i \cdot income[s] + \beta^{ivt}_i \cdot ivt[i, s]$

where $i$ denotes the transport mode and $s$ the choice.
Coefficients Initialization
Following our utility formula, we need four coefficient vectors:
- $\beta^{inter}$ has 3 values
- $\beta^{price}$, $\beta^{freq}$, $\beta^{ovt}$ are regrouped and each has one value, shared by all items
- $\beta^{income}$ has 3 values
- $\beta^{ivt}$ has 4 values
Utility Computation
In the compute_batch_utility method, we need to define how to compute each item's utility for each choice, using the features and the initialized weights. The arguments of the function are a batch of each feature type of the ChoiceDataset class:
Order | Argument | shape | Features for ModeCanada |
---|---|---|---|
2 | shared_features_by_choice | (batch_size, n_shared_features) | Customer Income |
3 | items_features_by_choice | (batch_size, n_items, n_items_features) | Cost, Freq, Ivt, Ovt values of each mode |
4 | available_items_by_choice | (batch_size, n_items) | Not Used |
5 | choices | (batch_size, ) | Not Used |
batch_size represents the number of choices given in the batch. The method needs to return the utilities, in the form of a matrix of shape (n_choices, n_items), representing the utility of each item for each choice.
# You can verify the names and order of the features:
print(dataset.shared_features_by_choice_names, dataset.items_features_by_choice_names)
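The class below relies on TensorFlow operations and subclasses ChoiceModel; the import path of ChoiceModel used here is an assumption:

```python
import tensorflow as tf

from choice_learn.models.base_model import ChoiceModel  # import path assumed
```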
class CustomCanadaConditionalLogit(ChoiceModel):
    """Conditional Logit for ModeCanada, following the cLogit tutorial.

    Arguments:
    ----------
    optimizer : str
        tf.keras.optimizer to use for training, default is lbfgs
    lr: float
        learning rate for optimizer, default is 1e-3
    """

    def __init__(
        self,
        add_exit_choice=False,  # Whether to add exit choice with utility=1
        optimizer="lbfgs",  # Optimizer to use
        tolerance=1e-8,  # Absolute function tolerance for optimality if lbfgs is used
        lr=0.001,  # learning rate if stochastic gradient descent optimizer
        epochs=1000,  # maximum number of epochs
        batch_size=32,  # batch size if stochastic gradient descent optimizer
    ):
        """Model coefficients instantiation."""
        super().__init__(add_exit_choice=add_exit_choice,
                         optimizer=optimizer,
                         tolerance=tolerance,
                         lr=lr,
                         epochs=epochs,
                         batch_size=batch_size)

        # Create model weights. Basically one weight per feature + one per item intercept
        self.beta_inter = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                                      name="beta_inter")
        self.beta_freq_cost_ovt = tf.Variable(
            tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
            name="beta_freq_cost_ovt"
        )
        self.beta_income = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                                       name="beta_income")
        self.beta_ivt = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 4)),
                                    name="beta_ivt")

    # Do not forget to add the weights to the list of trainable_weights, it is mandatory!
    @property
    def trainable_weights(self):
        """Do not forget to add the weights to the list of trainable_weights.

        It is needed to use the @property definition as shown here.

        Return:
        -------
        list:
            list of tf.Variable to be optimized
        """
        return [self.beta_inter, self.beta_freq_cost_ovt, self.beta_income, self.beta_ivt]

    def compute_batch_utility(self,
                              shared_features_by_choice,
                              items_features_by_choice,
                              available_items_by_choice,
                              choices):
        """Method that defines how the model computes the utility of a product.

        Parameters
        ----------
        shared_features_by_choice : tuple of np.ndarray (choices_features)
            a batch of shared features
            Shape must be (n_choices, n_shared_features)
        items_features_by_choice : tuple of np.ndarray (choices_items_features)
            a batch of items features
            Shape must be (n_choices, n_items, n_items_features)
        available_items_by_choice : np.ndarray
            A batch of items availabilities
            Shape must be (n_choices, n_items)
        choices : np.ndarray
            Choices
            Shape must be (n_choices, )

        Returns:
        --------
        np.ndarray
            Utility of each product for each choice.
            Shape must be (n_choices, n_items)
        """
        _ = (available_items_by_choice, choices)  # Avoid unused variable warning

        # Adding the 0 value intercept of the first item to get the right shape
        full_beta_inter = tf.concat([tf.constant([[.0]]), self.beta_inter], axis=-1)
        # Concatenation to reach the right shape for the dot product
        full_beta_income = tf.concat([tf.constant([[.0]]), self.beta_income], axis=-1)  # shape = (1, n_items)

        items_ivt_by_choice = items_features_by_choice[:, :, 3]  # shape = (n_choices, n_items)
        items_cost_freq_ovt_by_choice = items_features_by_choice[:, :, :3]  # shape = (n_choices, n_items, 3)

        u_cost_freq_ovt = tf.squeeze(tf.tensordot(items_cost_freq_ovt_by_choice,
                                                  tf.transpose(self.beta_freq_cost_ovt), axes=1))  # shape = (n_choices, n_items)
        u_ivt = tf.multiply(items_ivt_by_choice, self.beta_ivt)  # shape = (n_choices, n_items)

        u_income = tf.tensordot(shared_features_by_choice, full_beta_income, axes=1)  # shape = (n_choices, n_items)

        # Reshaping the intercept that is constant over all choices (1, n_items) -> (n_choices, n_items)
        u_intercept = tf.concat([full_beta_inter] * (u_income.shape[0]), axis=0)
        return u_intercept + u_cost_freq_ovt + u_income + u_ivt
Decomposition of the utility operations
Intercept
- $U_{inter}[air, s] = \beta^{inter}_{air} = 0$
- $U_{inter}[bus, s] = \beta^{inter}_{bus}$
- $U_{inter}[car, s] = \beta^{inter}_{car}$
- $U_{inter}[train, s] = \beta^{inter}_{train}$
$\beta^{inter} = \left(\begin{array}{c} 0 \\ \beta^{inter}_{bus} \\ \beta^{inter}_{car} \\ \beta^{inter}_{train} \end{array}\right)$

$U_{inter} = \beta^{inter^\top} = \left(0, \beta^{inter}_{bus}, \beta^{inter}_{car}, \beta^{inter}_{train}\right)$
Price, Freq, OVT
- $U_{price, freq, ovt}[air, s] = \beta^{price} \cdot price[air, s] + \beta^{freq} \cdot freq[air, s] + \beta^{ovt} \cdot ovt[air, s]$
- $U_{price, freq, ovt}[bus, s] = \beta^{price} \cdot price[bus, s] + \beta^{freq} \cdot freq[bus, s] + \beta^{ovt} \cdot ovt[bus, s]$
- $U_{price, freq, ovt}[car, s] = \beta^{price} \cdot price[car, s] + \beta^{freq} \cdot freq[car, s] + \beta^{ovt} \cdot ovt[car, s]$
- $U_{price, freq, ovt}[train, s] = \beta^{price} \cdot price[train, s] + \beta^{freq} \cdot freq[train, s] + \beta^{ovt} \cdot ovt[train, s]$
$\beta^{price, freq, ovt} = \left(\beta^{price}, \beta^{freq}, \beta^{ovt}\right)$ and $items\_features\_by\_choice[0, :, :3] = \left(\begin{array}{ccc} price[air, 0] & freq[air, 0] & ovt[air, 0] \\ price[bus, 0] & freq[bus, 0] & ovt[bus, 0] \\ price[car, 0] & freq[car, 0] & ovt[car, 0] \\ price[train, 0] & freq[train, 0] & ovt[train, 0] \end{array}\right)$

$U_{price, freq, ovt} = items\_features\_by\_choice[:, :, :3] \cdot \beta^{price, freq, ovt^\top}$
Note that in the matrix we didn't illustrate the choices dimension, hence the [0, :, :3] -> [:, :, :3]. items_features_by_choice[:, :, :3] has a shape of (batch_size, 4, 3) and $\beta^{price, freq, ovt}$ a shape of (1, 3), transposed to (3, 1) for the dot product. The resulting $U_{price, freq, ovt}$ therefore has a shape of (batch_size, 4).
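If the shapes feel abstract, here is a small sketch checking the dimensions of this dot product on random tensors (the batch size of 2 is arbitrary):

```python
import tensorflow as tf

fake_items_features = tf.random.normal((2, 4, 3))  # like items_features_by_choice[:, :, :3]
beta_price_freq_ovt = tf.random.normal((1, 3))     # same shape as beta_freq_cost_ovt
u = tf.squeeze(tf.tensordot(fake_items_features, tf.transpose(beta_price_freq_ovt), axes=1))
print(u.shape)  # (2, 4), i.e. (batch_size, n_items)
```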
IVT
- $U_{ivt}[air, s] = \beta^{ivt}_{air} \cdot ivt[air, s]$
- $U_{ivt}[bus, s] = \beta^{ivt}_{bus} \cdot ivt[bus, s]$
- $U_{ivt}[car, s] = \beta^{ivt}_{car} \cdot ivt[car, s]$
- $U_{ivt}[train, s] = \beta^{ivt}_{train} \cdot ivt[train, s]$
$\beta^{ivt} = \left(\beta^{ivt}_{air}, \beta^{ivt}_{bus}, \beta^{ivt}_{car}, \beta^{ivt}_{train}\right)$ and $items\_features\_by\_choice[:, :, 3] = \left(\begin{array}{cccc} ivt[air, 0] & ivt[bus, 0] & ivt[car, 0] & ivt[train, 0] \\ \vdots & \vdots & \vdots & \vdots \\ ivt[air, s] & ivt[bus, s] & ivt[car, s] & ivt[train, s] \end{array}\right)$

$U_{ivt} = \beta^{ivt} * items\_features\_by\_choice[:, :, 3]$ (element-wise multiplication, broadcast over the choices dimension) of shape (batch_size, 4)
Income
- $U_{income}[air, s] = \beta^{income}_{air} \cdot income[s]$
- $U_{income}[bus, s] = \beta^{income}_{bus} \cdot income[s]$
- $U_{income}[car, s] = \beta^{income}_{car} \cdot income[s]$
- $U_{income}[train, s] = \beta^{income}_{train} \cdot income[s]$
$\beta^{income} = \left(\begin{array}{c} 0 \\ \beta^{income}_{bus} \\ \beta^{income}_{car} \\ \beta^{income}_{train} \end{array}\right)$ and $shared\_features\_by\_choice = \left(\begin{array}{c} income[0] \\ \vdots \\ income[s] \end{array}\right)$

$U_{income} = shared\_features\_by\_choice \cdot \beta^{income^\top}$ of shape (batch_size, 4)
By concatenating $U_{inter}$ batch_size times over the choices dimension, we obtain four matrices of shape (batch_size, 4).
The final utility is then: $U = U_{inter} + U_{price, freq, ovt} + U_{ivt} + U_{income}$
Results
We can now test that we obtain the same results:
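To reproduce these figures, the custom model first has to be instantiated and estimated; a minimal sketch, using the default arguments:

```python
# Instantiate and estimate the custom model (lbfgs is the default optimizer here).
model = CustomCanadaConditionalLogit()
history = model.fit(dataset)
```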
print(model.trainable_weights[0])
print(model.trainable_weights[1])
print(model.trainable_weights[2])
print(model.trainable_weights[3])
<tf.Variable 'beta_inter:0' shape=(1, 3) dtype=float32, numpy=array([[0.6983521, 1.8441089, 3.2741907]], dtype=float32)>
<tf.Variable 'beta_freq_cost_ovt:0' shape=(1, 3) dtype=float32, numpy=array([[-0.03333881, 0.09252932, -0.0430035 ]], dtype=float32)>
<tf.Variable 'beta_income:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08908677, -0.02799308, -0.03814653]], dtype=float32)>
<tf.Variable 'beta_ivt:0' shape=(1, 4) dtype=float32, numpy=
array([[ 0.05950957, -0.0067836 , -0.00646028, -0.00145035]],
dtype=float32)>
The coefficients are organized differently but reach the same values. It is also the case for the negative log-likelihood:
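The total shown below relies on the .evaluate() endpoint; a minimal sketch, assuming .evaluate() returns the average negative log-likelihood per choice and that len(dataset) gives the number of choices:

```python
# Sketch: the scaling by len(dataset) assumes .evaluate() returns an average per choice.
print("Total Neg LikeliHood;", model.evaluate(dataset) * len(dataset))
```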
Total Neg LikeliHood; tf.Tensor(1874.363, shape=(), dtype=float32)
Example 2: Defining a non-linear utility function with TensorFlow
In the first example we used a simple linear function to compute the utility. We could use any function we would like; in particular, we can use neural networks and activation functions to add non-linearities.
A simple example would be:
from tensorflow.keras.layers import Dense
class NeuralNetUtility(ChoiceModel):
    """Utility defined by a small neural network over the features."""

    def __init__(self, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.n_neurons = n_neurons

        # Items Features Layer
        self.dense_items_features = Dense(units=n_neurons, activation="elu")

        # Shared Features Layer
        self.dense_shared_features = Dense(units=n_neurons, activation="elu")

        # Third layer: embeddings to utility (dense representation of features > U)
        self.final_layer = Dense(units=1, activation="linear")

    # We do not forget to specify self.trainable_weights with all coefficients that need to be estimated.
    # Small trick using @property to access future weights of layers
    # that have not been instantiated yet!
    @property
    def trainable_weights(self):
        """Endpoint to access the model's trainable_weights.

        Returns:
        --------
        list
            list of trainable_weights
        """
        return self.dense_items_features.trainable_variables\
            + self.dense_shared_features.trainable_variables\
            + self.final_layer.trainable_variables

    def compute_batch_utility(self,
                              shared_features_by_choice,
                              items_features_by_choice,
                              available_items_by_choice,
                              choices):
        """Compute batch utility from features."""
        _, _ = available_items_by_choice, choices
        # We apply the neural network to the features of each item.
        # We then concatenate the utilities of each item, of shape (n_choices, 1),
        # into a single tensor of shape (n_choices, n_items).
        shared_features_embeddings = self.dense_shared_features(shared_features_by_choice)

        items_features_embeddings = []
        for i in range(items_features_by_choice.shape[1]):  # loop over the items
            # Utility is Dense(embeddings sum)
            item_embedding = shared_features_embeddings + self.dense_items_features(items_features_by_choice[:, i])
            items_features_embeddings.append(self.final_layer(item_embedding))

        # Concatenation to get the right shape (n_choices, n_items)
        item_utility_by_choice = tf.concat(items_features_embeddings, axis=1)

        return item_utility_by_choice
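A usage sketch for this model; the number of neurons and the optimization hyperparameters are arbitrary illustrative choices, and a stochastic optimizer such as Adam makes sense now that the model has many weights:

```python
# Hyperparameters are illustrative; the ModeCanada dataset loaded earlier is reused.
nn_model = NeuralNetUtility(n_neurons=10, optimizer="adam", epochs=200, lr=0.001, batch_size=128)
history = nn_model.fit(dataset)
print(nn_model.evaluate(dataset))  # negative log-likelihood on the dataset
```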
If you want more complex examples, you can look at the following implementations:
- RUMnet