import os

# Remove GPU use
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import sys

sys.path.append("../../")

Introduction to customization

The Choice-Learn package aims at providing structure and helpful functions to design any choice model. The main idea is to write the utility function and let the package handle the rest. It is recommended to read the data tutorial first in order to understand the ChoiceDataset class.

Summary

BaseClass: ChoiceModel

Choice-Learn models are built on the ChoiceModel base class, and most of them follow the same structure. In this tutorial, we delve into the details of modelling and the possibilities of the package. In particular, we will see how Choice-Learn helps with the manual formulation of a choice model.

# Let's import ChoiceModel

from choice_learn.models.base_model import ChoiceModel

The different EndPoints

The ChoiceModel class revolves around several methods that are shared by most models (a short workflow sketch is given after the list):

  • Model Specification: .__init__() and/or .instantiate() are used to specify the form of the model
  • Model Estimation: .fit() uses a ChoiceDataset to find the best values for the different trainable weights
  • Use of the model:
    • .evaluate() estimates the negative log-likelihood of the model's choice probabilities compared to the ground truth from a ChoiceDataset
    • .predict_probas() predicts the model's choice probabilities for a ChoiceDataset
    • .compute_batch_utility() computes the utility of each item for a batch of choices
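
As an illustration, here is a minimal sketch of how these endpoints chain together, assuming a ChoiceModel subclass (MyModel is a hypothetical placeholder) and a ChoiceDataset named dataset:

# Hypothetical workflow sketch: MyModel stands for any ChoiceModel subclass
model = MyModel()                       # Model specification
history = model.fit(dataset)            # Estimation of the trainable weights
mean_nll = model.evaluate(dataset)      # Average negative log-likelihood on the dataset
probas = model.predict_probas(dataset)  # Choice probabilities for each choice of the dataset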

Parameters

A few parameters are shared through the ChoiceModel class and can be changed. A full list is available; here are the most useful ones (a usage sketch is given after the list):

  • optimizer: Name of the optimizer to use. Default is lbfgs.
    • Non-stochastic optimizers: it is recommended to use them - and in particular lbfgs - for smaller datasets and models. They are faster but need all the data in memory, so the batch_size argument is not used. More info in the TensorFlow documentation.
    • Stochastic gradient descent optimizers - such as Adam: they lead to slower convergence but work well with batching. The list is here.
  • batch_size: Data batch size to use when a stochastic gradient descent optimizer is used. Default is 32.
  • lr: Learning rate of the optimizer when a stochastic gradient descent optimizer is used. Default is 0.001.
  • epochs: Maximum number of iterations before stopping the optimization. Default is 1000.
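
These parameters are passed at instantiation. A minimal sketch, assuming a ChoiceModel subclass such as the CustomCanadaConditionalLogit defined later in this tutorial (the values are only illustrative):

# Sketch: switching from the default lbfgs to a stochastic optimizer
model = CustomCanadaConditionalLogit(
    optimizer="Adam",  # stochastic gradient descent optimizer
    lr=0.001,          # learning rate used by the stochastic optimizer
    epochs=500,        # maximum number of epochs
    batch_size=64,     # batch size used for stochastic gradient descent
)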

Subclassing

Inheritance is used for better code structure in Choice-Learn. It is also designed to let anyone easily define their own utility model. The idea is that, by subclassing ChoiceModel, one only needs to define the utility function with TensorFlow for it to work.

The advantages are twofold:

  • It needs little time. An example will follow to show you how it can be done in a few minutes.
  • It is possible to use non-linear formulations of the utility. As long as it is written with TensorFlow operations, Choice-Learn and TensorFlow handle the optimization. For the more adventurous, you can even define your own operations as long as you provide the gradients.

Example 1: Rewriting the conditional MNL on ModeCanada

We download the ModeCanada dataset as a ChoiceDataset; see here for more details.

import numpy as np
import pandas as pd
from choice_learn.datasets import load_modecanada

dataset = load_modecanada(as_frame=False, preprocessing="tutorial", add_items_one_hot=False)

We will subclass the parent class ChoiceModel, which we need to import. It mainly works with TensorFlow as a backend; it is thus recommended to use TensorFlow operations as much as possible. Most NumPy operations have a TensorFlow equivalent. You can look at the documentation here.

For our custom model to work, we need to specify:

  • the weights initialization in .__init__()
  • the utility function in .compute_batch_utility()

import tensorflow as tf
from choice_learn.models.base_model import ChoiceModel

Utility formulation

Following the Conditional Logit tutorial, we want to estimate the following utility function for item $i$ and choice $s$:

$U[i, s] = \beta^{inter}_i + \beta^{price} \cdot price[i, s] + \beta^{freq} \cdot freq[i, s] + \beta^{ovt} \cdot ovt[i, s] + \beta^{income}_i \cdot income[s] + \beta^{ivt}_i \cdot ivt[i, s]$

You can check the cLogit example for more details.

Coefficients Initialization

Following our utility formula, we need four coefficient vectors:

  • $\beta^{inter}$ has 3 values
  • $\beta^{price}$, $\beta^{freq}$, $\beta^{ovt}$ are regrouped and each has one value, shared by all items
  • $\beta^{income}$ has 3 values
  • $\beta^{ivt}$ has 4 values

Utility Computation

In the method compute_batch_utility, we need to define how to estimate each item's utility for each choice using the features and the initialized weights. The arguments of the function are a batch of each feature type of the ChoiceDataset class:

Order | Argument                  | Shape                                    | Features for ModeCanada
2     | shared_features_by_choice | (batch_size, n_shared_features)          | Customer Income
3     | items_features_by_choice  | (batch_size, n_items, n_items_features)  | Cost, Freq, Ivt, Ovt values of each mode
4     | available_items_by_choice | (batch_size, n_items)                    | Not used
5     | choices                   | (batch_size, )                           | Not used

batch_size represents the number of choices given in the batch. The method needs to return the utilities, in the form of a matrix of shape (n_choices, n_items), representing the utility of each item for each choice.

# You can verify the names and order of the features:
print(dataset.shared_features_by_choice_names, dataset.items_features_by_choice_names)
class CustomCanadaConditionalLogit(ChoiceModel):
    """Conditional Logit following for ModeCanada.

    Arguments:
    ----------
    optimizer : str
        tf.keras.optimizer to use for training, default is Adam
    lr: float
        learning rate for optimizer, default is 1e-3
    """

    def __init__(
        self,
        add_exit_choice=False, # Whether to add exit choice with utility=1
        optimizer="lbfgs", # Optimizer to use
        tolerance=1e-8, # Absolute function tolerance for optimality if lbfgs is used
        lr=0.001, # learning rate if stochastic gradient descent optimizer
        epochs=1000, # maximum number of epochs
        batch_size=32, # batch size if stochastic gradient descent optimizer
    ):
        """Model coefficients instantiation."""
        super().__init__(add_exit_choice=add_exit_choice,
                         optimizer=optimizer,
                         tolerance=tolerance,
                         lr=lr,
                         epochs=epochs,
                         batch_size=batch_size)

        # Create the model weights: basically one weight per feature + one for the intercept
        self.beta_inter = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                                 name="beta_inter")
        self.beta_freq_cost_ovt = tf.Variable(
            tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
            name="beta_freq_cost_ovt"
            )
        self.beta_income = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 3)),
                             name="beta_income")
        self.beta_ivt = tf.Variable(tf.random_normal_initializer(0.0, 0.02, seed=42)(shape=(1, 4)),
                               name="beta_ivt")

    # Do not forget to add the weights to the list of trainable_weights, it is mandatory!
    @property
    def trainable_weights(self):
        """Do not forget to add the weights to the list of trainable_weights.

        It is needed to use the @property definition as here.

        Return:
        -------
        list:
            list of tf.Variable to be optimized
        """
        return [self.beta_inter, self.beta_freq_cost_ovt, self.beta_income, self.beta_ivt]


    def compute_batch_utility(self,
                              shared_features_by_choice,
                              items_features_by_choice,
                              available_items_by_choice,
                              choices):
        """Method that defines how the model computes the utility of a product.

        Parameters
        ----------
        shared_features_by_choice : tuple of np.ndarray (choices_features)
            a batch of shared features
            Shape must be (n_choices, n_shared_features)
        items_features_by_choice : tuple of np.ndarray (choices_items_features)
            a batch of items features
            Shape must be (n_choices, n_items, n_items_features)
        available_items_by_choice : np.ndarray
            A batch of items availabilities
            Shape must be (n_choices, n_items)
        choices : np.ndarray
            Choices
            Shape must be (n_choices, )

        Returns:
        --------
        np.ndarray
            Utility of each product for each choice.
            Shape must be (n_choices, n_items)
        """
        _ = (available_items_by_choice, choices)  # Avoid unused variable warning

        # Adding the 0 value intercept of first item to get the right shape
        full_beta_inter = tf.concat([tf.constant([[.0]]), self.beta_inter], axis=-1)
        # Concatenation to reach right shape for dot product
        full_beta_income = tf.concat([tf.constant([[.0]]), self.beta_income], axis=-1)  # shape = (1, n_items)

        items_ivt_by_choice = items_features_by_choice[:, :, 3]  # shape = (n_choices, n_items)
        items_cost_freq_ovt_by_choice = items_features_by_choice[:, :, :3]  # shape = (n_choices, n_items, 3)
        u_cost_freq_ovt = tf.squeeze(tf.tensordot(items_cost_freq_ovt_by_choice,
                                                  tf.transpose(self.beta_freq_cost_ovt), axes=1)) # shape = (n_choices, n_items)
        u_ivt = tf.multiply(items_ivt_by_choice, self.beta_ivt) # shape = (n_choices, n_items)

        u_income = tf.tensordot(shared_features_by_choice, full_beta_income, axes=1)  # shape = (n_choices, n_items)

        # Reshaping the intercept that is constant over all choices (n_items, ) -> (n_choices, n_items)
        u_intercept = tf.concat([full_beta_inter] * (u_income.shape[0]), axis=0)
        return u_intercept + u_cost_freq_ovt + u_income + u_ivt
dataset.items_features_by_choice[0].shape
model = CustomCanadaConditionalLogit()
history = model.fit(dataset)

Decomposition of the utility operations

Intercept

  • $U_{inter}[air, s] = \beta^{inter}_{air} = 0$
  • $U_{inter}[bus, s] = \beta^{inter}_{bus}$
  • $U_{inter}[car, s] = \beta^{inter}_{car}$
  • $U_{inter}[train, s] = \beta^{inter}_{train}$

$\beta^{inter} = \left(0, \beta^{inter}_{bus}, \beta^{inter}_{car}, \beta^{inter}_{train}\right)$, where the first value is fixed to 0 for air.

$U_{inter}[s] = \beta^{inter}$ for each choice $s$, i.e. $U_{inter}$ is $\beta^{inter}$ stacked over the choices dimension.

Price, Freq, OVT

  • $U_{price, freq, ovt}[air, s] = \beta^{price} \cdot price[air, s] + \beta^{freq} \cdot freq[air, s] + \beta^{ovt} \cdot ovt[air, s]$
  • $U_{price, freq, ovt}[bus, s] = \beta^{price} \cdot price[bus, s] + \beta^{freq} \cdot freq[bus, s] + \beta^{ovt} \cdot ovt[bus, s]$
  • $U_{price, freq, ovt}[car, s] = \beta^{price} \cdot price[car, s] + \beta^{freq} \cdot freq[car, s] + \beta^{ovt} \cdot ovt[car, s]$
  • $U_{price, freq, ovt}[train, s] = \beta^{price} \cdot price[train, s] + \beta^{freq} \cdot freq[train, s] + \beta^{ovt} \cdot ovt[train, s]$

$\beta^{price, freq, ovt} = \left(\beta^{price}, \beta^{freq}, \beta^{ovt}\right)$ and $items\_features\_by\_choice[0, :, :3]$ is the matrix stacking the price, freq and ovt values of the four modes for the first choice.

$U_{price, freq, ovt} = items\_features\_by\_choice[:, :, :3] \cdot \beta^{price, freq, ovt, T}$

Note that in the matrices we didn't illustrate the choices dimension, explaining the [0, :, :3] -> [:, :, :3]. items_features_by_choice[:, :, :3] has a shape of (batch_size, 4, 3) and $\beta^{price, freq, ovt}$ a shape of (1, 3). The resulting $U_{price, freq, ovt}$ therefore has a shape of (batch_size, 4).
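
To double-check those shapes, here is a small sketch with dummy tensors (values are arbitrary, a batch_size of 2 is assumed), mirroring the tensordot used in compute_batch_utility:

# Shape-check sketch with dummy tensors
dummy_features = tf.zeros((2, 4, 3))  # (batch_size, n_items, 3): price, freq, ovt values
dummy_beta = tf.zeros((1, 3))         # one coefficient per feature, shared by all items
u = tf.squeeze(tf.tensordot(dummy_features, tf.transpose(dummy_beta), axes=1))
print(u.shape)  # (2, 4) = (batch_size, n_items)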

IVT

  • $U_{ivt}[air, s] = \beta^{ivt}_{air} \cdot ivt[air, s]$
  • $U_{ivt}[bus, s] = \beta^{ivt}_{bus} \cdot ivt[bus, s]$
  • $U_{ivt}[car, s] = \beta^{ivt}_{car} \cdot ivt[car, s]$
  • $U_{ivt}[train, s] = \beta^{ivt}_{train} \cdot ivt[train, s]$

$\beta^{ivt} = \left(\beta^{ivt}_{air}, \beta^{ivt}_{bus}, \beta^{ivt}_{car}, \beta^{ivt}_{train}\right)$ and $items\_features\_by\_choice[:, :, 3]$ contains the ivt values of each mode for each choice, of shape (batch_size, 4).

$U_{ivt} = \beta^{ivt} * items\_features\_by\_choice[:, :, 3]$, an element-wise multiplication (with broadcasting) of shape (batch_size, 4)
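
The broadcasting at play can be checked with a small sketch on dummy tensors (arbitrary values, a batch_size of 2 is assumed):

# Broadcasting sketch: (batch_size, 4) * (1, 4) -> (batch_size, 4)
dummy_ivt = tf.zeros((2, 4))       # ivt values of each mode for 2 choices
dummy_beta_ivt = tf.zeros((1, 4))  # one coefficient per mode
print(tf.multiply(dummy_ivt, dummy_beta_ivt).shape)  # (2, 4)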

Income

  • $U_{income}[air, s] = \beta^{income}_{air} \cdot income[s]$
  • $U_{income}[bus, s] = \beta^{income}_{bus} \cdot income[s]$
  • $U_{income}[car, s] = \beta^{income}_{car} \cdot income[s]$
  • $U_{income}[train, s] = \beta^{income}_{train} \cdot income[s]$

$\beta^{income} = \left(0, \beta^{income}_{bus}, \beta^{income}_{car}, \beta^{income}_{train}\right)$, where the first value is fixed to 0 for air, and $shared\_features\_by\_choice$ contains the customer income of each choice, of shape (batch_size, 1).

$U_{income} = shared\_features\_by\_choice \cdot \beta^{income}$ of shape (batch_size, 4)

By repeating $U_{inter}$ batch_size times over the choices dimension, we obtain four matrices of shape (batch_size, 4).

The final utility is then: $U = U_{inter} + U_{price, freq, ovt} + U_{ivt} + U_{income}$

Results

We can now test that we obtain the same results:

print(model.trainable_weights[0])
print(model.trainable_weights[1])
print(model.trainable_weights[2])
print(model.trainable_weights[3])
<tf.Variable 'beta_inter:0' shape=(1, 3) dtype=float32, numpy=array([[0.69834024, 1.8440617 , 3.2741678 ]], dtype=float32)>
<tf.Variable 'beta_freq_cost_ovt:0' shape=(1, 3) dtype=float32, numpy=array([[-0.03333923,  0.09252947, -0.04300353]], dtype=float32)>
<tf.Variable 'beta_income:0' shape=(1, 3) dtype=float32, numpy=array([[-0.08908718, -0.02799309, -0.03814669]], dtype=float32)>
<tf.Variable 'beta_ivt:0' shape=(1, 4) dtype=float32, numpy=
array([[ 0.05950976, -0.00678364, -0.00646018, -0.00145034]],
      dtype=float32)>

The coefficients are organized differently but reach the same values. It is also the case for the negative log-likelihood:

print("Total Neg LikeliHood;", model.evaluate(dataset) * len(dataset))
Total Neg LikeliHood; tf.Tensor(1874.3633, shape=(), dtype=float32)
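
The other endpoints described earlier can also be called on the fitted model. A minimal sketch, assuming .predict_probas() accepts the ChoiceDataset as described above:

probas = model.predict_probas(dataset)  # expected shape: (n_choices, n_items)
print(probas[:5])                       # choice probabilities of the first five choices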

Example 2: Defining a non-linear utility function with TensorFlow

In the previous example, we used a simple linear function for the utility computation. We could use any function we would like. In particular, we can use neural networks and activation functions to add non-linearities.

A simple example would be:

from tensorflow.keras.layers import Dense

class NeuralNetUtility(ChoiceModel):
    def __init__(self, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.n_neurons = n_neurons

        # Items Features Layer
        self.dense_items_features = Dense(units=n_neurons, activation="elu")

        # Shared Features Layer
        self.dense_shared_features = Dense(units=n_neurons, activation="elu")

        # Third layer: embeddings to utility (dense representation of features > U)
        self.final_layer = Dense(units=1, activation="linear")

    # We do not forget to specify self.trainable_weights with all coefficients that need to be estimated.
    # Small trick using @property to access future weights of layers
    # that have not been instantiated yet!
    @property
    def trainable_weights(self):
        """Endpoint to acces model's trainable_weights.

        Returns:
        --------
        list
            list of trainable_weights
        """
        return (self.dense_items_features.trainable_variables
                + self.dense_shared_features.trainable_variables
                + self.final_layer.trainable_variables)

    def compute_batch_utility(self,
                              shared_features_by_choice,
                              items_features_by_choice,
                              available_items_by_choice,
                              choices):
        """Computes batch utility from features."""
        _, _ = available_items_by_choice, choices
        # We apply the neural network to all items_features_by_choice for all the items
        # We then concatenate the utilities of each item of shape (n_choices, 1) into a single one of shape (n_choices, n_items)
        shared_features_embeddings = self.dense_shared_features(shared_features_by_choice)

        items_features_embeddings = []
        for i in range(items_features_by_choice.shape[1]):  # loop over the items
            # Utility is Dense(embeddings sum)
            item_embedding = shared_features_embeddings + self.dense_items_features(items_features_by_choice[:, i])
            items_features_embeddings.append(self.final_layer(item_embedding))

        # Concatenation to get the right shape (n_choices, n_items)
        item_utility_by_choice = tf.concat(items_features_embeddings, axis=1)

        return item_utility_by_choice
model = NeuralNetUtility(n_neurons=10, optimizer="Adam", epochs=200)
history = model.fit(dataset)
model.evaluate(dataset) * len(dataset)

If you want more complex examples, you can look at the following implementations:

  • RUMnet