
TasteNet

The TasteNet model, developed in [1], is available in Choice-Learn. Here is a small example of how it can be used. Following the paper, we use it on the SwissMetro [2] dataset.


import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""
import sys
sys.path.append("../../")

import numpy as np
import pandas as pd

from choice_learn.datasets import load_swissmetro
from choice_learn.models.tastenet import TasteNet

Data Loading

# The preprocessing="tastenet" argument lets us format the data just like in the paper
customers_id, dataset = load_swissmetro(preprocessing="tastenet", as_frame=False)

We have retrieved the SwissMetro dataset in the right format; let's take a look at it:

print("Items Features:", dataset.items_features_by_choice_names)
print("Shared Features:", dataset.shared_features_by_choice_names)

Model Parametrization

The dataset items are ordered: "TRAIN", "SM" and "CAR". We can now set up the TasteNet model's hyperparameters:

- taste_net_layers: the number of neurons of each layer of the taste neural network
- taste_net_activation: the activation function used within the taste neural network
- items_features_by_choice_parametrization: the parametrization of the estimated coefficients for the items features

TasteNet uses the customer features (shared_features_by_choice) to estimate coefficients that are multiplied with the alternative features (items_features_by_choice) to compute the utility:

$$U(i) = \sum_k \beta^i_k \cdot x^i_k, \quad \text{with } \beta^i_k = f\left(NN(\mathcal{C})\right) \text{ or a fixed value,}$$

with $x^i_k$ the $k$-th feature of alternative $i$, $\mathcal{C}$ the customer features, $NN$ the taste neural network and $f$ a normalizing function that can be used to enforce constraints such as positivity or negativity.
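As an illustration, here is a minimal sketch (not Choice-Learn's internal code) of the two normalizing functions used in the parametrization below, "linear" and "-exp":

import numpy as np

# "linear": identity, the estimated taste coefficient is unconstrained
def linear(x):
    return x

# "-exp": minus exponential, the estimated taste coefficient is constrained to be negative
def neg_exp(x):
    return -np.exp(x)

print(neg_exp(np.array([-1.0, 0.0, 1.0])))  # always strictly negative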

items_features_by_choice_parametrization describes the parametrization of each alternative feature and thus needs to have the same shape as the items features, (3, 7) in our case. The indexes also need to match:

- if the parameter is a float, its value directly multiplies the corresponding feature;
- if the parameter is a string, it indicates which function $f$ to use, meaning that the taste neural network estimates a parameter that goes through $f$ before multiplying the feature.

taste_net_layers = []
taste_net_activation = "relu"
items_features_by_choice_parametrization = [
    [-1., "-exp", "-exp", 0., "linear", 0., 0.],
    [-1., "-exp", "-exp", "linear", 0., "linear", 0.],
    [-1., "-exp", 0., 0., 0., 0., 0.],
]

In this example from the paper, with $\mathcal{C}$ the customer features, $NN_k$ the $k$-th output of the taste embedding neural network and $x^i_k$ the $k$-th feature of alternative $i$, the utilities defined by items_features_by_choice_parametrization are the following:

$$U(\text{train}) = -x^{train}_1 - e^{NN_1(\mathcal{C})} \cdot x^{train}_2 - e^{NN_2(\mathcal{C})} \cdot x^{train}_3 + NN_3(\mathcal{C}) \cdot x^{train}_5$$

$$U(\text{sm}) = -x^{sm}_1 - e^{NN_4(\mathcal{C})} \cdot x^{sm}_2 - e^{NN_5(\mathcal{C})} \cdot x^{sm}_3 + NN_6(\mathcal{C}) \cdot x^{sm}_4 + NN_7(\mathcal{C}) \cdot x^{sm}_6$$

$$U(\text{car}) = -x^{car}_1 - e^{NN_8(\mathcal{C})} \cdot x^{car}_2$$
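The string entries of the parametrization are the coefficients estimated by the taste network; counting them should therefore give the number of outputs of that network. A small sanity-check sketch reusing the matrix defined above:

n_taste_outputs = sum(
    isinstance(param, str)
    for row in items_features_by_choice_parametrization
    for param in row
)
print("Number of taste network outputs:", n_taste_outputs)  # 8 with the parametrization above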

In order to evaluate the model, we work with a cross-validation scheme. We need to make sure that the split takes into account that the same person answered several times and therefore appears several times in the dataset. We work with a group-out strategy, meaning that all the answers of a given person end up in the same testing fold.

Model Estimation

from sklearn.model_selection import GroupKFold

folds_history = []
folds_test_nll = []
gkf = GroupKFold(n_splits=5)
# specify customers_id to group each customer's answers in the same fold
for train, test in gkf.split(list(range(len(dataset))), list(range(len(dataset))), customers_id):
    tastenet = TasteNet(taste_net_layers=taste_net_layers,
                        taste_net_activation=taste_net_activation,
                        items_features_by_choice_parametrization=items_features_by_choice_parametrization,
                        optimizer="Adam",
                        epochs=40,
                        lr=0.001,
                        batch_size=32)
    train_dataset, test_dataset = dataset[train], dataset[test]
    hist = tastenet.fit(train_dataset, val_dataset=test_dataset)
    folds_history.append(hist)
    folds_test_nll.append(tastenet.evaluate(test_dataset))

We need to pay attention to overfitting. Here is a plot of each fold's train and test loss over the fitting epochs:

import matplotlib.pyplot as plt

for i, (hist, color) in enumerate(zip(folds_history,
                                      ["darkblue", "slateblue", "mediumpurple", "violet", "hotpink"])):
    plt.plot(hist["train_loss"], c=color, label=f"Fold {i+1} - train")
    plt.plot(hist["test_loss"], c=color, linestyle="dotted", label=f"Fold {i+1} - test")
plt.legend()
plt.show()
print("Average negative log-likelihood on the testing sets:", np.mean(folds_test_nll))

Estimated Tastes Analysis

In order to analyze the model, one can look at the average output of the taste network. The taste network can be accessed with tastenet.taste_params_module, or called through tastenet.predict_tastes.

# Loop over all the coefficients estimated by the taste network
for (item_index, feature_index), nn_output_index in tastenet.items_features_to_weight_index.items():
    print("Alternative:", ["train", "sm", "car"][item_index])
    print("Feature:", dataset.items_features_by_choice_names[0][feature_index])
    print("Average value over dataset:")
    # Retrieve the normalizing function f associated with this coefficient
    act = tastenet.get_activation_function(items_features_by_choice_parametrization[item_index][feature_index])
    print(np.mean(act(tastenet.predict_tastes(dataset.shared_features_by_choice[0])[:, nn_output_index])))
    print("----------------------------\n")

References

[1] [A Neural-embedded Discrete Choice Model: Learning Taste Representation with Strengthened Interpretability](https://arxiv.org/abs/2002.00922), Han, Y.; Camara Pereira, F.; Ben-Akiva, M.; Zegras, C. (2020)

[2] [The Acceptance of Modal Innovation: The Case of Swissmetro](https://www.researchgate.net/publication/37456549_The_acceptance_of_modal_innovation_The_case_of_Swissmetro), Bierlaire, M.; Axhausen, K. W.; Abay, G. (2001)