TasteNet
The TasteNet model, developed in [1], is available in Choice-Learn. Here is a small example of how it can be used. Following the paper, we use it on the SwissMetro [2] dataset.
# Install necessary requirements
# If you run this notebook on Google Colab, or in standalone mode, you need to install the required packages.
# Uncomment the following lines:
# !pip install choice-learn
# If you run this notebook within the GitHub repository, you need to run the following lines, which can be skipped otherwise:
import os
import sys

sys.path.append("../../")

# Disable GPU usage; the model is small enough to train on CPU
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import numpy as np
import pandas as pd
from choice_learn.datasets import load_swissmetro
from choice_learn.models.tastenet import TasteNet
Data Loading
# The preprocessing="tastenet" argument formats the data just like in the paper
customers_id, dataset = load_swissmetro(preprocessing="tastenet", as_frame=False)
We retrieved the SwissMetro dataset in the right format; let's look at it:
print("Items Features:", dataset.items_features_by_choice_names)
print("Shared Features:", dataset.shared_features_by_choice_names)
Model Parametrization
The dataset items are ordered: "TRAIN", "SM" and "CAR". We can now set up the TasteNet model's hyperparameters:
- taste_net_layers: list of the number of neurons in each layer of the taste neural network
- taste_net_activation: activation function used within the taste neural network
- items_features_by_choice_parametrization: parametrization of the estimated coefficients for the items features
TasteNet uses the customer features (shared_features_by_choice) to estimate coefficients that are multiplied with the alternative features (items_features_by_choice) to compute the utility:

$$U_i = \sum_k \beta_{i,k} \cdot x_{i,k}, \qquad \beta_{i,k} = f\left(NN(\mathcal{C})\right) \text{ or a fixed constant,}$$

with $f$ a normalizing function that can be used to set up some constraints such as positivity, $\mathcal{C}$ the customer features, $NN(\mathcal{C})$ the corresponding output of the taste neural network and $x_{i,k}$ the $k$-th feature of alternative $i$.
items_features_by_choice_parametrization describes the parametrization of each alternative's features and thus needs to have the same shape, (3, 7) in our case. The indexes also need to match:
- if the parameter is a float, the value is directly used to multiply the corresponding feature;
- if the parameter is a string, it indicates which function $f$ to use, meaning that the taste neural network estimates a parameter that is then passed through $f$.
taste_net_layers = []
taste_net_activation = "relu"
items_features_by_choice_parametrization = [[-1., "-exp", "-exp", 0., "linear", 0., 0.],
[-1., "-exp", "-exp", "linear", 0., "linear", 0.],
[-1., "-exp", 0., 0., 0., 0., 0.]]
In this example from the paper, the utilities defined by items_features_by_choice_parametrization are the following, with $\mathcal{C}$ the customer features, $NN_k$ the $k$-th output of the taste embedding neural network (ordered as the string parameters appear, row by row) and $x_{i,k}$ the $k$-th feature of alternative $i$ (columns of the matrix above, starting at 1):

$$U_{train} = -x_{train,1} - e^{NN_1(\mathcal{C})} \cdot x_{train,2} - e^{NN_2(\mathcal{C})} \cdot x_{train,3} + NN_3(\mathcal{C}) \cdot x_{train,5}$$
$$U_{sm} = -x_{sm,1} - e^{NN_4(\mathcal{C})} \cdot x_{sm,2} - e^{NN_5(\mathcal{C})} \cdot x_{sm,3} + NN_6(\mathcal{C}) \cdot x_{sm,4} + NN_7(\mathcal{C}) \cdot x_{sm,6}$$
$$U_{car} = -x_{car,1} - e^{NN_8(\mathcal{C})} \cdot x_{car,2}$$
In order to evaluate the model, we work with a cross-validation scheme. The split must take into account the fact that the same person answered several scenarios and therefore appears several times in the dataset. We use a group-out strategy, meaning that all of one person's answers end up in the same testing fold.
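As a toy illustration of this strategy (with made-up group labels), scikit-learn's GroupKFold guarantees that a group is never split across train and test:

from sklearn.model_selection import GroupKFold

toy_groups = [0, 0, 1, 1, 2, 2]
for train_idx, test_idx in GroupKFold(n_splits=3).split(range(6), groups=toy_groups):
    # members of the same group always land on the same side of the split
    print(train_idx, test_idx)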
Model Estimation
from sklearn.model_selection import GroupKFold
folds_history = []
folds_test_nll = []
gkf = GroupKFold(n_splits=5)
# specify customers_id as groups so that each customer's answers stay together
for train, test in gkf.split(list(range(len(dataset))), list(range(len(dataset))), customers_id):
    tastenet = TasteNet(taste_net_layers=taste_net_layers,
                        taste_net_activation=taste_net_activation,
                        items_features_by_choice_parametrization=items_features_by_choice_parametrization,
                        optimizer="Adam",
                        epochs=40,
                        lr=0.001,
                        batch_size=32)
    train_dataset, test_dataset = dataset[train], dataset[test]
    hist = tastenet.fit(train_dataset, val_dataset=test_dataset)
    folds_history.append(hist)
    folds_test_nll.append(tastenet.evaluate(test_dataset))
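Once the five folds are fitted, the held-out negative log-likelihoods collected above can be averaged into a single figure of merit, for instance:

print("Average test negative log-likelihood:", np.mean(folds_test_nll))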
We need to pay attention to overfitting. Here is a plot of each fold's train and test loss over the fitting epochs:
import matplotlib.pyplot as plt

colors = ["darkblue", "slateblue", "mediumpurple", "violet", "hotpink"]
for i, (hist, color) in enumerate(zip(folds_history, colors)):
    plt.plot(hist["train_loss"], c=color, label=f"fold {i} - train")
    plt.plot(hist["test_loss"], c=color, linestyle="dotted", label=f"fold {i} - test")
plt.legend()
plt.show()
Estimated Tastes Analysis
In order to analyze the model, one can look at the average output of the taste network. The taste network can be accessed with tastenet.taste_params_module, or called through tastenet.predict_tastes.
for (item_index, feature_index), nn_output_index in tastenet.items_features_to_weight_index.items():
    print("Alternative:", ["train", "sm", "car"][item_index])
    print("Feature:", dataset.items_features_by_choice_names[0][feature_index])
    print("Average value over dataset:")
    act = tastenet.get_activation_function(items_features_by_choice_parametrization[item_index][feature_index])
    print(np.mean(act(tastenet.predict_tastes(dataset.shared_features_by_choice[0])[:, nn_output_index])))
    print("----------------------------\n")
References
[1] [A Neural-embedded Discrete Choice Model: Learning Taste Representation with Strengthened Interpretability](https://arxiv.org/abs/2002.00922), Han, Y.; Camara Pereira, F.; Ben-Akiva, M.; Zegras, C. (2020)
[2] [The Acceptance of Modal Innovation: The Case of Swissmetro](https://www.researchgate.net/publication/37456549_The_acceptance_of_modal_innovation_The_case_of_Swissmetro), Bierlaire, M.; Axhausen, K. W.; Abay, G. (2001)