Introduction to data handling: the TripDataset

How to create a TripDataset

Here we create a synthetic dataset to demonstrate how to use the Trip and TripDataset classes.

# Install necessary requirements

# If you run this notebook on Google Colab, or in standalone mode, you need to install the required packages.
# Uncomment the following lines:

# !pip install choice-learn

# If you run the notebook within the GitHub repository, you need to run the following lines, which can be skipped otherwise:
import sys

sys.path.append("../../")
import os
# Hide all CUDA devices to disable GPU use (remove this line to enable it)
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import numpy as np

from choice_learn.basket_models import Trip, TripDataset

Dataset

Let's consider a simple dataset with six items (1 to 6), plus a checkout item, sold in two different stores:

- The first store sells items [0, 1, 2, 3, 4] and has observed baskets [1, 0], [2, 0], [1, 3, 4, 0];
- The second store sells items [0, 1, 5, 6] and has observed baskets [1, 0], [6, 5, 0];

where 0 is the checkout item.

n_items = 7

purchases_stores_1 = [[1, 0], [2, 0], [1, 3, 4, 0]]
purchases_stores_2 = [[1, 0], [6, 5, 0]]

assortment_store_1 = np.array([1, 1, 1, 1, 1, 0, 0])
assortment_store_2 = np.array([1, 1, 0, 0, 0, 1, 1])
available_items = np.array([assortment_store_1, assortment_store_2])
print(f"The available items are encoded as an availability matrix indicating whether each product is available (1) or not (0):\n{available_items=}\n")
print(
    "Here, the variable 'available_items' can be read as:\n",
    f"- Assortment 1 = {[i for i in range(n_items) if assortment_store_1[i]==1]}\n",
    f"- Assortment 2 = {[i for i in range(n_items) if assortment_store_2[i]==1]}"
)
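The availability rows above are typed by hand, which is easy to get wrong as the number of items grows. As a minimal sketch (not part of the library; the names `store_items` and `store_baskets` are introduced here only for illustration), the same matrix can be derived from the per-store item lists, and each observed basket can be checked against its store's assortment:

```python
import numpy as np

n_items = 7

# Hypothetical helper structures: items sold and baskets observed in each store.
store_items = [[0, 1, 2, 3, 4], [0, 1, 5, 6]]
store_baskets = [
    [[1, 0], [2, 0], [1, 3, 4, 0]],
    [[1, 0], [6, 5, 0]],
]

# One row per store, one column per item: 1 if the store sells the item, else 0.
available_items = np.zeros((len(store_items), n_items), dtype=int)
for store, items in enumerate(store_items):
    available_items[store, items] = 1

print(available_items)

# Every purchased item must be available in the store where the basket was observed.
for store, baskets in enumerate(store_baskets):
    for basket in baskets:
        assert available_items[store, basket].all(), (store, basket)
```

If a basket contained an item outside its store's assortment, the assertion would fail and report the offending store index and basket.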

Let's say that each basket has been seen 100 times. We can create Trip objects from these shopping baskets and assortments, drawing random prices for each trip.

# Create a list of Trip objects:
num_baskets = 100
trips_list = []

for _ in range(num_baskets):
    trips_list += [
        Trip(
            purchases=purchases_stores_1[0],
            # Let's consider here totally random prices for the products
            prices=np.random.uniform(1, 10, n_items),
            assortment=0
        ),
        Trip(
            purchases=purchases_stores_1[1],
            prices=np.random.uniform(1, 10, n_items),
            assortment=0
        ),
        Trip(
            purchases=purchases_stores_1[2],
            prices=np.random.uniform(1, 10, n_items),
            assortment=0
        ),
        Trip(
            purchases=purchases_stores_2[0],
            prices=np.random.uniform(1, 10, n_items),
            assortment=1
        ),
        Trip(
            purchases=purchases_stores_2[1],
            prices=np.random.uniform(1, 10, n_items),
            assortment=1
        )
    ]
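The loop above spells out the five baskets one by one. A more compact way to prepare the same arguments is sketched below, using only NumPy and plain dictionaries so that it runs without the library; the names `basket_specs` and `trip_specs` are illustrative, and the seeded generator is an assumption added for reproducibility:

```python
import numpy as np

n_items = 7
num_baskets = 100
purchases_stores_1 = [[1, 0], [2, 0], [1, 3, 4, 0]]
purchases_stores_2 = [[1, 0], [6, 5, 0]]

# Pair every observed basket with the index of its store's assortment.
basket_specs = [(basket, 0) for basket in purchases_stores_1]
basket_specs += [(basket, 1) for basket in purchases_stores_2]

# Seeded generator (an assumption, not in the original) so the prices are reproducible.
rng = np.random.default_rng(0)

# Repeat the five baskets num_baskets times, drawing fresh random prices each time.
trip_specs = [
    {
        "purchases": purchases,
        "prices": rng.uniform(1, 10, n_items),
        "assortment": assortment,
    }
    for _ in range(num_baskets)
    for purchases, assortment in basket_specs
]

print(len(trip_specs))  # 500
```

Since the dictionary keys match the keyword arguments passed to Trip in the cell above, a list such as `[Trip(**spec) for spec in trip_specs]` should give an equivalent `trips_list`, assuming Trip needs no other arguments.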

Now that we have our Trip objects, we can instantiate a TripDataset that can be fed to a basket model.

data = TripDataset(trips=trips_list, available_items=available_items)
print(data)
print(f"\nThe TripDataset 'data' contains {data.n_items} distinct items that appear in {data.n_samples} transactions carried out at {data.n_stores} point(s) of sale with {data.n_assortments} different assortments.")
print(f"\nDescription of the first trip of the dataset:\n{data.get_trip(0)}")

This code is reused in the synthetic_dataset.py file to be called in other notebooks.