Introduction to data handling: the TripDataset
How to create a TripDataset
Here we create a synthetic dataset to demonstrate how to use the Trip and TripDataset classes.
# Install necessary requirements
# If you run this notebook on Google Colab, or in standalone mode, you need to install the required packages.
# Uncomment the following lines:
# !pip install choice-learn
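# If you run the notebook within the GitHub repository, you need to run the following lines, which can be skipped otherwise:
import sys
sys.path.append("../../")

import numpy as np
# Import path assumed from the choice-learn package layout
from choice_learn.basket_models import Trip, TripDataset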
Dataset
Let's consider a simple dataset with only six items, plus a checkout item, sold in two different stores:
- The first store sells items [0, 1, 2, 3, 4] and has observed baskets [1, 0], [2, 0], [1, 3, 4, 0];
- The second store sells items [0, 1, 5, 6] and has observed baskets [1, 0], [6, 5, 0];

where 0 denotes the checkout item, which ends every basket.
n_items = 7
purchases_stores_1 = [[1, 0], [2, 0], [1, 3, 4, 0]]
purchases_stores_2 = [[1, 0], [6, 5, 0]]
assortment_store_1 = np.array([1, 1, 1, 1, 1, 0, 0])
assortment_store_2 = np.array([1, 1, 0, 0, 0, 1, 1])
available_items = np.array([assortment_store_1, assortment_store_2])
print(f"The lists of available items are encoded as an availability matrix indicating the availability (1) or not (0) of each product:\n{available_items=}\n")
print(
"Here, the variable 'available_items' can be read as:\n",
f"- Assortment 1 = {[i for i in range(n_items) if assortment_store_1[i]==1]}\n",
f"- Assortment 2 = {[i for i in range(n_items) if assortment_store_2[i]==1]}"
)
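Before building trips, it can help to check that every observed basket only contains items that its store actually sells. The sketch below is a plain-Python illustration, not part of choice-learn: the helper name `unavailable_purchases` is hypothetical, and the data are copied from the example above as nested lists so the block runs on its own.

```python
# Availability rows copied from the example above (1 = sold in the store, 0 = not sold).
available_items = [
    [1, 1, 1, 1, 1, 0, 0],  # store 1
    [1, 1, 0, 0, 0, 1, 1],  # store 2
]
# Observed baskets, grouped by store (same order as the availability rows).
baskets_per_store = [
    [[1, 0], [2, 0], [1, 3, 4, 0]],  # store 1
    [[1, 0], [6, 5, 0]],             # store 2
]


def unavailable_purchases(baskets_per_store, available_items):
    """Hypothetical helper: list (store_index, basket, item) for each purchase
    of an item that the store's assortment marks as unavailable."""
    problems = []
    for store_idx, baskets in enumerate(baskets_per_store):
        for basket in baskets:
            for item in basket:
                if available_items[store_idx][item] != 1:
                    problems.append((store_idx, basket, item))
    return problems


problems = unavailable_purchases(baskets_per_store, available_items)
print(problems)  # an empty list means every basket is consistent with its assortment
```

An empty result means the baskets and assortments agree; any triple returned points at the store, basket and item to fix.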
Let's say that each basket has been seen 100 times. We can create Trip objects based on these shopping baskets and assortments; for simplicity, the prices are drawn at random for each trip.
# Create a list of Trip objects:
num_baskets = 100
trips_list = []
for _ in range(num_baskets):
    trips_list += [
        Trip(
            purchases=purchases_stores_1[0],
            # Let's consider here totally random prices for the products
            prices=np.random.uniform(1, 10, n_items),
            assortment=0
        ),
        Trip(
            purchases=purchases_stores_1[1],
            prices=np.random.uniform(1, 10, n_items),
            assortment=0
        ),
        Trip(
            purchases=purchases_stores_1[2],
            prices=np.random.uniform(1, 10, n_items),
            assortment=0
        ),
        Trip(
            purchases=purchases_stores_2[0],
            prices=np.random.uniform(1, 10, n_items),
            assortment=1
        ),
        Trip(
            purchases=purchases_stores_2[1],
            prices=np.random.uniform(1, 10, n_items),
            assortment=1
        )
    ]
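The five Trip(...) calls in the loop above differ only in the basket and the assortment index, so the same list can be built from (basket, assortment index) pairs. The sketch below shows that pattern under stated assumptions: `TripRecord` is a hypothetical stand-in for choice-learn's Trip, defined here only so the block runs without the library, and prices are drawn with the standard `random` module rather than NumPy.

```python
import random
from dataclasses import dataclass


@dataclass
class TripRecord:
    """Hypothetical stand-in for choice-learn's Trip (not the library class)."""
    purchases: list
    prices: list
    assortment: int


n_items = 7
num_baskets = 100

# Each observed basket paired with the index of the assortment it was bought from.
observed = [
    ([1, 0], 0),
    ([2, 0], 0),
    ([1, 3, 4, 0], 0),
    ([1, 0], 1),
    ([6, 5, 0], 1),
]

# Repeat every observed basket num_baskets times, drawing fresh random prices each time.
trips = [
    TripRecord(
        purchases=basket,
        prices=[random.uniform(1, 10) for _ in range(n_items)],
        assortment=assortment_idx,
    )
    for _ in range(num_baskets)
    for basket, assortment_idx in observed
]

print(len(trips))
```

With the real library, replacing `TripRecord` by Trip and the price list by `np.random.uniform(1, 10, n_items)` should give the same trips, in the same order, as the explicit loop.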
Now that we have our Trip objects, we can instantiate a TripDataset that can be fed to a basket model.
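# Instantiate the TripDataset from the trips and the availability matrix
# (keyword arguments assumed from the choice-learn API)
data = TripDataset(trips=trips_list, available_items=available_items)
print(data)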
print(f"\nThe TripDataset 'data' contains {data.n_items} distinct items that appear in {data.n_samples} transactions carried out at {data.n_stores} point(s) of sale with {data.n_assortments} different assortments.")
print(f"\nDescription of the first trip of the dataset:\n{data.get_trip(0)}")
This code is reused in the synthetic_dataset.py file so that it can be called from other notebooks.