Building a Listwise Ranking TF recommender: A step-by-step guide

Roberto Félix Patrón, PhD
Just Eat Takeaway-tech

--

Introduction

Disclaimer: As part of our continuous improvement process, we are constantly experimenting and evaluating potential solutions to enhance consumer experience. However, it’s important to note that the approach discussed here does not entirely encompass the comprehensive range of recommendations offered to our customers at Just Eat Takeaway.com.

Listwise recommenders powered by TensorFlow are a great tool for delivering personalized recommendations in a data-driven company such as Just Eat Takeaway.com.

Providing a list of recommended restaurants to our users can be more effective than focusing solely on recommending individual restaurants with high accuracy. Unlike pairwise recommenders, listwise models consider the entire recommendation list as a unit, optimizing it to capture complex user preferences.

For example, if there is a user that enjoyed Restaurant A (sushi) and Restaurant B (Mexican), a pairwise recommender may recommend similar restaurants to A and B, such as Restaurant C (sushi) and Restaurant D (Mexican). However, this approach does not consider the overall preferences of the user and may not capture the full context of their preferences.

On the other hand, a listwise recommender takes into account the user’s preferences for multiple items and optimizes the recommendation list as a whole. It considers the entire set of restaurants available and generates a recommendation list that optimizes for multiple objectives, such as maximizing the user’s satisfaction across cuisines. For example, the listwise recommender may suggest Restaurant E (Asian) and Restaurant F (Latin American), resulting in a more tailored and satisfying list.

Serendipity refers to the ability of recommenders to surprise users with unexpected and delightful recommendations they may not have discovered otherwise. By exploring diverse item combinations and introducing novel items, listwise recommenders promote serendipitous discoveries, which should enhance user engagement and satisfaction.

Listwise TensorFlow recommenders offer several advantages, particularly in large-scale recommendation tasks.

In this article we guide you through the process of setting up and training these models.

The data

The following libraries need to be installed and imported:

import fsspec
import numpy as np
import pandas as pd
import pandas_gbq
from google.cloud import bigquery
import gcsfs
import gc
import psutil
import array
import collections
import tensorflow as tf
import tensorflow_recommenders as tfrs
import tensorflow_ranking as tfr

The dataset we will be using contains a UserID, a RestaurantID, and a Rating for each user/restaurant pair.

The data in our example is extracted from BigQuery, then transformed to tensors using the following code:

def qry_user_rest_ratings(project_id='example-project'):
    # Pull one row per user/restaurant rating from BigQuery.
    query = """SELECT UserID, RestaurantID, Rating FROM `data-warehouse.recommenders.users_rest_rating`"""
    client = bigquery.Client(project_id)
    query_job = client.query(query)
    return query_job.to_dataframe()

df = qry_user_rest_ratings()
df = df.dropna()

ratings_rank = df[["UserID", "RestaurantID", "Rating"]].copy()
restaurants = df[["RestaurantID"]].copy()

df = []

restaurants = tf.data.Dataset.from_tensor_slices(dict(restaurants))
restaurants = restaurants.map(lambda x: x["RestaurantID"])

ratings_rank = tf.data.Dataset.from_tensor_slices(dict(ratings_rank))
ratings_rank = ratings_rank.map(lambda x: {
    "UserID": x["UserID"],
    "RestaurantID": x["RestaurantID"],
    "Rating": x["Rating"],
})

# Get the unique users and restaurants to create the vocabularies

unique_restaurants = np.unique(np.concatenate(list(restaurants.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(ratings_rank.batch(1_000).map(lambda x: x["UserID"]))))

This data can’t be used directly for list optimization. As mentioned in the TensorFlow tutorial, the data needs to provide sampled lists for each user. TensorFlow provides example code to generate these lists. I found, however, that creating them directly in BigQuery saves time. Here is an example of how this could be implemented:

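def qry_user_rest_ratings_listed(project_id='example-project'):
    # Sample one list of 5 rated restaurants per user, keeping only users
    # with at least 5 ratings.
    # NOTE: the query was cut off after UNION ALL. The second UNION ALL branch
    # and the final SELECT are an assumed completion that returns the list
    # columns (RestaurantID, Rating) used in the next step.
    query = """
    WITH global_query AS (
      WITH repeated_query AS (
        SELECT UserID, ARRAY_AGG(STRUCT(RestaurantID, Rating) ORDER BY RAND()) AS restaurant_ratings_list
        FROM (
          SELECT UserID, RestaurantID, Rating,
                 ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY RAND()) AS row_num
          FROM `data-warehouse.recommenders.users_rest_rating`
        )
        WHERE row_num <= 5
        GROUP BY UserID
        HAVING COUNT(row_num) = 5
      )
      SELECT * FROM repeated_query
      UNION ALL
      SELECT * FROM repeated_query
    )
    SELECT
      UserID,
      ARRAY(SELECT r.RestaurantID FROM UNNEST(restaurant_ratings_list) AS r WITH OFFSET AS pos ORDER BY pos) AS RestaurantID,
      ARRAY(SELECT r.Rating FROM UNNEST(restaurant_ratings_list) AS r WITH OFFSET AS pos ORDER BY pos) AS Rating
    FROM global_query
    """
    client = bigquery.Client(project_id)
    query_job = client.query(query)
    return query_job.to_dataframe()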

This query will get 5 random examples for each user and it will only keep users that have at least 5 ratings. It can be executed repeatedly to generate as many lists as required.
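For smaller datasets, the same sampling logic can be sketched in pandas. This is only an illustration under assumptions, not the code we run in production: the function name `sample_lists` and the toy DataFrame are hypothetical, and only the column names come from the dataset described earlier.

```python
import pandas as pd

def sample_lists(ratings: pd.DataFrame, list_size: int = 5, seed: int = 0) -> pd.DataFrame:
    """Sample `list_size` random ratings per user; drop users with fewer ratings."""
    # Keep only users that have at least `list_size` ratings.
    counts = ratings.groupby("UserID")["RestaurantID"].transform("size")
    eligible = ratings[counts >= list_size]
    # Shuffle all rows, then keep the first `list_size` rows of each user.
    shuffled = eligible.sample(frac=1.0, random_state=seed)
    sampled = shuffled.groupby("UserID").head(list_size)
    # Collapse each user's rows into aligned lists of restaurants and ratings.
    return (
        sampled.groupby("UserID")
        .agg(RestaurantID=("RestaurantID", list), Rating=("Rating", list))
        .reset_index()
    )

# Hypothetical toy data: user "u1" has 6 ratings, user "u2" only 2.
toy = pd.DataFrame({
    "UserID": ["u1"] * 6 + ["u2"] * 2,
    "RestaurantID": ["r1", "r2", "r3", "r4", "r5", "r6", "r1", "r2"],
    "Rating": [5, 4, 3, 2, 1, 5, 4, 3],
})
lists = sample_lists(toy)
```

Here `lists` contains a single row for "u1" with five restaurants and their matching ratings, while "u2" is dropped for having too few ratings. Calling the function with different seeds produces additional sampled lists per user.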

Next, this data is converted to tensors:

df = qry_user_rest_ratings_listed()
buffer_size = len(df)

# Keys must match the feature names the model reads: UserID, RestaurantID, Rating.
tensor_slices = {
    "UserID": df['UserID'].tolist(),
    "RestaurantID": df['RestaurantID'].tolist(),
    "Rating": df['Rating'].tolist(),
}

df = []
train = tf.data.Dataset.from_tensor_slices(tensor_slices)
tensor_slices = []

In our case, we will use the data for the last six months as training data.

As you may have noticed, it was important to us to clear variables from memory as we are dealing with large datasets. This tip may be helpful if you are implementing this recommender with limited resources.
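Rebinding a variable to an empty list, as done above, drops the reference to the large object. A slightly more explicit variant is sketched below, assuming only the standard `gc` module; the DataFrame here is a hypothetical stand-in for a large query result.

```python
import gc

import numpy as np
import pandas as pd

# Hypothetical stand-in for a large query result.
df = pd.DataFrame({"Rating": np.ones(1_000_000)})

# Copy out only what is needed downstream.
ratings = df["Rating"].to_numpy().copy()

# Drop the last reference to the DataFrame, then ask the garbage collector
# to reclaim any objects that are only kept alive by reference cycles.
del df
gc.collect()
```

How much memory is actually returned to the operating system depends on the allocator, so it is worth measuring rather than assuming.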

The model

The model is structured very similarly to the TensorFlow tutorial:

class RankingModel(tfrs.Model):

    def __init__(self, loss):
        super().__init__()
        embedding_dimension = 32

        # User embeddings
        self.user_embeddings = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids),
            tf.keras.layers.Embedding(len(unique_user_ids) + 2, embedding_dimension)
        ])

        # Restaurant embeddings
        self.restaurant_embeddings = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_restaurants),
            tf.keras.layers.Embedding(len(unique_restaurants) + 2, embedding_dimension)
        ])

        # Compute predictions
        self.score_model = tf.keras.Sequential([
            # Learn multiple dense layers.
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1)
        ])

        self.task = tfrs.tasks.Ranking(
            loss=loss,
            metrics=[
                tfr.keras.metrics.NDCGMetric(name="ndcg_metric"),
                tf.keras.metrics.RootMeanSquaredError()
            ]
        )

    def call(self, features):
        user_embeddings = self.user_embeddings(features["UserID"])
        restaurant_embeddings = self.restaurant_embeddings(features["RestaurantID"])

        # The user embedding is repeated once per restaurant in the list,
        # then concatenated with the restaurant embeddings.
        list_length = features["RestaurantID"].shape[1]
        user_embedding_repeated = tf.repeat(
            tf.expand_dims(user_embeddings, 1), [list_length], axis=1)
        concatenated_embeddings = tf.concat(
            [user_embedding_repeated, restaurant_embeddings], 2)

        return self.score_model(concatenated_embeddings)

    def compute_loss(self, features, training=False):
        labels = features.pop("Rating")
        scores = self(features)
        return self.task(
            labels=labels,
            predictions=tf.squeeze(scores, axis=-1),
        )

The metrics in the previous code deserve attention. Alongside the root mean squared error, we track the NDCG (normalized discounted cumulative gain), which measures the quality of a ranked list of items in a recommendation system. It considers both the relevance and the position of each item, so relevant items placed near the top of the list count for more.
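To make the NDCG idea concrete, here is a minimal NumPy sketch of a common formulation (exponential gain, logarithmic position discount). It is an illustration only: the helper names `dcg` and `ndcg` are hypothetical, and the TensorFlow Ranking metric may differ in details such as tie handling.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain with the common gain 2^rel - 1."""
    relevances = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum((2.0 ** relevances - 1.0) / np.log2(positions + 1)))

def ndcg(labels, scores):
    """NDCG of the ranking induced by `scores`, given the true `labels`."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(scores)[::-1]   # best-scored item first
    ideal = np.sort(labels)[::-1]      # best possible ordering
    ideal_dcg = dcg(ideal)
    return dcg(labels[order]) / ideal_dcg if ideal_dcg > 0 else 0.0

labels = [3, 1, 2]
perfect = ndcg(labels, scores=[0.9, 0.1, 0.5])      # ranks the items as 3, 2, 1
reversed_rank = ndcg(labels, scores=[0.1, 0.9, 0.5])  # ranks the items as 1, 2, 3
```

A ranking that matches the label order reaches an NDCG of 1.0, while the reversed ranking scores lower because the most relevant item ends up at the bottom of the list.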

We create a shuffled, batched, and cached training dataset, which can be used for training:

cached_train = train.shuffle(buffer_size).batch(8192).cache()

The model is created, compiled, and fitted:

listwise_model = RankingModel(tfr.keras.losses.ListMLELoss())
listwise_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
listwise_model.fit(cached_train, epochs=5, verbose=True)

The ListMLE loss function maximizes the likelihood of the correct ranking by penalizing deviations from the observed ranking, improving the quality of generated ranked lists.
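The idea behind ListMLE can be sketched in a few lines of NumPy: the loss is the negative log-likelihood, under a Plackett-Luce model, of the ordering given by the true labels. This is a simplified illustration with a hypothetical helper name; the TensorFlow Ranking implementation additionally handles details such as ties and batching.

```python
import numpy as np

def list_mle_loss(labels, scores):
    """Negative Plackett-Luce log-likelihood of the label-sorted ordering."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Arrange the scores in the order given by the true labels (best first).
    sorted_scores = scores[np.argsort(-labels, kind="stable")]
    # Subtract the max for numerical stability; the loss is shift-invariant.
    sorted_scores = sorted_scores - sorted_scores.max()
    # log(sum_{j >= i} exp(s_j)) for every position i, via a reversed cumsum.
    tail_logsumexp = np.log(np.cumsum(np.exp(sorted_scores)[::-1])[::-1])
    return float(np.sum(tail_logsumexp - sorted_scores))

labels = [5, 3, 1]
agreeing = list_mle_loss(labels, scores=[2.0, 1.0, 0.0])
disagreeing = list_mle_loss(labels, scores=[0.0, 1.0, 2.0])
```

Scores that order the items the same way as the labels give a smaller loss than scores that reverse the order, which is the behavior the model is trained toward.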

The predictions

To be useful, our recommendations must only include restaurants that are close enough to the user and open for delivery at the time of the search.

We create a new dataset that contains the available restaurants, and from these we create a recommendations list.

Users can have different numbers of restaurants available, but a regular (dense) TensorFlow tensor cannot hold lists of different lengths. To work around that problem, the users are grouped by the count of restaurants available (cnt_res_per_cust), so that the predictions can be made at once for all users within each group.
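The grouping step could look roughly like the pandas sketch below. The availability table and its values are hypothetical; only the column names `UserID`, `RestaurantID` and `cnt_res_per_cust` are taken from this article.

```python
import pandas as pd

# Hypothetical availability table: one row per user/restaurant pair that is
# currently deliverable.
available = pd.DataFrame({
    "UserID": ["u1", "u1", "u1", "u2", "u2", "u3", "u3", "u3"],
    "RestaurantID": ["r1", "r2", "r3", "r1", "r4", "r2", "r3", "r5"],
})

# One row per user, with the available restaurants collected into a list.
per_user = (
    available.groupby("UserID")["RestaurantID"]
    .agg(list)
    .reset_index()
)

# Number of available restaurants per user, used as the grouping key.
per_user["cnt_res_per_cust"] = per_user["RestaurantID"].str.len()

# Users with the same count can be stacked into one rectangular batch.
users_by_count = per_user.groupby("cnt_res_per_cust")["UserID"].agg(list).to_dict()
```

In this toy example "u1" and "u3" both have three restaurants and can be scored together, while "u2" forms its own group of length two.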

The following function calculates the scores for each user/restaurant combination.

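def pred_grouped(model, temp):
    # Score every user/restaurant combination, one batch per list length.
    scores_col = [None] * len(temp)

    for group in temp["cnt_res_per_cust"].unique():
        # Positions of all users that have exactly `group` restaurants available.
        idx = np.where(temp["cnt_res_per_cust"] == group)[0]
        cust_list = temp['UserID'].values[idx]
        res_list = np.array(temp['RestaurantID'].iloc[idx].tolist())

        # The model returns shape (users, group, 1); flatten to (users, group).
        scores = model({'UserID': cust_list,
                        'RestaurantID': res_list}).numpy()
        scores = scores.reshape(len(res_list), group).tolist()

        for position, user_scores in zip(idx, scores):
            scores_col[position] = user_scores

    temp['score'] = scores_col
    return temp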
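Once every available restaurant has a score, a recommendation list can be obtained by sorting each user's restaurants by score. The sketch below assumes the output shape of the function above (aligned lists per user); the helper `top_k`, the cut-off `k` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scored output: one row per user, with aligned lists of
# restaurants and scores.
scored = pd.DataFrame({
    "UserID": ["u1", "u2"],
    "RestaurantID": [["r1", "r2", "r3"], ["r4", "r5", "r6"]],
    "score": [[0.2, 0.9, 0.5], [1.5, -0.3, 0.7]],
})

def top_k(restaurants, scores, k=2):
    """Return the k restaurants with the highest scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [restaurants[i] for i in order]

scored["recommendations"] = [
    top_k(r, s) for r, s in zip(scored["RestaurantID"], scored["score"])
]
```

For these toy values, "u1" is recommended r2 then r3, and "u2" is recommended r4 then r6.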

Depending on the country, the number of combinations can exceed billions of rows! Storing all of these in memory can be a challenge, so first we split the data into different files and save the splits in GCS.
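One simple way to split a large table is to iterate over fixed-size row ranges, as in the sketch below. The helper `iter_chunks`, the chunk size and the toy table are hypothetical, and the file naming in the final comment is only an example.

```python
import numpy as np
import pandas as pd

def iter_chunks(frame: pd.DataFrame, rows_per_chunk: int):
    """Yield consecutive slices of `frame` with at most `rows_per_chunk` rows."""
    for start in range(0, len(frame), rows_per_chunk):
        yield frame.iloc[start:start + rows_per_chunk]

# Hypothetical stand-in for the table of users and their available restaurants.
pairs = pd.DataFrame({"UserID": np.arange(10), "cnt_res_per_cust": 3})

chunks = list(iter_chunks(pairs, rows_per_chunk=4))

# Each chunk could then be written to its own file, for example with
# chunk.to_parquet("gs://your-bucket-name/recommender/data_for_predictions/<name>.gzip")
```

Splitting by row ranges keeps each file small enough to score in memory, at the cost of one extra read and write per split.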

The following code gets the data files from GCS, creates scores for each file, and publishes this to a BigQuery table.

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('your-bucket-name')
folder_prefix = 'recommender/data_for_predictions/'
blobs = bucket.list_blobs(prefix=folder_prefix)

for blob in blobs:
    if blob.name.endswith('.gzip'):
        temp_df = pd.read_parquet(f"gs://your-bucket-name/{blob.name}")
        rows = pred_grouped(listwise_model, temp_df)

        # Expand the list-valued columns into one row per user/restaurant pair.
        rows = pd.DataFrame({
            'UserID': np.repeat(rows['UserID'], rows['RestaurantID'].str.len()),
            'RestaurantID': np.concatenate(rows['RestaurantID'].values),
            'score': np.concatenate(rows['score'].values),
        })

        # Reset the index of the expanded DataFrame
        rows.reset_index(drop=True, inplace=True)

        # `verbose` is omitted: it is deprecated in recent pandas-gbq releases.
        pandas_gbq.to_gbq(rows, 'data-warehouse.recommenders.tf_ranking_scores_listwise',
                          project_id='example-project',
                          if_exists='append')

        # Release the scored rows before loading the next file.
        rows = []
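The expansion step inside the loop is easier to follow on a toy example. The values below are hypothetical; the technique is the same `np.repeat` and `np.concatenate` combination used above.

```python
import numpy as np
import pandas as pd

# Hypothetical scored rows: one row per user with aligned lists.
rows = pd.DataFrame({
    "UserID": ["u1", "u2"],
    "RestaurantID": [["r1", "r2"], ["r3", "r4", "r5"]],
    "score": [[0.1, 0.8], [0.4, 0.2, 0.9]],
})

flat = pd.DataFrame({
    # Repeat each user once per restaurant in their list.
    "UserID": np.repeat(rows["UserID"].to_numpy(), rows["RestaurantID"].str.len()),
    # Concatenate the per-user lists into flat columns of equal length.
    "RestaurantID": np.concatenate(rows["RestaurantID"].to_numpy()),
    "score": np.concatenate(rows["score"].to_numpy()),
})
```

The result has one row per user/restaurant pair (five rows here), which is a convenient shape for a BigQuery table.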

And that’s it! This step-by-step guide walks through the entire process: getting your data into the right format for list sampling, using it to train your model, and making predictions for each user’s list of available restaurants.

Conclusions

The results of this listwise recommender system may not show a significant difference in metrics such as daily conversion (the percentage of users who click on and order from a restaurant). However, this doesn’t mean that the performance is the same.

Evaluating serendipity in recommendation systems can be challenging. A user may initially refuse or overlook a novel and unexpected recommendation but eventually change their mind over time. This may happen due to several reasons:

  • Initial resistance to change
  • Gradual acceptance and curiosity
  • Gaining trust in the recommender system
  • Evolving preferences and interests
  • Experiences of unexpected recommendations turning out to be enjoyable or valuable

Unexpected but delightful recommendations may lead to long-term user engagement and improved satisfaction, which may by itself be a good reason to try out listwise recommender systems!

Just Eat Takeaway.com is hiring! Want to come work with us? Apply today.
