Implementing a Recommendation System at Kooth using LightFM

Jamie Davis · Published in Building Kooth · May 9, 2024

In this article we explore the process of building a recommendation system in the context of digital mental health.

Introduction

Recommender systems are a class of algorithms that aim to predict which content or information is relevant to an individual; they present a means to personalize a user’s experience and (hopefully) increase engagement with the service being provided. We developed a recommendation system for our mental health app, Soluna, to help users find relevant content that will help them to develop psychological flexibility.

Goal

Beyond coaching, a key part of the Soluna experience is engaging with content that helps users build psychological flexibility in a self-guided manner. Delivering this content in a way that meets individuals’ needs for autonomy, relatedness and competence, whilst also encouraging novel exploration, is an important part of ensuring sustained engagement. With this in mind, the goals of this project were to:

  • Design, develop and deploy a recommendation system that delivers personalized recommendations to the user based on their preferences and patterns of use.
  • Ensure that users receive a holistic experience of the content by using Kooth’s Theory of Change to guide our recommendations.

Why LightFM?

Once the aims of this project were clear, we needed to select a model to generate content recommendations. Following some initial exploration and discussion we decided to go with LightFM for a number of reasons:

The cold-start problem

Our app was very new at the time we developed this recommender, so most users had interacted with only a handful of content pieces. This meant we had very little implicit feedback (e.g. clicks and views) on which to train a traditional collaborative filtering model, which relies on such feedback to learn user preferences.

LightFM is a Python package implementing a hybrid matrix factorization model: it learns embeddings through collaborative filtering, but the final representation of each user and item is the sum of the embeddings of that user’s or item’s features, i.e. their metadata. In effect, the library lets us inform recommendations with user and content metadata while the service is new, then gradually shift to an interaction-driven approach as usage and engagement grow.
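
To make this concrete, here is a minimal sketch of fitting a hybrid LightFM model on completion data enriched with metadata features. The toy IDs, tags and hyperparameters are illustrative, not Soluna’s real schema:

```python
import numpy as np
from lightfm import LightFM
from lightfm.data import Dataset

# Toy completion interactions: (user_id, content_id) pairs.
interactions_raw = [("u1", "breathing-101"), ("u1", "ocd-intro"),
                    ("u2", "sleep-basics")]

# Toy metadata tags (hypothetical names, not our real taxonomy).
item_tags = {"breathing-101": ["topic:anxiety", "format:audio"],
             "ocd-intro": ["topic:ocd", "format:article"],
             "sleep-basics": ["topic:sleep", "format:article"]}
user_tags = {"u1": ["age:18-25"], "u2": ["age:26-35"]}

dataset = Dataset()
dataset.fit(users=user_tags.keys(), items=item_tags.keys(),
            user_features={t for ts in user_tags.values() for t in ts},
            item_features={t for ts in item_tags.values() for t in ts})
interactions, _ = dataset.build_interactions(interactions_raw)
user_features = dataset.build_user_features(user_tags.items())
item_features = dataset.build_item_features(item_tags.items())

# WARP loss is a common choice for ranking with implicit feedback.
model = LightFM(loss="warp", no_components=32, random_state=42)
model.fit(interactions, user_features=user_features,
          item_features=item_features, epochs=20)

# Score every item for user "u1"; higher scores = more relevant.
user_idx = dataset.mapping()[0]["u1"]
scores = model.predict(user_idx, np.arange(interactions.shape[1]),
                       user_features=user_features,
                       item_features=item_features)
```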

Extendability

The need to ensure our recommendations align with our Theory of Change is a fairly domain-specific task that would have required us to apply additional logic to whichever library we selected. Because LightFM is, at its core, a metadata-enriched collaborative filtering algorithm, it is fairly straightforward to build upon, which enabled us to incorporate the Theory of Change into our recommendation system.

The Data

Implicit Feedback

The first thing we needed to do was identify data points that, when the data became less sparse, could be used to model user preferences. We can gain a good understanding of what content users like by examining what they have or haven’t interacted with, often more so than if we use explicit feedback (e.g. ratings) alone. For example, let’s say I look at five content pieces about OCD. I rate one highly, one poorly and don’t review three of them. The key thing here is that I am clearly interested in content about OCD, not that I saw one good and one bad content piece.

We selected the completion of a piece of content as the implicit feedback signal to feed to our model. As with any assumption, equating content completion with a preference for it may introduce bias into the system’s predictions, because other factors can influence completion (e.g. a coach advised the user to do the activity). Overall, we do not anticipate these factors influencing the majority of instances; we therefore argue that content completion is a reasonable translation of user preference into a measurable data point.
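
As a sketch of what this looks like in practice (with hypothetical column names, not our real pipeline), deriving interaction pairs from raw events is a simple filter-and-deduplicate step:

```python
import pandas as pd

# Hypothetical raw event log.
events = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u1"],
    "content_id": ["ocd-intro", "ocd-intro", "sleep-basics", "breathing-101"],
    "event":      ["started", "completed", "completed", "completed"],
})

# Keep completions only and de-duplicate repeats: the signal is
# binary ("did this user finish this piece?").
completions = (events.loc[events["event"] == "completed",
                          ["user_id", "content_id"]]
               .drop_duplicates())
interaction_pairs = list(completions.itertuples(index=False, name=None))
# -> [('u1', 'ocd-intro'), ('u2', 'sleep-basics'), ('u1', 'breathing-101')]
```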

Content metadata

Secondly, within Soluna we have developed a granular, intentional system for tagging and organizing our content, built with stakeholders from across the organization. One of the reasons we chose LightFM was that its hybrid modality allows us to take advantage of this rich content metadata to inform recommendations while the service is new.

The content metadata we leverage includes, but is not limited to, characteristics such as the content format, the general topics a piece covers, and the element(s) of our Theory of Change to which it maps. We also leverage some user metadata in our recommenders; namely age, time with the service, account type (registered vs guest) and latest PHQ-4 results.
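
For illustration, here is roughly how such metadata might be encoded as the feature strings LightFM consumes; the tag names below are hypothetical stand-ins for our real taxonomy:

```python
# Item features: format, topics, and Theory of Change element(s).
item_tags = {
    "mindful-walk": ["format:audio", "topic:stress",
                     "toc:present-moment-awareness"],
}

# User features: age band, tenure, account type, latest PHQ-4 band.
user_tags = {
    "u42": ["age:16-18", "tenure:0-30d", "account:guest", "phq4:mild"],
}
```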

The Model

LightFM “mini-recommenders”

[Figure: an overview of the mini-recommenders leveraged by our recommendation system and their logic.]

We use LightFM as the basis for this recommender, but we have wrapped it with additional logic that guides what is recommended to users. Specifically, recommendations don’t come from one recommender but from a suite of mini-recommenders. Each mini-recommender uses LightFM to select the most relevant content from a specific subset of our library, identified through a rules-based approach informed by our Theory of Change.
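
As a rough sketch of the pattern (names and structure are illustrative, not our production code), each mini-recommender pairs a rules-based candidate filter with a LightFM model that ranks the filtered subset:

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np
from lightfm import LightFM

@dataclass
class MiniRecommender:
    name: str
    # Rules-based filter (informed by the Theory of Change): selects
    # the subset of item indices this recommender may draw from.
    item_filter: Callable[[np.ndarray], np.ndarray]
    model: LightFM

    def recommend(self, user_idx: int, all_items: np.ndarray,
                  k: int = 5, **feature_kwargs) -> np.ndarray:
        candidates = self.item_filter(all_items)
        scores = self.model.predict(user_idx, candidates, **feature_kwargs)
        return candidates[np.argsort(-scores)[:k]]  # top-k by score
```

A suite of such recommenders can then be queried independently and their slates combined for whichever surface needs them.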

Currently, the recommendations are surfaced to users in two parts of the UI: the “For You” section of the content library, and the “Discover More” section shown after a content piece is completed.

The modular nature of the mini-recommenders is intended to make the system easily extensible and to give us flexibility around which recommenders are used where. This puts us in a great position if we decide that we need to serve recommendations from other parts of the UI in the future.

Monitoring

We discussed a variety of ways of measuring the performance of our recommendation system in production but, in short, it boils down to two questions:

  • What do good/bad recommendations look like?
  • What do good/bad user experiences look like?

Recommendation Relevance

There are many things that could make for a bad user experience of a recommender, such as consistently being recommended the same items, although in some contexts that may be exactly what you want. The most universal indication of a bad recommender is seeing recommendations that aren’t relevant to you. Therefore, to gauge the relevance of the recommendations our model was making, we measured precision: of the content pieces recommended to a user over a given period, what proportion does the user go on to complete?
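
LightFM ships an evaluation helper for exactly this metric. A minimal sketch, reusing the objects from the earlier training example and assuming a held-out set of later completions (`test_interactions`); k=10 is an illustrative slate size:

```python
from lightfm.evaluation import precision_at_k

# Per-user precision@10, masking items already seen in training;
# averaged for a single headline number.
precision = precision_at_k(
    model, test_interactions,
    train_interactions=interactions,
    k=10,
    user_features=user_features,
    item_features=item_features,
).mean()
print(f"precision@10: {precision:.3f}")
```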

Recommendation quality

The quality of a recommender can be assessed in areas other than users’ ratings or history. For example, coverage is a metric we use to measure the proportion of the content library that the recommender has surfaced to users. Not every piece has to be surfaced to every user, but all of our content is created with our users in mind, and a good recommender should be able to show the right content to the right user.
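
A minimal sketch of how catalogue coverage can be computed, assuming a hypothetical log of (user_id, content_id) recommendation events:

```python
def coverage(recommendation_log, catalogue_size):
    """Share of the library surfaced to at least one user (0 to 1)."""
    surfaced = {content_id for _, content_id in recommendation_log}
    return len(surfaced) / catalogue_size
```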

Additionally, we look at the rarity of recommendations, that is, how unusual the recommended content pieces are relative to the most popular piece of content. A recommender that only ever returns already-popular content is less likely to benefit the user, whereas one that can return more unusual content helps the user discover content they wouldn’t otherwise have found.
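
One common way to score this (a sketch, not necessarily the exact formula we use) is to normalise each recommended piece’s completion count against the most popular piece, so scores near 1 indicate rarely-seen content:

```python
from collections import Counter

def rarity(recommended_ids, completion_log):
    """0 = as popular as the top item; near 1 = rarely completed."""
    counts = Counter(content_id for _, content_id in completion_log)
    top = max(counts.values())
    return [1 - counts.get(cid, 0) / top for cid in recommended_ids]
```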

Ethics

Given the sensitive nature of the mental health domain, and the real-world impact that personalization by means of recommendation can have on users, an ethical examination of how such a model is created and maintained is of particular importance. We designed, developed and deployed this system in line with the Alan Turing Institute’s Ethical AI framework to ensure this system was created and is implemented in the fairest possible way.

Future Improvements

Currently, the only source of implicit feedback our model accounts for is content completion; however, we know there is information in other data points, such as bookmarks and helpfulness ratings. Therefore, future work on this project will involve expanding the range of data points we collect, giving us a richer understanding of the user experience.

Additionally, future work on this project will also look to extend the ways in which we gauge its success. This will include looking at other user- and recommender-centric metrics such as diversity (to elucidate the variety of content users are getting in their recommendations), and working to map recommender performance to business metrics (e.g. time spent in-app, conversion from guest to registered user type).
