Product Recommendations in the Newsletter: Start with the Riskiest Assumption Test (RAT) - Keep Calm and Run ML Projects the Lean Way, Part 2

Stefan Vujović
6 min read · Jul 10, 2023


Welcome to the second article of the series “Keep Calm and Run Your ML Projects the Lean Way”. So far, we have discussed the iterative approach to developing an ML system. Each iteration is one pass through the Build-Measure-Learn Feedback Loop. The first iteration focuses on the RAT, the second on the MVP, and the third aims to SCALE the system.

In this installment, I will walk you through a real-life example of a RAT (Riskiest Assumption Test) in the context of recommendation systems. Specifically, we will focus on building a recommender system for products offered via the newsletter.

TL;DR

We make the first iteration through the Build-Measure-Learn Feedback Loop:

  • Case Study (Personalizing the Weekly Newsletter) and an overview of the observed circumstances and assumptions
  • Introduction of the Mercedes Decomposition framework to break down the problem
  • Utilizing a heuristic approach based on customer interests to automate product recommendation
  • System Architecture
  • Measurements and Learnings

Spoiler Alert

In this article, we won’t be covering "fancy" algorithms. The point of the RAT phase is to validate an idea using simple methods in order to pave the way for more complex solutions.

But don’t worry, we’ll explore the more advanced techniques later in this series.

Case Study

The case study takes place within an e-commerce platform catering to millions of craftsmen across Europe, offering a vast array of tools and equipment for their needs.

Every week, the CRM team worked with the category management team to choose 4 products, which were then sent via the newsletter to all the chosen subscribers. The aim was to motivate customers to make a new purchase when they checked their inbox.

A visual representation of the email content received by newsletter subscribers is shown below. Clicking on the product’s image or title would directly navigate them to the corresponding product detail page on the website, enabling them to make a purchase.

Observed circumstances:

  • The product selection is a manual process.
  • All customers receive the same products.

Solutions are shaped by the circumstances at hand, and their complexity varies accordingly. A good starting point is therefore to understand the problem, how it is currently being solved, and where that solution falls short. At such an early phase, we only need to demonstrate that there is a slightly better way to solve the problem.

Assumptions:

  • The product selection can be automated, saving time for the 2 teams.
  • We can offer each customer more personalized, more relevant products and boost sales.

What we need to figure out is:

  • how do we connect the relevant products to the individual customers, and what data can we use?
  • how do we push these products to the customer's inbox?

That brought us to the Mercedes Decomposition.

Mercedes Decomposition — RAT

Let's recap the definition of the RAT, in case you missed it in the previous article:

RAT — Riskiest Assumption Test addresses the validation of the riskiest hypothesis, with the aim to test the assumption without deploying a system in production.

The figure below contains the set of hypotheses that we made in this phase. Green sticky notes highlight a hypothesis, and red sticky notes mark relevant questions.

As discussed in the first article, we add sticky notes along the 3 axes, but at this point, we are focusing on the first phase (smallest circle) — the RAT phase.

Method — How do we do it?

We will send the top-sellers from the customer's category of interest.

Data — how do we compute this?

We have a product category tree — each product is a part of a category of products.

We have the customers' purchase history.

User story — how do the products reach the inbox?

We need to associate the customer id to the product ids we are offering and pass them to the CRM tool that takes care of sending the email.

The Heuristic

As the first rule of Google's Rules of Machine Learning suggests: we started without machine learning.

Using the historical sales data, we differentiated products by their sales volume — separating them into top-seller, medium, and long-tail products.

Moreover, we replicated this over different product categories such as Power Tools, Air Compression Technology, Workwear, etc.
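To make the segmentation concrete, here is a minimal sketch in plain Python. The article does not state how the cut-offs between top-seller, medium, and long-tail were chosen, so the rank-based shares below (top 25%, next 50%, remainder) are assumptions, as are the sample products, the `SALES` records, and the function name `tier_products`.

```python
from collections import defaultdict

# Hypothetical sales records: (product_id, category, units_sold).
SALES = [
    ("drill-01", "Power Tools", 900),
    ("saw-02", "Power Tools", 450),
    ("sander-03", "Power Tools", 120),
    ("grinder-04", "Power Tools", 30),
    ("jacket-10", "Workwear", 700),
    ("gloves-11", "Workwear", 80),
]


def tier_products(sales, top_share=0.25, medium_share=0.5):
    """Rank products by units sold within each category and assign a tier.

    The shares are assumed cut-offs: the best-ranked `top_share` of a
    category are top-sellers, the next `medium_share` are medium, and
    everything else is long-tail.
    """
    by_category = defaultdict(list)
    for product_id, category, units in sales:
        by_category[category].append((product_id, units))

    tiers = {}
    for category, items in by_category.items():
        # Best seller first.
        ranked = sorted(items, key=lambda item: item[1], reverse=True)
        for rank, (product_id, _units) in enumerate(ranked):
            position = rank / len(ranked)  # 0.0 for the best seller
            if position < top_share:
                tier = "top-seller"
            elif position < top_share + medium_share:
                tier = "medium"
            else:
                tier = "long-tail"
            tiers[product_id] = (category, tier)
    return tiers


tiers = tier_products(SALES)
```

Because the ranking is done per category, a product only competes with its own category, which is what lets the same rule apply to Power Tools, Workwear, and every other branch of the category tree.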

If a customer's last purchase was a product from, e.g., the Power Tools category, we would fetch 4 top-sellers from this category and recommend them to the customer. This was based on the assumption that the customer would be interested in products from this group since they had already made a purchase within the same category.
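The rule described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the notebook we actually ran: the record layouts, the sample data, and the function names (`last_purchase_category`, `top_sellers_by_category`, `recommend`) are all hypothetical.

```python
from collections import defaultdict

N_RECOMMENDATIONS = 4  # the newsletter has four product slots

# Hypothetical sales records: (product_id, category, units_sold).
SALES = [
    ("drill-01", "Power Tools", 900),
    ("saw-02", "Power Tools", 450),
    ("sander-03", "Power Tools", 120),
    ("hammer-05", "Power Tools", 60),
    ("grinder-04", "Power Tools", 30),
    ("jacket-10", "Workwear", 700),
    ("gloves-11", "Workwear", 80),
]

# Hypothetical purchase history: (customer_id, category, ISO purchase date).
PURCHASES = [
    ("c1", "Power Tools", "2023-05-02"),
    ("c1", "Workwear", "2023-06-11"),
    ("c2", "Power Tools", "2023-06-01"),
]


def last_purchase_category(purchases):
    """Map each customer to the category of their most recent purchase."""
    latest = {}
    for customer_id, category, purchased_on in purchases:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if customer_id not in latest or purchased_on > latest[customer_id][0]:
            latest[customer_id] = (purchased_on, category)
    return {customer: category for customer, (_, category) in latest.items()}


def top_sellers_by_category(sales, n=N_RECOMMENDATIONS):
    """Return the n best-selling product ids for every category."""
    by_category = defaultdict(list)
    for product_id, category, units in sales:
        by_category[category].append((units, product_id))
    return {
        category: [pid for _units, pid in sorted(items, reverse=True)[:n]]
        for category, items in by_category.items()
    }


def recommend(purchases, sales):
    """Recommend the top-sellers of each customer's last purchase category."""
    top = top_sellers_by_category(sales)
    return {
        customer: top.get(category, [])
        for customer, category in last_purchase_category(purchases).items()
    }


recommendations = recommend(PURCHASES, SALES)
```

Note that a category with fewer than four products yields fewer than four recommendations in this sketch. How to fill the remaining slots is a design decision the heuristic leaves open.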

Obviously, this approach had many flaws. For instance, it didn't consider that a customer might already have made all the needed purchases within the category we were addressing and would prefer to see something from a different category in their inbox.

However, despite the inherent risk, we were determined to put our naive assumptions to the test. The fact that we were comparing it against a non-personalized approach was giving us a lot of hope.

System Architecture

The terms "system" and "architecture" might sound like an overstatement, since there is no deployed system, no database, no scheduled tasks, etc. This stage is where the inherent improvisational (hacky) nature of the RAT phase becomes apparent.

Still, here is a visual representation of the very simple workflow we needed in order to run an experiment. All the computation shown in the figure happens in a Jupyter Notebook, and yes, it is a notebook that is executed manually (reminder: we are not building a production-ready system in this phase).

The output of the notebook is a simple CSV file with two columns, customer_id and product_id. That is all the CRM tool needed in order to send the products via email to the customers.
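As a rough sketch of that hand-off, the snippet below writes such a file with Python's standard `csv` module. Only the two column names come from the workflow described here. The one-row-per-recommended-product layout, the function name `write_recommendations`, and the sample ids are assumptions for illustration.

```python
import csv
import io


def write_recommendations(recommendations, handle):
    """Write one (customer_id, product_id) row per recommended product."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["customer_id", "product_id"])
    for customer_id, product_ids in recommendations.items():
        for product_id in product_ids:
            writer.writerow([customer_id, product_id])


# Hypothetical recommendations: customer id -> list of product ids.
sample = {"c1": ["jacket-10", "gloves-11"], "c2": ["drill-01"]}

# An in-memory buffer stands in for the real file here. To write to disk,
# pass open("recommendations.csv", "w", newline="") instead.
buffer = io.StringIO()
write_recommendations(sample, buffer)
csv_text = buffer.getvalue()
```

Keeping the interface to a flat file means the CRM tool does not need to know anything about how the recommendations were computed.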

The Measure & Learn Part

After several weeks of experimentation, including a statistical test comparing the two groups (manual selection versus automated recommendations), we observed a substantial uplift in the click-through rate (CTR) and purchases. The results clearly demonstrated the effectiveness of our simple automated solution compared to the manual product selection process.
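The article does not say which statistical test was used. One common choice for comparing click-through rates between two groups is a two-proportion z-test, sketched below with the standard library only. The function name and the counts in the usage lines are hypothetical placeholders, not results from our experiment.

```python
import math


def two_proportion_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Two-sided z-test for a difference between two click-through rates.

    Group A is the control (for example, manual selection) and group B is
    the variant (for example, automated recommendations).
    Returns the z statistic and the two-sided p-value.
    """
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    # Pooled rate under the null hypothesis that both groups share one CTR.
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value via the normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, purely for illustration.
z, p_value = two_proportion_z_test(
    clicks_a=200, sends_a=10_000, clicks_b=260, sends_b=10_000
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A positive z means the variant's CTR is higher than the control's. Whether that difference is worth acting on also depends on the significance threshold chosen before the experiment started.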

Moreover, the experiment highlighted the time-saving benefits of automation for the team responsible for weekly product selection.

While the positive impact on business metrics was promising, we also valued the qualitative feedback received from the newsletter subscribers. Their comments and suggestions provided valuable insights that guided us in refining the current solution and moving towards crafting a Minimum Viable Product (MVP).

Conclusion

This concludes our first iteration through the Build-Measure-Learn Feedback Loop, focusing on the Riskiest Assumption Test (RAT) phase. By utilizing the Mercedes Decomposition framework, we successfully implemented a “primitive” personalized recommender system for newsletter subscribers.

The automation of product selection and improved relevance in the inbox resulted in positive impacts on key business metrics, such as increased click-through rates and purchases. Through this process, we not only validated our assumptions but also gained valuable insights that will shape the next phase of our project.

Building upon the learnings and outcomes from the RAT phase, our next article will delve into the concept of crafting a Minimum Viable Product (MVP). We will explore how the MVP enables us to capitalize on these insights and drive further advancements in our journey of running ML projects the lean way.

🔽 Next Phase: Crafting The MVP

Hit subscribe to get notified about upcoming articles! ❤️
