Modern RecSys

How to Design a Recommender?

For this chapter, I will introduce the RecSys Design Framework with a case study of Amazon.

Published in Analytics Vidhya · Mar 4, 2020 · 7 min read

This is part of my Modern Visual RecSys series; feel free to check out the rest of the series at the end of the article.

RecSys Framework — Amazon case study

An eCommerce website like Amazon relies heavily on a good RecSys. After all, users cannot be expected to browse through the millions of products on the platform, while sellers want exposure for their products. Furthermore, there is limited space on the website/app; a good RecSys should match the user’s preferences with the most relevant products and display the results in an order that encourages the user to click or purchase.

How do we go about designing a RecSys from scratch? Let me share with you a framework I use.

Step 1: Define the business case

There is no clear “correct answer” for RecSys. From the Amazon recommendations above, how do you tell if the RecSys is doing a great job? If I bought socks and monitors in the past, does it still make sense to recommend me more socks and monitors? RecSys is all about improving user experience and business KPIs. It is thus vital to understand the metrics involved:

RecSys metrics:

“How well it matches historical trends”: accuracy / relevancy / coverage
“How diverse are the recommendations”: diversity / serendipity / novelty
Respecting user privacy
Cold start problem (the challenge of recommending to new users)

Business metrics:

Clickthrough rate (CTR)
Purchase rate
Introduce new product / seller
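
To make the historical metrics above concrete, here is a minimal sketch of two of them: precision@k as a stand-in for accuracy/relevancy, and catalog coverage. The function names, the toy users, and the held-out interactions are all hypothetical and only serve as an illustration; a real evaluation would use a proper train/test split of interaction logs.

```python
from typing import Dict, List, Set


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items the user actually interacted with."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def catalog_coverage(all_recs: Dict[str, List[str]], catalog: Set[str]) -> float:
    """Share of the catalog that appears in at least one user's recommendations."""
    if not catalog:
        return 0.0
    shown = {item for recs in all_recs.values() for item in recs}
    return len(shown & catalog) / len(catalog)


# Hypothetical toy data, for illustration only.
catalog = {"socks", "monitor", "keyboard", "mouse", "lamp"}
recs = {
    "user_a": ["socks", "monitor", "keyboard"],
    "user_b": ["socks", "monitor", "mouse"],
}
# Items each user interacted with in a held-out period (assumed).
held_out = {"user_a": {"socks", "lamp"}, "user_b": {"mouse"}}

for user, items in recs.items():
    print(user, "precision@3 =", precision_at_k(items, held_out[user], k=3))
print("catalog coverage =", catalog_coverage(recs, catalog))
```

Note that a recommender can score well on precision@k while covering only a small slice of the catalog, which is one reason to track both.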

While accuracy/relevancy give us a ballpark figure for how well our RecSys matches historical trends, they do not show the full picture.

Users cannot judge items they have not yet seen. In essence, a RecSys can perform poorly on historical measures while introducing lots of new products to users and increasing the overall purchase rate. It is crucial to have a mix of historical, diversity, and business metrics, measured through A/B testing across different user demographics.
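
As a rough illustration of measuring a business metric through A/B testing, the sketch below compares the clickthrough rate of two variants with a two-proportion z-test. The test choice, the function names, and the click counts are my assumptions for the example; in practice you would also fix the sample size in advance and run the comparison per user segment.

```python
import math
from typing import Tuple


def ctr(clicks: int, impressions: int) -> float:
    """Clickthrough rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def two_proportion_z_test(
    clicks_a: int, n_a: int, clicks_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference CTR_B - CTR_A."""
    p_a = ctr(clicks_a, n_a)
    p_b = ctr(clicks_b, n_b)
    # Pooled CTR under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A is the current RecSys, variant B the candidate.
z, p = two_proportion_z_test(clicks_a=200, n_a=10_000, clicks_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```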

Step 2: Prepare the Data

Let us take a look at a patent filed by Amazon in 1998 that lays out foundations of RecSys that remain relevant today.

Source: Patent US 6,266,649 B1: Collaborative recommendations using item-to-item similarity mappings by Amazon

From the patent image above, we can outline the various pieces of data that are found in all modern RecSys:

User interactions (shown as web server — html in image). These are usually clickstream data that are stored as log files and accessed via a publish-subscribe messaging system like Kafka. A common technique to track users is browser fingerprinting, which raises user privacy concerns.
User profiles (with purchase history, ratings, shopping cart, wishlist, comments, recently browsed items, etc.). These are usually transactional and structured data stored in most customer databases. They form the baseline data for persona creation, segmentation, targeting, retention, and reactivation. A lot of work goes into detecting user fraud as well.
Seller profiles (ratings, sales, price range, promotions, partnerships, novelty, catalog, etc.). These are missing in the image above but are essential in any modern RecSys design. Sellers are the platform’s lifeblood and content partners. The data will be critical in product promotions, launching new products, and ranking/placement of products on the site.
Item attributes (price history, category, sales, quantity in stock, consolidation across multiple sellers, images, fake product check, etc.). These data are vital to the item similarity models. Product image similarity will be a core part of this series of articles.

Use data in combination: the power of the data and insights grows exponentially when you combine them. The image outlines three key relationships underlying most RecSys: popular item + similar item, customer + item, and market basket analysis.
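
To give a feel for the similar-item and market-basket relationships, here is a minimal sketch that derives item-to-item similarity from co-purchases. It is not the method described in the patent; it is a simple co-occurrence count normalized by item popularity (a cosine-style score), and the baskets and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def item_similarities(baskets: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Score item pairs by how often they are bought together."""
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        unique_items = sorted(set(basket))  # ignore repeats within one basket
        item_counts.update(unique_items)
        pair_counts.update(combinations(unique_items, 2))

    sims: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        # Normalize so that very popular items do not dominate every list.
        score = together / math.sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def similar_items(
    sims: Dict[str, Dict[str, float]], item: str, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the top-n most similar items, ties broken alphabetically."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical purchase histories, one list per order.
baskets = [
    ["monitor", "hdmi_cable", "keyboard"],
    ["monitor", "hdmi_cable"],
    ["socks", "shoes"],
    ["monitor", "keyboard"],
]
sims = item_similarities(baskets)
print(similar_items(sims, "monitor"))
```

The resulting table can be computed offline and looked up at serving time, which is what makes item-to-item approaches attractive at scale.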

Step 3: Design a suitable architecture

Source: System Architectures for Personalization and Recommendation by Xavier Amatriain and Justin Basilico

The system architecture of a RecSys is usually a closely guarded secret and differs based on scale and requirements. One good baseline architecture is the one from Netflix (see the link under further readings for more details). Netflix hosts its RecSys on AWS and divides the system into three time horizons:

Offline (batch processing). These are models that are computationally heavy and take a long time to run. Models that need to generate long-term relationship pairs across all users and products, such as collaborative filtering models, fall under this category. Usually, we run these models at scheduled intervals throughout the day, for example every few hours. These are often the most complex and accurate models.
Nearline (semi real-time / micro-batch processing). Modern frameworks like Spark allow data to be processed every 5–10s, which is great for modeling short-term behavioral patterns, such as browsing patterns. We can update the recommendations based on what the user is looking at and recommend complementary items for what they added to the cart.
Online (real-time). These models can be costly and are unnecessary for most use cases, which can be handled in semi real-time. Because real-time processing occurs in the ballpark of ~10ms, little modeling can be completed at this speed. Most likely, we will be pre-generating the results (such as the most popular items) and serving them from an in-memory database such as Redis or from a cache. Real-time testing in the form of A/B testing or multi-armed bandits is also critical for the success of any RecSys.
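
The idea of pre-generating results and serving them from memory can be sketched as follows. To keep the example self-contained, a plain Python dictionary with expiry times stands in for Redis or another in-memory store; the class name, the `serve` function, and the fallback to popular items are my own illustrative choices, not a description of the Netflix or Amazon systems.

```python
import time
from typing import Dict, List, Optional, Tuple


class RecCache:
    """In-process stand-in for an in-memory store such as Redis (assumption)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def put(self, user_id: str, items: List[str], now: Optional[float] = None) -> None:
        """Store recommendations produced by an offline or nearline job."""
        now = time.time() if now is None else now
        self._store[user_id] = (now + self.ttl, list(items))

    def get(self, user_id: str, now: Optional[float] = None) -> Optional[List[str]]:
        """Return cached recommendations, or None if missing or expired."""
        now = time.time() if now is None else now
        entry = self._store.get(user_id)
        if entry is None:
            return None
        expires_at, items = entry
        if now >= expires_at:
            del self._store[user_id]
            return None
        return items


def serve(cache: RecCache, user_id: str, popular_items: List[str],
          k: int = 3, now: Optional[float] = None) -> List[str]:
    """Online path: a lookup only, falling back to popular items on a miss."""
    items = cache.get(user_id, now=now)
    if items is None:
        items = popular_items
    return items[:k]


# Hypothetical usage: the batch job fills the cache, the web request reads it.
cache = RecCache(ttl_seconds=3600)
cache.put("user_a", ["monitor_stand", "hdmi_cable", "keyboard", "mouse"])
popular = ["socks", "phone_case", "usb_charger"]
print(serve(cache, "user_a", popular))    # personalized, from the cache
print(serve(cache, "new_user", popular))  # cache miss, popular items
```

Falling back to popular items on a cache miss also gives a simple answer for brand-new users, for whom no personalized results exist yet.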

Designing a good RecSys architecture takes experience, time, and an understanding of stakeholder requirements (metrics, data, budget, time, etc.).
The key is to start small with offline models, scale towards semi real-time models, and always be testing in real-time.
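
As one possible way to "always be testing in real-time", the sketch below implements an epsilon-greedy multi-armed bandit that splits traffic between recommender variants and gradually favors the one with the higher observed CTR. Epsilon-greedy is just one simple bandit strategy that I picked for illustration; the variant names and the simulated click rates are made up.

```python
import random
from typing import Dict, List, Optional


class EpsilonGreedy:
    """Explore a random variant with probability epsilon, else exploit the best."""

    def __init__(self, arms: List[str], epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        if not arms:
            raise ValueError("at least one arm is required")
        self.arms = list(arms)
        self.epsilon = epsilon
        self.pulls: Dict[str, int] = {arm: 0 for arm in self.arms}
        self.clicks: Dict[str, int] = {arm: 0 for arm in self.arms}
        self._rng = random.Random(seed)

    def estimated_ctr(self, arm: str) -> float:
        pulls = self.pulls[arm]
        return self.clicks[arm] / pulls if pulls else 0.0

    def select(self) -> str:
        """Pick the variant to show for the next request."""
        if self._rng.random() < self.epsilon:
            return self._rng.choice(self.arms)
        return max(self.arms, key=self.estimated_ctr)

    def update(self, arm: str, clicked: bool) -> None:
        """Record whether the user clicked on what the variant recommended."""
        self.pulls[arm] += 1
        self.clicks[arm] += int(clicked)


def simulate(true_ctr: Dict[str, float], rounds: int, seed: int = 0) -> EpsilonGreedy:
    """Simulate traffic against made-up 'true' CTRs (illustration only)."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(list(true_ctr), epsilon=0.1, seed=seed)
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, rng.random() < true_ctr[arm])
    return bandit


bandit = simulate({"offline_model": 0.02, "nearline_model": 0.05}, rounds=5000)
for arm in bandit.arms:
    print(arm, bandit.pulls[arm], round(bandit.estimated_ctr(arm), 4))
```

Compared with a fixed 50/50 A/B split, a bandit shifts traffic towards the better variant while the test is still running, at the cost of a less clean statistical comparison.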

What have we learned

It is not easy to design a RecSys. We should always start with the business problem. Do we have a current baseline solution? What are the expectations/goals? How many resources are we willing to put into this project? Next, we should evaluate our data sources. Are we tracking the user interactions? Do we have access to the user and item data? How much historical data do we have to work with? The answers to all these questions will determine the kind of architecture we design to meet the requirements.

Reflections

Imagine your RecSys team consists of business stakeholders, designers, front end & back end engineers, data engineers, and fellow data scientists. What questions will you ask the various team members?
Below, we have the patent filings from Amazon regarding how they generate shopping cart and instant recommendations. Do the workflows make sense? Are there any clarifications that you would like from the data scientists who designed the workflow? How would you improve the workflow with modern tools/processes?

Explore the rest of Modern Visual RecSys Series

Modern Visual RecSys: How does a recommender work? [Foundational]

In this series of articles, I will introduce modern approaches to visual recommender systems. We begin with a case…

medium.com

Modern Visual RecSys: Intro to Visual RecSys [Core]

We will explore the “hello world” data for visual models, the FashionMNIST dataset from Zalando with PyTorch…

medium.com

Modern Visual RecSys: Convolutional Neural Networks Recommender [Pro]

We will build a recommender by leveraging transfer learning with ResNet and return visually similar products across…

medium.com

Modern Visual RecSys: COVID-19 Case Study with CNN [Pro]

We will cluster COVID-19 X-ray images based on severity with our CNN RecSys flow using transfer learning, Spotify’s…

medium.com

Building a Personalized Real-Time Fashion Collection Recommender [Pro]

We will make use of transfer learning, approximate nearest neighbors, and embeddings centroid detection in PyTorch to…

towardsdatascience.com

Temporal Fashion Recommender [Pro]

Building a Recommender That Evolves with Seasons

towardsdatascience.com

The Future of Visual Recommender Systems: Four Practical State-Of-The-Art Techniques [Pro]

The future of visual RecSys is an exciting one. Let us explore some of the most cutting edge techniques and ideas that…

medium.com

Series labels:

Foundational: general knowledge and theories, minimal coding experience needed.
Core: more challenging materials with code.
Pro: difficult materials and code, with production-grade tools.

Further Readings

Written by Kai Xin Thia